With the rapid evolution of AI technology, various neural network structures have been developed for diverse applications. As a typical case, Fig. 22.4.1 shows that the convolution (Conv) layers used in convolutional neural networks (CNNs) feature distinct shapes and types. Neural network accelerators with high peak energy efficiency have been demonstrated [1–4]. However, they usually suffer from decreased hardware utilization (mainly of the multiply-accumulate (MAC) units) across varying network structures, which reduces the attainable energy efficiency accordingly. To improve MAC utilization, the Nvidia deep learning accelerator (NVDLA) [5] applies hardware parallelism along the channel direction, but utilization remains low for shallow layers: in our experiments, NVDLA achieves only 23% MAC utilization in the worst case. A Scatter-Gather scheme [4] mitigates the utilization drop for shallow layers by rearranging the input features (IF), but the improvement is limited. As depthwise convolution (Dwcv) has become widely used, its accompanying low MAC utilization must also be considered: taking MobileNetV2 as an example, NVDLA achieves only 0.4% utilization for Dwcv. To address these critical issues, this work presents a utilization-aware neural network accelerator that dynamically changes the level of parallelism along multiple dimensions to maximize MAC utilization. The chip achieves $>97.3{\%}$ MAC utilization on benchmark networks while delivering $4.7\times$ higher attainable energy efficiency than state-of-the-art designs [1–4].
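The utilization loss described above can be illustrated with a minimal first-order model. The sketch below assumes a hypothetical NVDLA-like datapath with a fixed number of MAC lanes parallelized along the input-channel dimension (the lane count of 64 is an assumption for illustration, not a figure from this work): layers whose channel count is small relative to the lane width, such as a 3-channel input layer or a depthwise convolution operating on one channel per filter, leave most lanes idle.

```python
import math

def mac_utilization(in_channels: int, lanes: int = 64) -> float:
    """Fraction of MAC lanes doing useful work when parallelism is
    applied only along the input-channel dimension.
    Illustrative first-order model; the 64-lane width is a hypothetical
    NVDLA-like configuration, not taken from the paper."""
    # Each pass consumes up to `lanes` channels; the final pass of a
    # layer with a non-multiple channel count is only partially filled.
    passes = math.ceil(in_channels / lanes)
    return in_channels / (passes * lanes)

# Shallow first layer (3 RGB channels): most lanes sit idle.
print(f"{mac_utilization(3):.1%}")    # → 4.7%
# Depthwise conv: each filter sees a single input channel.
print(f"{mac_utilization(1):.1%}")    # → 1.6%
# Deep layer whose channel count is a multiple of the lane width.
print(f"{mac_utilization(256):.1%}")  # → 100.0%
```

This simple model shows why a fixed, single-dimension parallelism scheme cannot serve all layer shapes well, motivating the multi-dimensional, dynamically reconfigured parallelism proposed in this work.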