Large deep neural network (DNN) models pose significant computational and memory challenges, particularly when deployed on edge devices. To address this, techniques such as pruning, quantization, data sparsity, and data reuse have been applied to DNNs, reducing memory and computational complexity at the cost of some accuracy loss. This paper introduces an efficient hardware accelerator tailored for Convolutional Neural Networks (CNNs). The proposed architecture results from a co-optimized approach spanning both algorithms and hardware. It leverages a linear approximation of pre-trained network weights with minimal accuracy loss. A novel computational reuse method is presented that reduces the number of multiplication and addition operations and memory accesses, and is integrated seamlessly into the dedicated elements of the CNN design. To validate the effectiveness of this architecture, we conducted experiments on a gem5-based RISC-V simulator, using the VGG16 model on the CIFAR-100 dataset and the AlexNet model on the Tiny ImageNet dataset. The results show a speedup of approximately $2\times$ on AlexNet over the reference model. Additionally, the proposed CNN design was implemented on a Xilinx Kintex-7 Field Programmable Gate Array (FPGA), achieving a notable reduction in hardware resource utilization compared with prior work. This work serves as a versatile framework for evaluating trade-offs among accuracy, latency, power consumption, and cost across different CNN architectures.
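To make the core idea concrete, the following is a minimal sketch (not the paper's actual hardware datapath) of how a linear approximation of pre-trained weights can enable computational reuse in a dot product. The assumptions here are ours: weights are mapped to a small set of linearly spaced levels $w_k \approx s\,k + z$, inputs sharing the same level are accumulated first (additions only), and one multiplication per distinct level replaces one multiplication per weight. Function names (`linear_quantize`, `dot_with_reuse`) are illustrative, not from the paper.

```python
import numpy as np

def linear_quantize(weights, n_levels=16):
    """Approximate pre-trained weights as w ~ scale*k + offset, k an integer level.

    This is a hypothetical stand-in for the paper's linear weight approximation.
    """
    w_min, w_max = float(weights.min()), float(weights.max())
    scale = (w_max - w_min) / (n_levels - 1)
    levels = np.round((weights - w_min) / scale).astype(int)
    return levels, scale, w_min  # offset = w_min

def dot_with_reuse(x, levels, scale, offset, n_levels=16):
    """Compute sum_i (scale*k_i + offset) * x_i with computational reuse.

    Inputs are first accumulated per weight level (additions only); then a
    single multiplication per distinct level is performed, instead of one
    multiplication per weight.
    """
    partial = np.zeros(n_levels)
    for xi, k in zip(x, levels):
        partial[k] += xi                      # reuse: group inputs by level
    ks = np.arange(n_levels)
    return scale * np.dot(ks, partial) + offset * x.sum()
```

For a layer with $N$ weights drawn from $L$ levels ($L \ll N$), the per-output multiplication count drops from $N$ to roughly $L$, which is the kind of saving the dedicated reuse elements in the accelerator are designed to exploit in hardware.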