Computing-In-Memory (CIM) has demonstrated great potential for boosting the performance and energy efficiency of convolutional neural networks. However, due to the limited size and precision of its memory array, the input and weight matrices of convolution operations must be split into sub-matrices, or even binary sub-matrices when bit-slicing and single-level cells (SLCs) are used. As a result, a large number of partial sums are generated. To maintain high computing precision, high-resolution analog-to-digital converters (ADCs) are used to read out these partial sums, at the cost of considerable area and substantial energy overhead. Partial sum quantization (PSQ), a technique that can greatly reduce the required ADC resolution, remains sparsely studied. This paper proposes a novel PSQ approach for CIM-based accelerators that exploits the bit-level sparsity of neural networks. Then, to find the optimal clipping threshold for the ADCs, a reparametrized clipping function is proposed. Finally, we develop a general post-training quantization framework for PSQ-CIM. Experiments on a variety of neural networks and datasets show that, in a typical case (ResNet18 on ImageNet), the required ADC resolution can be reduced to 2 bits with little accuracy loss (~0.92%), and hardware efficiency can be improved by 199.7%.
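The bit-sliced partial-sum flow described above can be illustrated with a minimal numerical sketch. This is not the paper's algorithm: the function name, the toy tensor sizes, and the fixed clipping threshold `alpha` are all assumptions for illustration (the paper instead learns the threshold through a reparametrized clipping function). It shows how 1-bit input and weight slices produce binary partial sums, each of which is clipped and uniformly quantized to emulate a 2-bit ADC before the digital shift-and-accumulate.

```python
import numpy as np

def quantize_partial_sum(psum, alpha, bits=2):
    """Uniformly quantize a partial sum clipped to [0, alpha],
    emulating a low-resolution ADC readout. `alpha` plays the role
    of the clipping threshold (assumed fixed here; the paper learns
    it via a reparametrized clipping function)."""
    levels = 2 ** bits - 1
    clipped = np.clip(psum, 0.0, alpha)
    return np.round(clipped / alpha * levels) / levels * alpha

# Bit-sliced CIM sketch: SLC cells hold 1-bit weight slices and the
# input is streamed 1 bit at a time, so each column dot product is a
# binary partial sum that passes through the ADC before accumulation.
rng = np.random.default_rng(0)
x = rng.integers(0, 16, size=64)   # 4-bit activations (toy sizes)
w = rng.integers(0, 4, size=64)    # 2-bit weights

exact = int(x @ w)                 # full-precision reference

acc = 0.0
alpha = 32.0                       # assumed threshold, normally calibrated
for i in range(4):                 # input bit positions
    xb = (x >> i) & 1
    for j in range(2):             # weight bit positions
        wb = (w >> j) & 1
        psum = float(xb @ wb)      # analog column sum seen by the ADC
        acc += 2 ** (i + j) * quantize_partial_sum(psum, alpha, bits=2)

print("exact:", exact, " with 2-bit ADC:", acc)
```

With a well-chosen threshold the accumulated result tracks the exact dot product despite each individual readout being only 2 bits; this is the trade-off the PSQ framework optimizes.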