A 0.44-μJ/dec, 39.9-μs/dec, Recurrent Attention In-Memory Processor for Keyword Spotting
- Resource Type
- Periodical
- Authors
- Dbouk, H.; Gonugondla, S.K.; Sakr, C.; Shanbhag, N.R.
- Source
- IEEE Journal of Solid-State Circuits, 56(7):2234-2244, Jul. 2021
- Subject
- Components, Circuits, Devices and Systems
Engineered Materials, Dielectrics and Plasmas
Computing and Processing
Integrated circuits
Computer architecture
Random access memory
Complexity theory
Computational modeling
Task analysis
Classification algorithms
In-memory computing (IMC)
keyword spotting (KWS)
machine learning
recurrent attention networks
- Language
- ISSN
- 0018-9200
1558-173X
This article presents a deep learning-based classifier IC for keyword spotting (KWS) in 65-nm CMOS designed using an algorithm-hardware co-design approach. First, a recurrent attention model (RAM) algorithm for the KWS task (the KeyRAM algorithm) is proposed. The KeyRAM algorithm enables accuracy versus energy scalability via a confidence-based computation (CC) scheme, leading to a 2.5× reduction in computational complexity compared to state-of-the-art (SOTA) neural networks, and is well-suited for in-memory computing (IMC) since the bulk (89%) of its computations are 4-b matrix-vector multiplies. The KeyRAM IC comprises a multi-bit multi-bank IMC architecture with a digital co-processor. A sparsity-aware summation scheme is proposed to alleviate the challenge faced by IMCs when summing sparse activations. The digital co-processor employs diagonal major weight storage to compute without any stalls. This combination of the IMC and digital processors enables a balanced tradeoff between energy efficiency and high accuracy computation. The resultant KWS IC achieves SOTA decision latency of 39.9 μs with a decision energy < 0.5 μJ/dec, which translates to more than 24× savings in the energy-delay product (EDP) of decisions over existing KWS ICs.
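The energy-delay product quoted above is simply decision energy multiplied by decision latency. As a rough illustration only, the sketch below plugs in the headline figures from the title (0.44 μJ/dec and 39.9 μs/dec). The function and variable names are hypothetical, and the "implied baseline" is merely the lower bound that follows arithmetically from the "more than 24×" claim, not a figure reported in the record.

```python
def energy_delay_product(energy_j: float, latency_s: float) -> float:
    """Return the energy-delay product in joule-seconds."""
    return energy_j * latency_s


# Headline figures taken from the article title.
ENERGY_PER_DECISION_J = 0.44e-6    # 0.44 uJ per decision
LATENCY_PER_DECISION_S = 39.9e-6   # 39.9 us per decision

# Convert J*s to pJ*s (1 J*s = 1e12 pJ*s) for readability.
edp_pj_s = energy_delay_product(ENERGY_PER_DECISION_J, LATENCY_PER_DECISION_S) * 1e12

# Assumption: "more than 24x savings" means prior KWS ICs have an EDP
# of at least 24 times this value; this is a derived lower bound only.
implied_baseline_pj_s = 24 * edp_pj_s

print(f"EDP of this IC:           {edp_pj_s:.2f} pJ*s")
print(f"Implied prior-work bound: {implied_baseline_pj_s:.1f} pJ*s or more")
```

Because EDP multiplies the two metrics, a design that improves both energy and latency is rewarded more strongly than one that improves either alone.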