Evaluating a New Attention Framework Based on Matrix Blocking for Attention Models on FPGAs
- Resource Type
- Conference
- Authors
- Liu, Xiaohang; Jiang, Jingfei; Xu, Jinwei; Gao, Lei
- Source
- 2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI), pp. 607-615, Oct. 2022
- Subject
- Bioengineering
Computing and Processing
Robotics and Control Systems
Performance evaluation
Graphics processing units
Accelerator architectures
Systolic arrays
Natural language processing
Mobile handsets
Space exploration
attention mechanism
accelerator architecture
softmax
matrix blocking
systolic array
- ISSN
- 2375-0197
The attention mechanism has recently shown superior performance in natural language processing and computer vision tasks, but its complex dataflow and large-scale matrix calculations, with their huge computing and memory overhead, pose a great challenge for the design of hardware accelerators. Moreover, previous solutions that benefited from matrix partitioning are bounded by the softmax function. In this paper, we propose a new attention framework that dramatically improves the performance of attention-model inference for long-sequence tasks on FPGAs. We design a novel accelerator architecture that employs two systolic arrays and a ping-pong structure to accelerate attention calculation. Meanwhile, we propose an analytical model that predicts resource usage and performance, guiding a fast design space exploration. Experiments with the state-of-the-art BERT model demonstrate that the design achieves 4.61× and 1.24× improvements in speed and energy efficiency compared to a CPU and a GPU on the Xilinx XCZU11EG platform.
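The abstract notes that matrix-partitioned attention is ordinarily bounded by the softmax, since softmax normalizes over a full score row and seems to require all blocks before any output can be produced. The paper's own blocking scheme is not described in this record, but the general technique for removing that barrier is a "running" (online) softmax that rescales partial results as each block arrives. A minimal NumPy sketch, assuming single-head attention and a hypothetical `block` tile size:

```python
import numpy as np

def blocked_attention(Q, K, V, block=64):
    """Compute softmax(Q @ K.T / sqrt(d)) @ V one K/V block at a time.

    Illustrative sketch only: this is the standard online-softmax trick,
    not the specific framework from the paper. Running per-row maxima and
    sums let earlier partial outputs be rescaled when later blocks raise
    the maximum, so no full score row is ever materialized.
    """
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros((n, V.shape[1]))
    row_max = np.full(n, -np.inf)   # running max of each score row
    row_sum = np.zeros(n)           # running softmax denominator

    for start in range(0, K.shape[0], block):
        Kb = K[start:start + block]
        Vb = V[start:start + block]
        S = Q @ Kb.T * scale                     # partial score block
        new_max = np.maximum(row_max, S.max(axis=1))
        corr = np.exp(row_max - new_max)         # rescale earlier partials
        P = np.exp(S - new_max[:, None])         # stabilized exponentials
        row_sum = row_sum * corr + P.sum(axis=1)
        out = out * corr[:, None] + P @ Vb
        row_max = new_max

    return out / row_sum[:, None]
```

Because each iteration touches only one K/V tile, a hardware pipeline can stream blocks through a systolic array while a ping-pong buffer holds the tile being loaded next, which is the kind of overlap the abstract's two-array, ping-pong design targets.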