Evaluating a New Attention Framework Based on Matrix Blocking for Attention Models on FPGAs
- Resource Type
- Conference
- Authors
- Liu, Xiaohang; Jiang, Jingfei; Xu, Jinwei; Gao, Lei
- Source
- 2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI), pp. 607-615, Oct. 2022
- Subject
- Bioengineering
Computing and Processing
Robotics and Control Systems
Performance evaluation
Graphics processing units
Accelerator architectures
Systolic arrays
Natural language processing
Mobile handsets
Space exploration
attention mechanism
accelerator architecture
softmax
matrix blocking
systolic array
- ISSN
- 2375-0197
The attention mechanism has recently shown superior performance in natural language processing and computer vision tasks, but its complex dataflow and large-scale matrix calculations, with their huge computing and memory overhead, pose a great challenge for the design of hardware accelerators. Moreover, previous solutions that benefited from matrix partitioning are bounded by the softmax function. In this paper, we propose a new attention framework that dramatically improves the performance of attention-model inference for long-sequence tasks on FPGAs. We design a novel accelerator architecture that employs two systolic arrays and a ping-pong structure to accelerate attention calculation. Meanwhile, we propose an analytical model that predicts resource usage and performance, guiding a fast design space exploration. Experiments with the state-of-the-art BERT model demonstrate that the design achieves 4.61× and 1.24× improvements in speed and energy efficiency compared to a CPU and a GPU on the Xilinx XCZU11EG platform.
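The abstract notes that matrix-partitioned attention is ordinarily bounded by the softmax, since softmax normalizes over a full score row and seems to require all blocks before any output can be produced. The paper's own blocking scheme is not described in this record, but the general technique for removing that barrier is a "running" (online) softmax that rescales partial results as each block arrives. A minimal NumPy sketch, assuming single-head attention and a hypothetical `block` tile size:

```python
import numpy as np

def blocked_attention(Q, K, V, block=64):
    """Compute softmax(Q @ K.T / sqrt(d)) @ V one K/V block at a time.

    Illustrative sketch only: this is the standard online-softmax trick,
    not the specific framework from the paper. Running per-row maxima and
    sums let earlier partial outputs be rescaled when later blocks raise
    the maximum, so no full score row is ever materialized.
    """
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros((n, V.shape[1]))
    row_max = np.full(n, -np.inf)   # running max of each score row
    row_sum = np.zeros(n)           # running softmax denominator

    for start in range(0, K.shape[0], block):
        Kb = K[start:start + block]
        Vb = V[start:start + block]
        S = Q @ Kb.T * scale                     # partial score block
        new_max = np.maximum(row_max, S.max(axis=1))
        corr = np.exp(row_max - new_max)         # rescale earlier partials
        P = np.exp(S - new_max[:, None])         # stabilized exponentials
        row_sum = row_sum * corr + P.sum(axis=1)
        out = out * corr[:, None] + P @ Vb
        row_max = new_max

    return out / row_sum[:, None]
```

Because each iteration touches only one K/V tile, a hardware pipeline can stream blocks through a systolic array while a ping-pong buffer holds the tile being loaded next, which is the kind of overlap the abstract's two-array, ping-pong design targets.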