An Efficient Piecewise Linear Approximation of Non-linear Operations for Transformer Inference
- Resource Type
- Conference
- Authors
- Lu, Haodong; Mei, Qichang; Wang, Kun
- Source
- 2023 IEEE 31st Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), p. 206, May 2023
- Subject
- Components, Circuits, Devices and Systems
Computing and Processing
Signal Processing and Analysis
Performance evaluation
Costs
Computational modeling
Piecewise linear approximation
Transformers
Table lookup
Task analysis
- ISSN
- 2576-2621
Transformer-based models have achieved remarkable performance across a wide range of tasks, but their computational complexity is an obstacle to deploying them on resource-constrained devices. To address this, this paper proposes NPLA, an efficient framework for approximating the non-linear operations in Transformer inference on hardware accelerators. NPLA approximates each non-linear operation with a non-uniform piecewise linear function and converts the resulting coefficients directly into lookup tables (LUTs) for hardware implementation. Experimental results show that, compared with the state-of-the-art method, NPLA reduces hardware cost by 13.43× in LUT usage and 1.98× in DSP usage.
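The general idea of a non-uniform piecewise linear approximation stored as a coefficient table can be sketched in a few lines of Python. This is a minimal illustration, not the NPLA algorithm: the target function (GELU), the hand-picked breakpoints (denser near zero, where the curvature is highest), the chord-based segment fit, and the fixed-point encoding in the hypothetical `quantize_table` helper are all assumptions made for the example.

```python
import bisect
import math

def gelu(x: float) -> float:
    # Exact GELU via the Gaussian error function.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

# Assumed non-uniform breakpoints: narrow segments near 0, wide segments in the tails.
BREAKPOINTS = [-4.0, -3.0, -2.0, -1.5, -1.0, -0.5, -0.25, 0.0,
               0.25, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0]

def build_pwl_table(f, breakpoints):
    """Return one (slope, intercept) pair per segment.

    Each segment is the chord through f at its two endpoints
    (an illustrative fitting choice, not necessarily the paper's)."""
    table = []
    for x0, x1 in zip(breakpoints[:-1], breakpoints[1:]):
        y0, y1 = f(x0), f(x1)
        k = (y1 - y0) / (x1 - x0)
        table.append((k, y0 - k * x0))
    return table

def pwl_eval(x, breakpoints, table):
    # Locate the segment containing x; inputs outside the range
    # are extrapolated with the first or last segment.
    i = bisect.bisect_right(breakpoints, x) - 1
    i = min(max(i, 0), len(table) - 1)
    k, b = table[i]
    return k * x + b

def quantize_table(table, frac_bits=12):
    # Hypothetical fixed-point LUT encoding: each coefficient becomes a
    # signed integer with frac_bits fractional bits.
    scale = 1 << frac_bits
    return [(round(k * scale), round(b * scale)) for k, b in table]

TABLE = build_pwl_table(gelu, BREAKPOINTS)
```

With these 14 segments, the chord error bound (h²/8)·max|f''| keeps the absolute error on [-4, 4] below about 0.02. A hardware version would store `quantize_table(TABLE)` in on-chip memory and replace the `bisect` search with a comparator tree or an index computed from the input's bits.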