Mixed-precision computing is becoming an inevitable trend for HPC and AI applications due to the increasing use of mixed-precision units such as NVIDIA Tensor Cores. The fast Fourier transform (FFT) is one of the most widely used scientific kernels, so mixed-precision FFT is in high demand. However, few existing FFT libraries (or algorithms) support FFTs of arbitrary size on Tensor Cores. We therefore propose tcFFT, a fast half-precision FFT library on Tensor Cores that supports 1D and 2D FFTs of arbitrary size. Our work consists of two parts: framework design and performance optimization. We designed the tcFFT library framework to support all power-of-two sizes and multiple dimensions of FFTs, and we applied two performance optimizations, one to use Tensor Cores efficiently and the other to ease GPU memory bottlenecks. We evaluated tcFFT on a wide range of 1D and 2D FFT sizes on NVIDIA V100 and A100 GPUs. The results show that tcFFT outperforms NVIDIA cuFFT v11.0 in FP16 by 1.29X-3.24X on average on V100 and by 1.10X-3.03X on average on A100.