Academic Papers

SpMMPlu: A Compiler Plug-in with Sparse IR for Efficient Sparse Matrix Multiplication

Resource Type: Conference
Authors: Yang, Tao; Zhou, Yiyuan; Tang, Qidong; Xu, Feng; Ma, Hui; Zhao, Jieru; Jiang, Li
Source: 2023 60th ACM/IEEE Design Automation Conference (DAC), pp. 1-6, Jul. 2023
Subject: Components, Circuits, Devices and Systems
Computing and Processing
Engineering Profession
Deep learning
Processor scheduling
Scalability
Computational modeling
Graphics processing units
Computer architecture
Parallel processing
Sparsity
Plug-in
DNN compiler
Intermediate representation
CNN
Transformer
Language

Online Access

Full Text (IEEE)

Abstract

Sparsity is arguably becoming the most critical dimension to explore for efficiency and scalability as deep learning models grow significantly larger. In particular, pruning is a common method to reduce redundant computation in attention-based and convolution-based models. The resulting sparse matrix multiplication (SpMM) normally requires a domain-specific hardware architecture (DSA) to eliminate unnecessary zero-valued computations. However, generating optimal kernel code for SpMM on general-purpose and ISA-based spatial accelerators without changing the hardware architecture is still an open problem.

In this paper, we propose a compiler plug-in named SpMMPlu, which extends the representation and optimization capabilities for SpMM in current deep learning compiler frameworks that only support dense matrix multiplication. The key to SpMMPlu is a flexible intermediate representation, Sparse IR, which represents SpMM with various sparsity patterns using meta-ops organized in a multi-level structure. Each meta-op takes an abstraction of the hardware intrinsic as its minimum granularity, and the powerful optimizers of existing NN compiler backends (e.g., Auto-schedule in TVM, AKG in MindSpore) can easily be reused for its computational scheduling and code generation. Moreover, we propose a two-step (segmentation & grouping) method to build an efficient Sparse IR for each sparsity pattern. Only three passes are added in SpMMPlu to provide an automatic solution for SpMM kernel code generation. We embed SpMMPlu into MindSpore and run experiments on an NVIDIA V100 GPU and a Huawei Ascend 910 to verify its effectiveness and scalability. The results show that with SpMMPlu, MindSpore can support various sparsity patterns and delivers average speedups of 1.93× (on the V100 GPU) and 2.21× (on the Ascend 910) over the dense counterpart.
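To make the idea of "hardware-intrinsic granularity" concrete, the sketch below shows a generic block-sparse matrix multiplication in plain Python. It is not SpMMPlu, its Sparse IR, or its segmentation & grouping algorithm; it only assumes, for illustration, that (1) "segmentation" means cutting the weight matrix into fixed `bs × bs` tiles and dropping all-zero tiles, (2) "grouping" means collecting the surviving tiles by block row, and (3) the hypothetical `dense_tile_mm` stands in for one hardware intrinsic call (e.g., a single matrix-multiply instruction on a tile). All function names are hypothetical.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def to_blocks(w: Matrix, bs: int) -> Dict[Tuple[int, int], Matrix]:
    """Hypothetical 'segmentation': cut w into bs x bs tiles, keep only tiles with a nonzero."""
    rows, cols = len(w), len(w[0])
    assert rows % bs == 0 and cols % bs == 0, "sketch assumes dims divisible by bs"
    blocks: Dict[Tuple[int, int], Matrix] = {}
    for bi in range(rows // bs):
        for bj in range(cols // bs):
            tile = [row[bj * bs:(bj + 1) * bs] for row in w[bi * bs:(bi + 1) * bs]]
            if any(v != 0 for r in tile for v in r):
                blocks[(bi, bj)] = tile
    return blocks


def group_by_row(blocks: Dict[Tuple[int, int], Matrix]) -> Dict[int, List[int]]:
    """Hypothetical 'grouping': map each block row to the block columns of its nonzero tiles."""
    groups: Dict[int, List[int]] = {}
    for bi, bj in sorted(blocks):
        groups.setdefault(bi, []).append(bj)
    return groups


def dense_tile_mm(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matmul; here it stands in for one hardware-intrinsic tile multiply."""
    k, m = len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(len(a))]


def block_spmm(w: Matrix, x: Matrix, bs: int) -> Tuple[Matrix, int]:
    """Compute w @ x, issuing one tile multiply per nonzero tile of w; also return the call count."""
    blocks = to_blocks(w, bs)
    groups = group_by_row(blocks)
    m = len(x[0])
    out = [[0.0] * m for _ in range(len(w))]
    calls = 0
    for bi, bjs in groups.items():
        for bj in bjs:
            # Multiply the nonzero tile by the matching bs-row slice of x.
            partial = dense_tile_mm(blocks[(bi, bj)], x[bj * bs:(bj + 1) * bs])
            calls += 1
            # Accumulate the partial product into the output rows of block row bi.
            for i in range(bs):
                for j in range(m):
                    out[bi * bs + i][j] += partial[i][j]
    return out, calls


if __name__ == "__main__":
    w = [[1, 0, 0, 0], [0, 2, 0, 0], [0, 0, 0, 3], [0, 0, 0, 0]]
    x = [[1, 2], [3, 4], [5, 6], [7, 8]]
    out, calls = block_spmm(w, x, bs=2)
    assert out == dense_tile_mm(w, x)
    print(calls)  # 2 of the 4 tiles are nonzero, so only 2 tile multiplies run
```

The design choice this illustrates is that skipping work at tile granularity (rather than per element) keeps every remaining operation a dense tile multiply, which is the kind of unit an existing dense-oriented compiler backend can already schedule.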
