Graph convolutional neural networks (GCNs) have emerged as a promising class of neural network models for extending deep learning to graph data analytics. As semi-supervised models, GCNs must be trained before they can extract features from any input graph. The challenge is that existing GCN accelerators typically target the sparse-dense matrix multiplications (SpDM) in GCN inference while ignoring the compute-intensive GCN training, which poses significant performance demands and design challenges. In this paper, we categorize the computations of GCN training into sparse-sparse matrix multiplications (SpSpM) and sparse-dense matrix multiplications (SpDM), and then introduce the GCNTrain-v1 architecture, which uniformly performs both SpSpM and SpDM via a column-wise-product-based method. To address the bank conflict problem in the GCNTrain-v1 architecture, we further propose the GCNTrain-v2 architecture with a conflict-free bank access strategy, which coalesces all requests to one bank by broadcasting elements. Moreover, to alleviate the workload imbalance problem in the GCNTrain-v2 architecture, we present the GCNTrain-v3 architecture with an offline reshuffle technique that reshuffles and balances the non-zero elements of the matrix before GCN training. Overall, the GCNTrain-v3 architecture performs both the SpSpM and SpDM of GCN training without bank conflicts or workload imbalance. On five graph datasets, experimental results demonstrate considerable speedups over CPU (80.47×), GPU (10.88×), and GCNAX (1.65×).
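The column-wise-product method mentioned above can be illustrated with a minimal software sketch (the accelerator's actual dataflow is a hardware design; the function name and CSC-like dictionary format below are assumptions for illustration). The key idea is that each output column C[:, j] is a linear combination of the columns of A weighted by the non-zeros of B[:, j], so the same loop structure serves both SpSpM and SpDM: a dense B column is simply a column in which every entry is treated as a non-zero.

```python
# Sketch of column-wise-product matrix multiply C = A @ B.
# Matrices are stored column-major as dict: col -> {row: value}
# (a CSC-like layout); this format and the function name are
# illustrative assumptions, not the paper's implementation.

def colwise_product(A_cols, B_cols):
    """Return C = A @ B in the same col -> {row: value} format."""
    C_cols = {}
    for j, b_col in B_cols.items():
        acc = {}  # sparse accumulator for column j of C
        for k, b_kj in b_col.items():            # non-zeros B[k, j]
            for i, a_ik in A_cols.get(k, {}).items():  # column k of A
                acc[i] = acc.get(i, 0) + a_ik * b_kj   # C[i,j] += A[i,k]*B[k,j]
        if acc:
            C_cols[j] = acc
    return C_cols

# Example: A = [[1, 0], [0, 2]], B = [[3], [4]]  ->  C = [[3], [8]]
A = {0: {0: 1}, 1: {1: 2}}
B = {0: {0: 3, 1: 4}}
print(colwise_product(A, B))  # {0: {0: 3, 1: 8}}
```

Because the inner loops touch only stored non-zeros, the same routine degrades gracefully from SpSpM to SpDM as B densifies, which is the uniformity the GCNTrain architectures exploit.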