Graph convolutional neural networks (GCNs) have emerged as a promising class of neural network models for extending deep learning to graph data analytics. As semi-supervised models, GCNs must be trained before they can extract features from any input graph. The challenge is that existing GCN accelerators typically target the sparse-dense matrix multiplications (SpDM) in GCN inference while ignoring the compute-intensive GCN training, which poses significant performance demands and design challenges. In this paper, we categorize the computations of GCN training into sparse-sparse matrix multiplications (SpSpM) and sparse-dense matrix multiplications (SpDM), and then introduce the GCNTrain-v1 architecture, which uniformly performs both SpSpM and SpDM via a column-wise-product-based method. To address the bank conflict problem in the GCNTrain-v1 architecture, we further propose the GCNTrain-v2 architecture with a conflict-free bank access strategy, which coalesces all requests to one bank by broadcasting elements. Moreover, to alleviate the workload imbalance problem in the GCNTrain-v2 architecture, we present the GCNTrain-v3 architecture with an offline reshuffle technique that reshuffles and balances the non-zero elements of the matrix before GCN training. Overall, the GCNTrain-v3 architecture performs both the SpSpM and SpDM of GCN training without bank conflicts or workload imbalance. On five graph datasets, experimental results demonstrate considerable speedups over CPU (80.47×), GPU (10.88×), and GCNAX (1.65×).
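The column-wise-product method mentioned above can be illustrated with a minimal software sketch (the accelerator's actual dataflow is a hardware design; the function name and CSC-like dictionary format below are assumptions for illustration). The key idea is that each output column C[:, j] is a linear combination of the columns of A weighted by the non-zeros of B[:, j], so the same loop structure serves both SpSpM and SpDM: a dense B column is simply a column in which every entry is treated as a non-zero.

```python
# Sketch of column-wise-product matrix multiply C = A @ B.
# Matrices are stored column-major as dict: col -> {row: value}
# (a CSC-like layout); this format and the function name are
# illustrative assumptions, not the paper's implementation.

def colwise_product(A_cols, B_cols):
    """Return C = A @ B in the same col -> {row: value} format."""
    C_cols = {}
    for j, b_col in B_cols.items():
        acc = {}  # sparse accumulator for column j of C
        for k, b_kj in b_col.items():            # non-zeros B[k, j]
            for i, a_ik in A_cols.get(k, {}).items():  # column k of A
                acc[i] = acc.get(i, 0) + a_ik * b_kj   # C[i,j] += A[i,k]*B[k,j]
        if acc:
            C_cols[j] = acc
    return C_cols

# Example: A = [[1, 0], [0, 2]], B = [[3], [4]]  ->  C = [[3], [8]]
A = {0: {0: 1}, 1: {1: 2}}
B = {0: {0: 3, 1: 4}}
print(colwise_product(A, B))  # {0: {0: 3, 1: 8}}
```

Because the inner loops touch only stored non-zeros, the same routine degrades gracefully from SpSpM to SpDM as B densifies, which is the uniformity the GCNTrain architectures exploit.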