As deep learning models grow significantly larger, sparsity is arguably becoming the most critical dimension to explore for efficiency and scalability. In particular, pruning is a common method for reducing redundant computation in attention-based and convolution-based models. The resulting sparse matrix multiplication (SpMM) typically requires a domain-specific hardware architecture (DSA) to eliminate unnecessary computations on zero values. However, generating optimal SpMM kernel code for general-purpose and ISA-based spatial accelerators without changing the hardware architecture remains an open problem. In this paper, we propose SpMMPlu, a compiler plug-in that extends the ability of current deep learning compiler frameworks, which support only dense matrix multiplication, to represent and optimize SpMM. The core of SpMMPlu is a flexible intermediate representation, Sparse IR, which represents SpMM with various sparsity patterns using meta-ops organized in a multi-level structure. Each meta-op takes an abstraction of a hardware intrinsic as its minimum granularity, and the powerful optimizers of existing NN compiler backends (e.g., Auto-schedule in TVM, AKG in MindSpore) can be readily reused for its computational scheduling and code generation. Moreover, we propose a two-step method (segmentation and grouping) to construct an efficient Sparse IR for each sparsity pattern. SpMMPlu adds only three passes to provide an automatic solution for SpMM kernel code generation. We integrate SpMMPlu into MindSpore and evaluate it on an NVIDIA V100 GPU and a Huawei Ascend 910 to verify its effectiveness and scalability. The results show that with SpMMPlu, MindSpore supports various sparsity patterns and delivers average speedups of 1.93× on the V100 GPU and 2.21× on the Ascend 910 compared with its dense counterpart.
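To make the idea of intrinsic-granularity meta-ops more concrete, the sketch below shows a generic block-sparse matrix multiplication. It is an illustration under stated assumptions, not SpMMPlu's Sparse IR or its actual segmentation and grouping algorithms. It assumes a fixed square tile size `TILE` standing in for the shape of a hardware intrinsic. The sparse operand is split into intrinsic-sized tiles and all-zero tiles are dropped (a loose analogue of "segmentation"). Surviving tiles are then collected by output row block (a loose analogue of "grouping"). Each remaining tile product is a small dense multiply-accumulate. All names (`tile_mma`, `segment`, `group`, `block_sparse_matmul`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Matrix = List[List[float]]

# Hypothetical tile size standing in for a hardware intrinsic's shape
# (real intrinsics are larger, e.g. 16x16); kept small for readability.
TILE = 2


def tile_mma(A: Matrix, B: Matrix, C: Matrix, i0: int, k0: int, j0: int, t: int) -> None:
    """Dense t x t x t multiply-accumulate on one tile:
    C[i0:i0+t, j0:j0+t] += A[i0:i0+t, k0:k0+t] @ B[k0:k0+t, j0:j0+t].
    In this sketch it plays the role of a single intrinsic-sized meta-op."""
    for i in range(i0, i0 + t):
        for k in range(k0, k0 + t):
            a = A[i][k]
            for j in range(j0, j0 + t):
                C[i][j] += a * B[k][j]


def segment(A: Matrix, t: int) -> List[Tuple[int, int]]:
    """Illustrative segmentation: split A into t x t tiles and keep the
    (row, col) origin of every tile containing at least one nonzero."""
    rows, cols = len(A), len(A[0])
    return [
        (bi, bk)
        for bi in range(0, rows, t)
        for bk in range(0, cols, t)
        if any(A[i][k] != 0 for i in range(bi, bi + t) for k in range(bk, bk + t))
    ]


def group(tiles: List[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Illustrative grouping: collect kept tiles by row block, so each
    group accumulates into one horizontal strip of the output."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for bi, bk in tiles:
        groups[bi].append(bk)
    return dict(groups)


def block_sparse_matmul(A: Matrix, B: Matrix, t: int = TILE) -> Tuple[Matrix, int]:
    """Compute A @ B, issuing tile_mma only for nonzero tiles of A.
    Returns the product and the number of tiles of A that were kept."""
    M, K, N = len(A), len(B), len(B[0])
    assert len(A[0]) == K, "inner dimensions must match"
    assert M % t == 0 and K % t == 0 and N % t == 0, "dims must be multiples of t"
    C = [[0.0] * N for _ in range(M)]
    tiles = segment(A, t)
    for bi, bks in group(tiles).items():
        for bj in range(0, N, t):
            for bk in bks:
                tile_mma(A, B, C, bi, bk, bj, t)  # all-zero tiles are never visited
    return C, len(tiles)


if __name__ == "__main__":
    A = [[1.0, 0.0, 0.0, 0.0],
         [0.0, 2.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.0]]
    B = [[float(4 * r + c + 1) for c in range(4)] for r in range(4)]
    C, kept = block_sparse_matmul(A, B)
    print(kept)  # 1 (only one of A's four 2x2 tiles has nonzeros)
    print(C[1])  # [10.0, 12.0, 14.0, 16.0]
```

In this toy setting, only one of the four tiles of `A` produces any work. The dense inner `tile_mma` is the natural unit that an existing dense scheduler could optimize, which loosely mirrors the motivation for reusing backend optimizers described above.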