With the development of convolutional neural networks (CNNs), deeper and wider networks improve accuracy but increase deployment difficulty. Designing lightweight algorithms and high-efficiency hardware accelerators has therefore become a research hotspot. In this article, we exploit the sparsity and quantized weights of CNNs to obtain lightweight models, reducing the number of weights to 28% and the weight bit width to 25% of the original models. To avoid unbalanced computation, we model sparse CNNs and propose a dataflow with a high processing-element (PE) utilization ratio and a low DRAM access count. We then design a sparse CNN accelerator based on shift units, in which zero weights are skipped to save execution time and energy. The design was taped out in a TSMC 28-nm process and packaged in a QFP144 package. The chip passes functional tests, achieving 256.1 GOPS of performance and 1.133 TOPS/W of energy efficiency. Compared with similar designs, ours achieves a 46.7% lower energy-delay product (EDP).
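To make the two key mechanisms concrete, the following is a minimal software sketch, not the authors' RTL, of what one PE's inner loop does: zero weights are skipped entirely, and because weights are quantized (here assumed to powers of two, which is what shift units imply), each multiply is replaced by a shift. The weight encoding (a nonzero flag, a sign bit, and a shift exponent) and all names are illustrative assumptions.

```c
/* Hedged sketch of a shift-based, zero-skipping PE inner loop.
 * Assumptions (not from the article): weights are power-of-two
 * quantized and stored as {nonzero flag, sign, shift exponent}. */
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint8_t nonzero; /* 0 => weight is zero; the PE skips the MAC */
    uint8_t sign;    /* 0 => positive, 1 => negative              */
    uint8_t shift;   /* weight magnitude is 2^shift               */
} qweight_t;

/* Accumulate activations against quantized weights. Zero weights
 * cost no shift/add, mirroring the zero-skip logic in hardware. */
static int32_t pe_dot(const int16_t *act, const qweight_t *w, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++) {
        if (!w[i].nonzero)
            continue;                              /* skip zero weight      */
        int32_t p = (int32_t)act[i] << w[i].shift; /* shift replaces multiply */
        acc += w[i].sign ? -p : p;
    }
    return acc;
}

int main(void)
{
    int16_t   act[4] = { 3, -5, 7, 2 };
    qweight_t w[4]   = { {1,0,1}, {0,0,0}, {1,1,2}, {1,0,0} };
    /* weights are effectively {+2, 0, -4, +1}:
     * 3*2 + (skipped) + 7*(-4) + 2*1 = -20 */
    printf("%d\n", pe_dot(act, w, 4));
    return 0;
}
```

In hardware, the skip saves a cycle and the shift replaces a full multiplier; in this sketch both show up only as cheaper C operations, but the control flow is the same.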