To mitigate the ever-increasing privacy concerns surrounding model inference, intensive efforts have been devoted to developing cryptography-based private deep learning inference, which preserves the confidentiality of both the submitted query and its inference result. However, privacy does not come for free: secure computation over the ciphertext domain is time-consuming for both linear and non-linear layers, and the homomorphic operations are especially costly. To boost efficiency, we propose a novel optimization for evaluating homomorphic convolutions, the most computation-intensive component of the entire inference process. Specifically, our approach involves the following key designs. First, the Winograd fast convolution algorithm is applied to minimize the number of multiplications in convolutions. Second, we fuse this algorithm with SIMD-enabled additive homomorphic encryption to accelerate homomorphic convolution evaluation. Third, the sparsity of the model parameters is exploited to further reduce the computational cost introduced by homomorphic encryption. In addition, extending the original Winograd algorithm to convolutions with kernels larger than 3 × 3 or strides greater than 1 is non-trivial. We overcome this technical challenge, making the proposed optimizations applicable to general convolution parameter configurations. In terms of performance, our scheme outperforms state-of-the-art secure inference methods, demonstrating a 2× reduction in the number of multiplications for convolution evaluation and a 30% improvement in end-to-end latency.
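To illustrate the multiplication savings behind the first design, the sketch below shows the classic 1-D Winograd minimal filtering algorithm F(2,3), which computes two outputs of a 3-tap convolution with 4 multiplications instead of 6. This is a plain-Python illustration of the standard algorithm only, not the paper's homomorphic implementation; in the ciphertext domain the transforms would operate over the plaintext modulus and the fractional constants would be handled accordingly.

```python
def winograd_f23(d, g):
    """Winograd minimal filtering F(2,3): two outputs of a 1-D
    3-tap 'valid' convolution using 4 multiplications
    (the direct method needs 6)."""
    d0, d1, d2, d3 = d  # four input samples
    g0, g1, g2 = g      # three filter taps
    # Filter transform G*g: computed once per filter, so its
    # constant multiplications are amortized across all tiles.
    Gg = (g0, (g0 + g1 + g2) / 2, (g0 - g1 + g2) / 2, g2)
    # Input transform B^T*d: additions/subtractions only.
    Bd = (d0 - d2, d1 + d2, d2 - d1, d1 - d3)
    # Element-wise products: exactly 4 multiplications.
    m = [a * b for a, b in zip(Gg, Bd)]
    # Output transform A^T*m.
    return (m[0] + m[1] + m[2], m[1] - m[2] - m[3])

def direct_conv(d, g):
    """Direct 1-D 'valid' convolution: 6 multiplications."""
    return (sum(d[i] * g[i] for i in range(3)),
            sum(d[i + 1] * g[i] for i in range(3)))
```

In the secure-inference setting, the element-wise products are the expensive homomorphic multiplications, which is why shrinking their count from 6 to 4 per tile (a 2D F(2×2, 3×3) tile analogously drops from 36 to 16) translates directly into the reported speedup.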