In recent years, automatic crowd counting based on density estimation has received significant attention in computer vision because of its importance in congestion control, public safety, and ecological surveys. However, crowd counting remains challenging because of scene changes, scale variations, and occlusions. In this paper, we propose a crowd density estimation network that incorporates multi-layer feature maps to address the diverse scale changes of crowds and the background interference found in crowded scenes. The proposed multi-column feature extraction module fuses fine-grained and coarse-grained feature information so that the network better learns the head features of people, which improves its accuracy. Combined with a curriculum learning training strategy, the method improves the learning ability of the model, accelerates convergence, and further benefits model training and fusion. The effectiveness of the method is demonstrated by experiments on multiple datasets.