Traditional methods for recognizing features in cultural images are subjective and labor-intensive. To address this, this paper proposes YOLOv5s-GCT (YOLOv5s with Ghost, Coordinate, and Transformer modules), a deep learning model for recognizing facial features in shadow puppet images. Built on YOLOv5s, the model incorporates the Ghost structure to reduce computational cost and model complexity, uses the Coordinate Attention (CA) module to assign weights to features, and adds a Transformer module to strengthen its ability to capture global information. The model size is reduced to 2.6 MB, one-fifth the size of YOLOv5s, and the mAP is improved to 91.6%. The results indicate that the proposed approach achieves fast and accurate recognition at reduced computational cost.
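
To illustrate why a Ghost structure can shrink a detector, the following minimal sketch compares the weight count of a standard convolution with that of a Ghost-style convolution, in which a smaller primary convolution produces part of the output channels and a cheap depthwise convolution generates the rest. The function names, the channel sizes in the usage lines, the split ratio of 2, and the 5×5 depthwise kernel are illustrative assumptions and are not taken from this paper; bias and normalization parameters are ignored.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a plain k x k convolution (bias and BatchNorm ignored)."""
    return c_in * c_out * k * k


def ghost_conv_params(c_in: int, c_out: int, k: int,
                      ratio: int = 2, dw_k: int = 5) -> int:
    """Weights in a Ghost-style convolution (bias and BatchNorm ignored).

    ratio and dw_k are assumed values for illustration only:
    - a primary k x k convolution produces c_out / ratio "intrinsic" channels;
    - a cheap dw_k x dw_k depthwise convolution derives the remaining
      (ratio - 1) * c_out / ratio "ghost" channels from the intrinsic ones.
    """
    if c_out % ratio != 0:
        raise ValueError("c_out must be divisible by ratio")
    intrinsic = c_out // ratio
    primary = c_in * intrinsic * k * k            # ordinary convolution
    cheap = intrinsic * (ratio - 1) * dw_k * dw_k  # one filter per ghost channel
    return primary + cheap


# Hypothetical layer: 128 input channels, 256 output channels, 3 x 3 kernel.
standard = standard_conv_params(128, 256, 3)
ghost = ghost_conv_params(128, 256, 3)
print(standard, ghost, round(standard / ghost, 2))
```

Under these assumptions the Ghost-style layer needs roughly half the weights of the standard layer, because the depthwise branch costs only one small filter per generated channel. The reduction reported for the full model also depends on the rest of the architecture, so this sketch should not be read as reproducing the 2.6 MB figure.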