The safety and reliability of the pantograph are critical to the railway transportation system, and inspecting the pantograph is an essential maintenance task. Most previous work has proposed intelligent detection methods for rapid and accurate inspection of the pantograph's health status. However, no research has addressed the automatic generation of pantograph health status reports, which are the primary reference for maintenance decisions. Building on the success of DenseCap, this paper proposes a pantograph image captioning model (PanCap for short), which replaces the VGG-16 backbone with ResNet-50-FPN to extract richer image features. In addition, Focal Loss and a Transformer are used in PanCap to improve description performance by addressing the problems of class imbalance and description dependency. PanCap is evaluated on the Visual Genome (VG) dataset and a pantograph image dataset, and the experimental results demonstrate the effectiveness of the proposed method.