This paper presents a novel algorithm, the Keyframe Attention Network (KAN), for video captioning, which combines keyframe feature extraction with an attention allocation mechanism. The proposed method first applies a threshold-based keyframe extraction technique to select keyframes from the input video. A keyframe representation module, built on a deep residual network, then extracts essential features from these keyframes. Finally, the extracted feature vectors, together with reference captions, are fed into an attention allocation module that generates descriptive captions. The deep residual network allows the network depth to be increased without suffering from vanishing or exploding gradients. Moreover, the attention module adopts an encoder-decoder structure with additional attention layers, enabling effective attention allocation and yielding more accurate captions.
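The threshold-based keyframe extraction step can be illustrated with a minimal sketch. This is a hypothetical implementation under a common formulation (keep a frame when its mean absolute pixel difference from the last kept frame exceeds a threshold); the paper's exact criterion and threshold value are assumptions here, not taken from the source.

```python
import numpy as np

def extract_keyframes(frames, threshold=0.5):
    """Select keyframe indices from a video tensor of shape (T, H, W).

    A frame is kept when its mean absolute pixel difference from the
    previously kept frame exceeds `threshold`. This is an illustrative
    criterion, not necessarily the one used in the paper.
    """
    keyframes = [0]  # always keep the first frame
    last = frames[0].astype(np.float64)
    for i in range(1, len(frames)):
        cur = frames[i].astype(np.float64)
        if np.mean(np.abs(cur - last)) > threshold:
            keyframes.append(i)
            last = cur
    return keyframes

# toy video: 10 grayscale frames with one abrupt scene change at index 5
video = np.zeros((10, 4, 4))
video[5:] = 1.0
print(extract_keyframes(video, threshold=0.5))  # → [0, 5]
```

In the full pipeline, the frames at the returned indices would then be passed to the residual-network representation module for feature extraction.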