Fast, high-precision image mosaicking is a key technology in UAV aerial photogrammetry and remote sensing. However, because video image sequences are large and highly redundant, video stitching is slow and its accuracy is low. To reduce this redundancy, this paper proposes a two-stage key frame extraction method for aerial video that fuses the number of expected matching point pairs with UAV navigation information. The algorithm first performs a coarse extraction of key frames from the dense sequence using a high overlap rate, and then extracts the fine key frames through an optimization strategy that dynamically adjusts the number of inter-frame matching point pairs and the overlap rate. Tests on the open-source aerial photography dataset created by Bu et al. show that the method eliminates more than 92.3% of redundant frames on average and processes 1080p aerial video sequences at 31.4 frames per second on a general-purpose computer. Compared with key frame extraction based only on navigation information, the method not only overcomes the problem of insufficient matching point pairs in the overlapping region but also outputs matching point pair information that can be used directly in the subsequent image registration step, further improving stitching speed.
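As an illustration only, the two-stage selection described above might be sketched as follows. The sketch assumes a straight, one-dimensional flight line with equal-size ground footprints, so that the overlap rate can be estimated from navigation positions alone. The function names, the threshold values (0.9, 0.5, 100), and the `count_matches` callback are hypothetical and are not taken from the paper, and the greedy refinement is a simple stand-in for the optimization strategy, which the abstract does not specify.

```python
from typing import Callable, List, Sequence


def overlap_rate(pos_a: float, pos_b: float, footprint: float) -> float:
    """Overlap of two equal-size footprints on a 1-D flight line (assumption)."""
    return max(0.0, 1.0 - abs(pos_b - pos_a) / footprint)


def coarse_key_frames(positions: Sequence[float], footprint: float,
                      high_overlap: float = 0.9) -> List[int]:
    """Stage 1: keep a frame once its overlap with the last kept frame
    has dropped to the (high) overlap threshold or below."""
    keys = [0]
    for i in range(1, len(positions)):
        if overlap_rate(positions[keys[-1]], positions[i], footprint) <= high_overlap:
            keys.append(i)
    return keys


def fine_key_frames(coarse: Sequence[int], positions: Sequence[float],
                    footprint: float,
                    count_matches: Callable[[int, int], int],
                    min_overlap: float = 0.5,
                    min_matches: int = 100) -> List[int]:
    """Stage 2: from each key frame, jump to the farthest coarse frame that
    still has enough overlap AND enough matching point pairs.
    Falls back to the next coarse frame if no candidate qualifies."""
    keys = [coarse[0]]
    k = 0
    while k < len(coarse) - 1:
        best = k + 1  # fallback: always advance by at least one coarse frame
        for j in range(k + 1, len(coarse)):
            ov = overlap_rate(positions[coarse[k]], positions[coarse[j]], footprint)
            if ov < min_overlap:
                break  # overlap only shrinks along a straight flight line
            if count_matches(coarse[k], coarse[j]) >= min_matches:
                best = j
        keys.append(coarse[best])
        k = best
    return keys
```

In practice `count_matches` would wrap a feature detector and matcher; here it is left as a callback so the selection logic can be exercised without image data. Passing a callback that always returns a large count reduces the second stage to a navigation-only criterion, which makes it easy to see how the match-count constraint changes the selected frames.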