Aircraft recognition aims to identify an aircraft's type from its external appearance and is a vital task in the military field. Advances in photographic equipment let technicians conveniently collect information-rich aircraft images across a variety of scales and resolutions. However, these images are taken under different lighting conditions and from different viewing angles, so the same aircraft can vary widely in apparent shape, radiometric properties, and color. Such variance poses challenges for automated aircraft recognition techniques. This paper proposes an accurate and robust automated aircraft recognition technique based on Vision Transformers (ViT) that is resilient to the variation present in visual images. In particular, the self-attention mechanism in ViT models long-range dependencies between pixels better than existing convolutional neural network (CNN) approaches. We evaluate the effectiveness of our approach on the publicly available FGVCAircraft benchmark across categories at multiple levels of granularity. The proposed ViT model achieves an overall Precision@1 of 0.915, outperforming the other baselines, especially on images with complex variations.
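To illustrate the long-range dependency argument, the following minimal sketch computes single-head scaled dot-product self-attention over a toy sequence of patch embeddings. It is an illustration under stated assumptions, not the model evaluated in this paper: the function names `softmax` and `self_attention` are hypothetical, the learned query, key, and value projections of a real ViT are replaced by the identity, and the four 2-dimensional "patches" are made-up values.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: query, key, and value projections are the identity,
    so each token serves as its own query, key, and value.
    """
    d = len(tokens[0])
    scale = math.sqrt(d)
    weights = []
    for q in tokens:
        # Every query is compared with every key, regardless of position.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in tokens]
        weights.append(softmax(scores))
    # Each output is a weighted average of all value vectors.
    outputs = [
        [sum(w * v[j] for w, v in zip(row, tokens)) for j in range(d)]
        for row in weights
    ]
    return outputs, weights


# Toy patch embeddings (made-up): the first and last patches look alike.
tokens = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(tokens)

# Attention paid by the first patch to each patch in the sequence.
print([round(w, 3) for w in weights[0]])
```

In this toy input the first patch attends to the distant last patch as strongly as to itself, and more strongly than to its immediate neighbour. A single attention layer therefore relates patches at any distance, whereas a convolution relates only pixels inside its kernel window and needs stacked layers to widen its receptive field.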