Remote sensing image scene classification, which aims to identify the land-cover type of a scene, is a fundamental task in remote sensing image analysis. Remote sensing images contain a variety of land-cover objects, which form complex and diverse scenes through spatial combination and correlation. They also contain redundant information that negatively affects classification. Together, these factors make remote sensing image scene classification challenging. Recently, many deep-learning-based methods have achieved remarkable performance through end-to-end supervised training. Existing advances mainly focus on training multi-layer convolutional neural networks (CNNs). However, these CNNs do not explicitly distinguish between key and redundant information in the image, which limits their ability to extract features. How to focus on key information while ignoring redundant information is therefore a valuable problem in remote sensing image scene classification. Inspired by the attention mechanism, we propose a CNN-based network that combines residual units with an attention mechanism. The network automatically assigns large weights to key areas of the image and can thus adaptively ignore redundant information. We compare the proposed approach with several state-of-the-art methods on the UC Merced Land-Use dataset and the NWPU-RESISC45 dataset. Experimental results show that the proposed attention model achieves the best classification performance among the compared methods.
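To illustrate how an attention mask can be combined with a residual connection, the following is a minimal sketch and not the network proposed here. It makes several assumptions: the mask is computed from the channel-wise mean activation passed through a sigmoid (a real network would learn the mask with trainable layers), the combination follows the common residual attention form `(1 + mask) * features`, and the function names `spatial_attention_mask` and `residual_attention` are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def spatial_attention_mask(features):
    """Return an H x W mask with values in (0, 1).

    `features` is a list of C channels, each an H x W list of lists.
    ASSUMPTION: the score at each position is the mean activation across
    channels; a trained network would use learned layers instead.
    """
    channels = len(features)
    height = len(features[0])
    width = len(features[0][0])
    mask = []
    for i in range(height):
        row = []
        for j in range(width):
            mean_activation = sum(features[c][i][j] for c in range(channels)) / channels
            row.append(sigmoid(mean_activation))
        mask.append(row)
    return mask


def residual_attention(features):
    """Reweight features as (1 + mask) * features.

    The identity term keeps the original features, so the mask can only
    emphasize positions and never zeroes out the signal entirely.
    """
    mask = spatial_attention_mask(features)
    return [
        [[(1.0 + mask[i][j]) * value for j, value in enumerate(row)]
         for i, row in enumerate(channel)]
        for channel in features
    ]
```

Under these assumptions, positions with strong activations receive a mask value close to 1 and are roughly doubled, while positions with weak or negative activations receive a mask value close to 0 and pass through almost unchanged.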