Remote sensing image captioning has been widely applied in areas such as traffic management and geographic research. Although neural network approaches have steadily improved the performance of remote sensing image captioning systems, the task still faces object identification challenges because objects are small, unevenly distributed, and highly coupled with the surrounding image background. In this paper, we propose a novel encoder-decoder model for remote sensing image captioning, Hierarchical rearrangement-Multi-Layer Perceptron (HMLP), whose encoder adopts a hierarchical rearrangement-multi-layer perceptron to improve object recognition capability. Extensive experiments were conducted to evaluate HMLP on three datasets: RSICD, UCM-caption, and NWPU-caption. Results show that HMLP outperforms many image captioning systems on the evaluation metrics BLEU4, METEOR, ROUGE-L, and CIDEr.