Semantic segmentation of land use in remote sensing imagery has benefited from the development of deep learning, making considerable progress in inference accuracy and speed. However, effectively training semantic segmentation models for remote sensing imagery requires extensive pixel-level annotations, and gathering such data is both time-intensive and laborious. Thus, this study applied low-rank adaptation (LoRA) to a Stable Diffusion model to learn the distribution of pixel-level annotations in the LoveDA dataset. The annotation-image pairs were then used to train a remote sensing image generator based on Stable Diffusion guided by ControlNet. Together, these form a Stable-Diffusion-based approach that can generate image-annotation pairs from scratch. A segmentation model trained solely on the generated pairs achieved 0.520 mean intersection-over-union (mIoU) on the LoveDA dataset, close to the 0.539 mIoU obtained by training on the original data. Furthermore, mixed training on generated and original data achieved 0.542 mIoU, demonstrating the data augmentation value of our approach. This study provides a solution to the high cost of pixel-level annotation and thereby exhibits the potential of artificial-intelligence-generated content (AIGC) for remote sensing.
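For concreteness, the two-stage pipeline described above can be sketched with the Hugging Face diffusers library: a LoRA-adapted Stable Diffusion model first samples a synthetic annotation mask, which then conditions a ControlNet-guided pipeline to synthesize the matching image. This is a minimal sketch under assumed public checkpoints; the model names, LoRA weight paths, and prompts below are illustrative placeholders, not the exact models trained in this study.

```python
# Minimal sketch of the two-stage generation pipeline with Hugging Face
# diffusers. Checkpoint names, LoRA paths, and prompts are hypothetical
# placeholders, not the models trained in this study.
import torch
from diffusers import (
    StableDiffusionPipeline,
    StableDiffusionControlNetPipeline,
    ControlNetModel,
)

device = "cuda" if torch.cuda.is_available() else "cpu"

# Stage 1: a Stable Diffusion model fine-tuned with LoRA on LoveDA
# annotation masks samples a synthetic pixel-level annotation.
mask_pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5"
).to(device)
mask_pipe.load_lora_weights("path/to/loveda-mask-lora")  # hypothetical weights
mask = mask_pipe(
    "LoveDA land-use annotation mask", num_inference_steps=30
).images[0]

# Stage 2: a ControlNet conditioned on the generated mask guides a second
# Stable Diffusion model to synthesize the matching remote sensing image.
controlnet = ControlNetModel.from_pretrained(
    "path/to/loveda-controlnet"  # hypothetical checkpoint
)
image_pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet
).to(device)
image = image_pipe(
    "high-resolution remote sensing image, aerial view",
    image=mask,
    num_inference_steps=30,
).images[0]

# The (mask, image) pair can now be added to the segmentation training set.
mask.save("generated_mask.png")
image.save("generated_image.png")
```

In this sketch, Stage 1 produces the annotation unconditionally (its distribution having been learned via LoRA fine-tuning), while Stage 2 treats that annotation as the ControlNet conditioning input, so every generated image arrives with a pixel-level label by construction.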