Face super-resolution reconstruction is a critical task in many application domains. To fully exploit facial pose and texture information, we propose a novel face super-resolution algorithm that incorporates multi-scale spatial transformation and attention mechanisms. First, multi-scale spatial transformation is employed to learn the facial pose and apply affine transformations, enhancing the local information of the face image. In addition, spatial attention is used to extract features across the image space, capturing global characteristics. Next, a residual learning approach and a multi-path connected up-sampling module progressively increase the image resolution, reducing feature loss and training instability during up-sampling. Finally, a multi-scale convolution reconstructs the high-resolution face image. Reconstruction quality is measured with the peak signal-to-noise ratio (PSNR) and the structural similarity index measure (SSIM) as objective evaluation metrics. Experiments on the CelebA dataset at 4× and 8× magnification demonstrate that our model reconstructs edge and texture information more faithfully and achieves significant gains in PSNR and SSIM.
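As a point of reference for the two evaluation metrics named above, the following is a minimal sketch of PSNR and a global-statistics SSIM in NumPy. The SSIM constants follow the common convention (K1 = 0.01, K2 = 0.03) and the statistics are computed over the whole image rather than a sliding window; the paper's exact metric settings are not stated here, so these are illustrative defaults, not the authors' implementation.

```python
import numpy as np


def psnr(ref: np.ndarray, test: np.ndarray, data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between a reference and a test image."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images: no distortion
    return 10.0 * np.log10(data_range ** 2 / mse)


def ssim(ref: np.ndarray, test: np.ndarray, data_range: float = 1.0) -> float:
    """Simplified SSIM from global image statistics (no sliding window)."""
    x = ref.astype(np.float64)
    y = test.astype(np.float64)
    c1 = (0.01 * data_range) ** 2  # stabilizing constants, common defaults
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )
```

Higher PSNR and an SSIM closer to 1 both indicate a reconstruction closer to the ground-truth high-resolution face; for example, a mildly degraded image scores higher on both metrics than a heavily degraded one.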