Deep learning networks have grown increasingly popular for image denoising over the past decade. Their strong performance typically stems from learning the mapping from noisy images to clean ones through extensive training on image datasets. However, mismatches in noise type and intensity between test and training images significantly degrade their performance. This weak generalization makes it necessary in practical applications to train separate denoising models for different noise types and noise levels. To address this challenge, we introduce an effective masked Transformer network that incorporates a random mask module. Specifically, the random mask module is integrated into the Transformer to randomly discard certain features, thereby strengthening the network's generalization capability. The random mask module is also applied at the input processing stage to reduce reliance on the complete original image. In addition, sampling operations are added to the network: a downsampling layer reduces the image to half its original size, significantly improving execution efficiency. Experimental results show that the proposed masked Transformer network surpasses state-of-the-art methods such as SwinIR and Restormer in both denoising effectiveness and execution efficiency on synthetic and real-world noisy images.
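The two core operations described above, randomly discarding features and halving the spatial resolution, can be illustrated with a minimal sketch. This is not the authors' implementation; the function names (`random_mask`, `downsample_half`) and the choice of zeroing masked elements and 2x2 average pooling are illustrative assumptions, shown here in plain NumPy.

```python
import numpy as np

def random_mask(x, mask_ratio=0.5, seed=None):
    # Hypothetical sketch of a random mask module: zero out roughly
    # `mask_ratio` of the elements so the network cannot rely on any
    # single feature (or, at the input stage, on the whole image).
    rng = np.random.default_rng(seed)
    keep = rng.random(x.shape) >= mask_ratio  # True where values are kept
    return x * keep

def downsample_half(img):
    # Halve the spatial size of a 2-D array via 2x2 average pooling,
    # an assumed stand-in for the paper's downsampling layer.
    h, w = img.shape
    img = img[: h // 2 * 2, : w // 2 * 2]  # crop to even dimensions
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

# Example: mask an 8x8 feature map, then downsample it to 4x4.
features = np.ones((8, 8))
masked = random_mask(features, mask_ratio=0.5, seed=0)
small = downsample_half(masked)
print(masked.shape, small.shape)  # (8, 8) (4, 4)
```

Applying the same masking to the network input (not only to intermediate features) is what reduces dependence on the full original image, per the description above.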