Medical image translation across modalities plays a crucial role in clinical diagnosis, but it is challenging because the modalities differ in their imaging characteristics. In this paper, we present an “Improved Residual Vision Transformer (ResViT)” for CT-to-MRI image translation. Our contributions are threefold. First, we introduce a novel CNN-based encoder-decoder architecture within ResViT, designed specifically to extract lymph-node-level features and preserve them during translation. Second, we propose a “Lymph Node Aware Loss” function that uses attention-based multi-scale feature extraction to transfer lymph-node-specific features from real MRIs to synthetic MRIs. Third, we review the literature and discuss in depth which loss terms are best suited to the ResViT generator, with a specific focus on lymph node preservation. Through extensive experiments on a dataset of abdominal contrast-enhanced CT and T2-weighted axial MRI scans from 47 patients, we validate the effectiveness of the proposed model. Compared with the existing ResViT model, the modified ResViT improves the mean PSNR, SSIM, and MSE in both the pre-training and fine-tuning stages. The resulting synthetic MRIs can support data augmentation, registration of MRI to CT, and lymph node annotation, underscoring their biomedical significance.
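The abstract reports image-similarity metrics between real and synthetic MRIs. As a minimal sketch of how two of them are commonly defined (not the authors' evaluation code), the snippet below computes MSE and PSNR, assuming both images are same-shaped arrays with intensities normalized to [0, 1]; the `data_range` default of 1.0 is that assumption, and the paper's actual normalization is not stated here. A lower MSE and a higher PSNR both indicate a closer match.

```python
import numpy as np


def mse(real: np.ndarray, synth: np.ndarray) -> float:
    """Mean squared error between two same-shaped images (lower is better)."""
    real = np.asarray(real, dtype=np.float64)
    synth = np.asarray(synth, dtype=np.float64)
    if real.shape != synth.shape:
        raise ValueError("images must have the same shape")
    return float(np.mean((real - synth) ** 2))


def psnr(real: np.ndarray, synth: np.ndarray, data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB (higher is better).

    Assumes intensities lie in [0, data_range]; data_range=1.0 is an
    illustrative assumption, not a value taken from the paper.
    """
    err = mse(real, synth)
    if err == 0.0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / err))


if __name__ == "__main__":
    real = np.zeros((4, 4))
    synth = np.full((4, 4), 0.1)  # every pixel off by 0.1
    print(mse(real, synth), psnr(real, synth))
```

For a uniform error of 0.1 on a [0, 1] scale, MSE is 0.01 and PSNR is 20 dB. SSIM is usually taken from an existing implementation, such as `structural_similarity` in `skimage.metrics`, rather than written by hand.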