Medical applications such as disease detection, treatment planning, and surgical navigation rely heavily on medical image segmentation. While CNNs excel at extracting local features, they struggle to capture global context; Transformer-based models show the opposite strengths, modeling global dependencies well but handling local detail poorly. TransUnet combines a CNN and a Transformer, but its serial connection limits how fully it can extract global and local features. TransFuse addresses this by designing parallel Transformer and CNN branches for global and local feature extraction, respectively. However, the self-attention of ViT, the backbone of its Transformer branch, has a computational complexity that grows quadratically with image size, so it cannot efficiently provide the multi-level features that complex segmentation tasks require. To overcome these limitations, we propose SR-Unet, a novel dual-branch medical image segmentation network. Its encoder pairs a Transformer branch with a CNN branch for complementary global and local feature extraction, using the Swin Transformer as the Transformer-branch backbone. Furthermore, we integrate CBAM (Convolutional Block Attention Module) into the decoder's fusion block to fully merge global and local features and suppress redundant and irrelevant information, thereby enhancing segmentation accuracy. Experimental results on the Synapse, CXML, and BUSI datasets demonstrate that SR-Unet outperforms TransFuse, improving DSC by 3.47%, 0.16%, and 0.35% and HD by 41.21%, 6.98%, and 11.33%, respectively.
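To make the complexity gap concrete: following the analysis in the Swin Transformer paper, for an $h \times w$ feature map with $C$ channels, global multi-head self-attention (as in ViT) and Swin's window-based attention with a fixed window size $M$ scale as

$$
\Omega(\mathrm{MSA}) = 4hwC^2 + 2(hw)^2C, \qquad
\Omega(\mathrm{W\text{-}MSA}) = 4hwC^2 + 2M^2hwC,
$$

so window attention is linear rather than quadratic in the number of tokens $hw$, which is what lets the Swin branch supply multi-scale features at manageable cost.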
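As an illustration of how such a fusion block can be organized, below is a minimal PyTorch sketch that concatenates same-resolution features from the two branches, projects them to a common width, and refines the result with CBAM's standard channel-then-spatial attention. The module and parameter names (CBAMFusion, trans_ch, cnn_ch, out_ch) are hypothetical and not taken from the SR-Unet implementation; only the CBAM attention structure follows the published module.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """CBAM channel attention: squeeze spatial dims with average- and
    max-pooling, pass both through a shared MLP, gate with a sigmoid."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        return torch.sigmoid(avg + mx)

class SpatialAttention(nn.Module):
    """CBAM spatial attention: pool over channels, then 7x7 conv + sigmoid."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = torch.mean(x, dim=1, keepdim=True)
        mx, _ = torch.max(x, dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CBAMFusion(nn.Module):
    """Hypothetical fusion block: concatenate Transformer- and CNN-branch
    features, project to a common width, then refine with CBAM."""
    def __init__(self, trans_ch, cnn_ch, out_ch):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Conv2d(trans_ch + cnn_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        self.ca = ChannelAttention(out_ch)
        self.sa = SpatialAttention()

    def forward(self, f_trans, f_cnn):
        x = self.proj(torch.cat([f_trans, f_cnn], dim=1))
        x = x * self.ca(x)   # suppress redundant channels
        x = x * self.sa(x)   # suppress irrelevant spatial positions
        return x

# Example: fuse same-resolution feature maps from the two branches.
f_trans = torch.randn(1, 96, 56, 56)   # Swin Transformer branch features
f_cnn = torch.randn(1, 64, 56, 56)     # CNN branch features
fused = CBAMFusion(96, 64, 128)(f_trans, f_cnn)
print(fused.shape)  # torch.Size([1, 128, 56, 56])
```

The channel gate reweights feature channels before the spatial gate reweights positions, which is how CBAM's two attention maps can filter out the redundant and irrelevant responses left over after concatenating the two branches.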