As one of the fundamental tasks in computer vision, semantic segmentation is widely used in many fields, e.g., medical image parsing, scene parsing, and autonomous driving. Current mainstream approaches require downsampling or patch cropping to keep GPU memory within bounds when processing high-resolution input images. However, the corresponding cost is a loss of detail in the final segmentation map. In this work, we propose RFNet, a refinement network that resolves the lack of detailed information in coarse predictions by fusing them with fine predictions obtained from fine input image patches. RFNet has three key characteristics: (i) a spatial information extraction module that efficiently processes coarse and fine feature maps at the spatial level; (ii) an auxiliary-fusion information branch computed from the prediction maps, which contributes to refining the predictions; (iii) a boundary auxiliary loss function used during training, which makes the model pay more attention to pixels on object boundaries. We demonstrate the superiority of the proposed RFNet on the Cityscapes dataset: the experimental results show that RFNet outperforms other state-of-the-art approaches with low computational cost. The code will be available at: https://github.com/zhu-gl-ux/RFNet.
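One common way to realize a boundary auxiliary loss of the kind mentioned above is to upweight the per-pixel cross-entropy for pixels lying on label transitions. The NumPy sketch below illustrates this idea only; the helper names (`boundary_mask`, `boundary_aux_loss`), the 4-neighbour boundary definition, and the weight value are our own assumptions, not the paper's implementation.

```python
import numpy as np

def boundary_mask(labels):
    """Mark pixels whose class label differs from any 4-neighbour.

    labels: (H, W) integer class-id map.
    Returns a boolean (H, W) mask of boundary pixels.
    """
    m = np.zeros_like(labels, dtype=bool)
    m[:-1, :] |= labels[:-1, :] != labels[1:, :]   # compare with pixel below
    m[1:, :]  |= labels[1:, :] != labels[:-1, :]   # compare with pixel above
    m[:, :-1] |= labels[:, :-1] != labels[:, 1:]   # compare with pixel right
    m[:, 1:]  |= labels[:, 1:] != labels[:, :-1]   # compare with pixel left
    return m

def boundary_aux_loss(probs, labels, w_boundary=2.0):
    """Pixel-wise cross-entropy with boundary pixels upweighted.

    probs:  (H, W, C) softmax probabilities.
    labels: (H, W) integer class ids.
    w_boundary: extra weight applied on boundary pixels (assumed value).
    """
    h, w = labels.shape
    # Probability assigned to the true class at each pixel.
    p_true = probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]
    ce = -np.log(np.clip(p_true, 1e-8, None))
    weights = np.where(boundary_mask(labels), w_boundary, 1.0)
    return float((weights * ce).mean())
```

In practice such a term would be added to the main segmentation loss with a small coefficient, so the boundary weighting biases training without dominating it.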