RGB-T tracking uses thermal infrared images to complement visible-light images, enabling more robust visual tracking across diverse scenarios. However, highly aligned RGB-T image pairs contain redundant information, and fluctuations in modality quality during tracking introduce unreliable information. Existing RGB-T trackers typically fuse multi-modal features channel-wise, so low-quality features degrade the fused representation and cause the tracker to drift. In this work, we propose a region selective fusion network that first evaluates each image region through cross-modal and cross-region modeling and then removes low-quality, redundant region features, alleviating the negative effects of unreliable information on multi-modal fusion. In addition, because redundant features are removed progressively, the region removal scheme improves efficiency and allows the tracker to run at high speed. Extensive experiments show that the proposed tracker achieves competitive performance at real-time speed on multiple RGB-T tracking benchmarks, including LasHeR, RGBT234, and GTOT.
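The select-then-remove idea, and why progressive removal saves computation, can be illustrated with a minimal sketch. It is not the network described above. The scoring rule is an assumption made only for illustration: each region's RGB and TIR features are compared with a template feature by cosine similarity, and the two modality scores are averaged. The real method instead learns region scores through cross-modal and cross-region modeling. The functions `region_scores`, `select_regions`, and `progressive_counts` are hypothetical helpers.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; returns 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)


def region_scores(template: Sequence[float],
                  rgb_regions: Sequence[Sequence[float]],
                  tir_regions: Sequence[Sequence[float]]) -> List[float]:
    # Hypothetical scoring: average each region's template similarity over
    # the RGB and TIR modalities (a stand-in for learned cross-modal modeling).
    return [(cosine(template, r) + cosine(template, t)) / 2
            for r, t in zip(rgb_regions, tir_regions)]


def select_regions(scores: Sequence[float], keep_ratio: float) -> List[int]:
    """Keep the indices of the top-scoring regions and drop the rest."""
    k = max(1, int(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])  # restore original spatial order


def progressive_counts(n_regions: int, keep_ratio: float, n_stages: int) -> List[int]:
    """Number of regions still processed after each removal stage."""
    counts = [n_regions]
    for _ in range(n_stages):
        counts.append(max(1, int(counts[-1] * keep_ratio)))
    return counts


if __name__ == "__main__":
    template = [1.0, 0.0]
    rgb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
    tir = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
    print(select_regions(region_scores(template, rgb, tir), 0.5))  # [0, 2]
    print(progressive_counts(256, 0.5, 3))  # [256, 128, 64, 32]
```

Later stages run on a shrinking set of regions, for example 256, 128, 64, then 32 with a hypothetical keep ratio of 0.5. Per-region work therefore falls at each stage, which is the intuition behind the speed gain claimed above.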