In recent years, methods that combine global and local features have shown great potential in image retrieval, and the most recent and impressive work focuses on fusing the two kinds of features. Building on this line of work, we propose a new feature fusion module, named Position Transformer, which takes advantage of the self-attention mechanism and uses the relation matrix of its final layer as relation scores between features, thereby performing an effective and efficient feature split. This precise split benefits the subsequent feature fusion, which addresses the information loss caused by coarsely joining features from multiple spatial dimensions. Moreover, we introduce model fusion and a contrastive loss to help train a robust model; ablation experiments support these design choices.