Semantic segmentation of urban scenes is widely recognized as a fundamental task of great significance to urban planning and management. When semantic segmentation models are applied to large-scale or unseen scenarios, a so-called "domain shift" generally exists between the source domain, on which the model is trained, and the target domain, on which it is evaluated. To mitigate this problem, domain adaptation leverages unlabeled data from the target domain to align the model with, or fine-tune it on, the target-domain statistics. Previous self-training-based domain adaptation approaches use only output-level knowledge from the source model (e.g., pseudo labels) to guide the adaptation process. However, we observe that the "pseudo features" generated by the source model may have greater potential for capturing the relationships among target-domain samples. To this end, we exploit feature-level relations among neighboring pixels in the target domain to structure the predictions of the domain adaptation model. Experiments on public ISPRS datasets verify that the proposed method outperforms previous unsupervised domain adaptation methods and demonstrate the benefits of mining local feature relations.