Semantic segmentation is an active research topic in image processing, and adding depth images to the RGB input improves segmentation performance. However, most existing methods do not account for the differences between RGB and depth features, which limits segmentation accuracy. To make full use of both RGB and depth features, this paper proposes an asymmetric two-branch convolutional neural network. In the depth feature extraction branch, a feature enhancement module is proposed to reduce noise. In the RGB feature extraction branch, a skip connection structure is introduced to extract richer RGB features. In addition, an attention-based fusion module is proposed to make full use of the effective information from the two modalities. Finally, extensive experiments show that the proposed model can efficiently perform semantic segmentation of indoor scenes.
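
The abstract does not specify how the attention-based fusion module is built, so the following is only a minimal sketch of one common pattern, channel-wise gating, and should not be read as the proposed module. It assumes both branches produce feature maps of the same shape, stored as nested lists indexed `[channel][row][column]`. All function names (`channel_gate`, `attention_fuse`, and so on) are hypothetical, and the gate here is a fixed sigmoid of each channel's mean, whereas a trained network would normally learn the gating weights.

```python
import math


def global_avg_pool(feat):
    """Average each channel's H x W map down to a single scalar."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_gate(feat):
    """Hypothetical gate: one weight in (0, 1) per channel, from its pooled mean.

    Assumption: a real module would pass the pooled vector through learned
    layers before the sigmoid; that step is omitted here.
    """
    return [sigmoid(m) for m in global_avg_pool(feat)]


def scale_channels(feat, weights):
    """Multiply every value in a channel by that channel's weight."""
    return [[[w * v for v in row] for row in ch] for ch, w in zip(feat, weights)]


def attention_fuse(rgb_feat, depth_feat):
    """Reweight each modality per channel, then add the results element-wise."""
    rgb_scaled = scale_channels(rgb_feat, channel_gate(rgb_feat))
    depth_scaled = scale_channels(depth_feat, channel_gate(depth_feat))
    return [
        [[a + b for a, b in zip(r_row, d_row)] for r_row, d_row in zip(r_ch, d_ch)]
        for r_ch, d_ch in zip(rgb_scaled, depth_scaled)
    ]


# Toy features: 2 channels of size 2 x 2 (values are made up for illustration).
rgb = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
depth = [[[0.0, 0.0], [0.0, 0.0]], [[2.0, 2.0], [2.0, 2.0]]]
fused = attention_fuse(rgb, depth)
```

In this toy input each fused channel is dominated by whichever modality carries signal in that channel, which is the intuition behind gating the two branches separately before combining them.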