With the development of artificial intelligence and multi-source information fusion theory, emotion recognition has gradually shifted from single-modality to multimodal approaches. Because one sensory channel for emotional cognition is absent, hearing-impaired subjects may exhibit emotional cognitive deviations in daily communication. We therefore classified four emotions (happiness, fear, calmness, sadness) of hearing-impaired subjects using a feature-level fusion network that combines electroencephalography (EEG) signals and facial expressions. In this network, we adopted different feature extraction methods to obtain shallow features related to emotional changes in each modality, and then applied a multi-head cross-attention mechanism in the feature-level fusion layer between the two modalities. The results show that the average classification accuracy over the four emotions reaches 92.09% after multimodal fusion, which is 15.40% and 11.49% higher than that obtained with EEG signals or facial expressions alone, respectively.
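The feature-level fusion step can be illustrated with a minimal sketch of multi-head cross-attention, where one modality supplies the queries and the other supplies the keys and values. This is not the authors' implementation: the token counts, feature dimension, head count, and random projection weights below are illustrative assumptions standing in for learned parameters.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_cross_attention(query_feats, kv_feats, num_heads, rng):
    """Fuse two modalities: queries come from one modality,
    keys/values from the other, split across several heads."""
    d = query_feats.shape[-1]
    assert d % num_heads == 0
    dh = d // num_heads
    # Random projections stand in for the learned weight matrices.
    Wq, Wk, Wv, Wo = (rng.standard_normal((d, d)) / np.sqrt(d)
                      for _ in range(4))
    Q, K, V = query_feats @ Wq, kv_feats @ Wk, kv_feats @ Wv

    def split_heads(x):
        # (tokens, d) -> (heads, tokens, dh)
        return x.reshape(x.shape[0], num_heads, dh).transpose(1, 0, 2)

    Qh, Kh, Vh = split_heads(Q), split_heads(K), split_heads(V)
    scores = Qh @ Kh.transpose(0, 2, 1) / np.sqrt(dh)   # scaled dot product
    attn = softmax(scores, axis=-1)                     # per-head weights
    out = (attn @ Vh).transpose(1, 0, 2).reshape(-1, d) # merge heads
    return out @ Wo                                     # output projection

rng = np.random.default_rng(0)
eeg = rng.standard_normal((32, 64))   # e.g. 32 shallow EEG feature tokens, dim 64
face = rng.standard_normal((49, 64))  # e.g. 49 facial feature tokens, dim 64
fused = multi_head_cross_attention(eeg, face, num_heads=8, rng=rng)
print(fused.shape)
```

The fused output keeps the query modality's token count and feature dimension, so it can be passed directly to a downstream classification head; in practice a second cross-attention pass with the roles of the two modalities swapped is often applied symmetrically.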