Semi-supervised Multimodal Emotion Recognition with Improved Wasserstein GANs
- Resource Type
- Conference
- Authors
- Liang, Jingjun; Chen, Shizhe; Jin, Qin
- Source
- 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 695-703, Nov. 2019
- Subject
- Communication, Networking and Broadcast Technologies
Signal Processing and Analysis
Emotion recognition
Acoustics
Semisupervised learning
Generators
Visualization
Generative adversarial networks
- ISSN
- 2640-0103
Automatic emotion recognition has long faced the challenge of lacking large-scale human-labeled datasets for model learning, due to expensive annotation costs and inevitable label ambiguity. To tackle this challenge, previous works have either transferred emotion labels from one modality to another, assuming supervised annotation exists in the source modality, or applied semi-supervised learning strategies to exploit large amounts of unlabeled data while focusing on a single modality. In this work, we address multimodal emotion recognition over the acoustic and visual modalities and propose a multimodal network structure for semi-supervised learning based on an improved generative adversarial network, CT-GAN. Extensive experiments on a multimodal emotion recognition corpus demonstrate the effectiveness of the proposed approach and show that both utilizing unlabeled data via GANs and combining multiple modalities benefit classification performance. We also present detailed analyses, such as the influence of unlabeled data quantity on recognition performance and the impact of different normalization strategies on semi-supervised learning.
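The CT-GAN mentioned in the abstract refers to a Wasserstein GAN augmented with a consistency term (CT) that penalizes the critic when two dropout-perturbed evaluations of the same real sample disagree too much. The sketch below is only an illustration of that consistency term on a toy critic, not the paper's implementation; the function names (`critic`, `consistency_term`), the toy weights, and the margin parameter `m_prime` are all hypothetical choices made for this example.

```python
import random

def critic(xs, drop_p=0.5, rng=None):
    """Toy one-layer critic with dropout, so two passes over the
    same batch yield slightly different outputs (the x' and x''
    perturbations used by the consistency term)."""
    rng = rng or random
    out = []
    for x in xs:
        s = sum(0.1 * v for v in x)                # fixed toy weights (illustrative)
        keep = 1.0 if rng.random() > drop_p else 0.0
        out.append(s * keep / (1.0 - drop_p))      # inverted dropout scaling
    return out

def consistency_term(xs, m_prime=0.0, rng=None):
    """Consistency-term penalty: average hinge on the gap between
    two dropout-perturbed critic evaluations of the same real batch."""
    d1 = critic(xs, rng=rng)                       # first perturbed pass (x')
    d2 = critic(xs, rng=rng)                       # second perturbed pass (x'')
    gaps = [abs(a - b) for a, b in zip(d1, d2)]
    return sum(max(0.0, g - m_prime) for g in gaps) / len(gaps)

rng = random.Random(0)
batch = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(8)]
ct = consistency_term(batch, rng=rng)              # non-negative penalty added to the critic loss
```

In training, this penalty would be added to the WGAN critic loss alongside the gradient penalty; a large gap between the twin passes indicates the critic is unstable around real data, and the margin `m_prime` tolerates small fluctuations.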