Speech emotion recognition (SER) has attracted much attention in recent years, especially in multilingual settings, owing to its potential for understanding human psychology and advancing human-computer interaction. However, recent work on SER has mainly focused on developing elaborate architectures to improve performance on monolingual datasets; little attention has been paid to improving transfer performance across multilingual datasets. In this paper, we propose a multilingual SER framework that uses a pre-trained model as the upstream component to learn high-level speech representations and a hierarchical grained and feature model (HGFM) as the downstream classifier. The framework extracts speech representations with a cross-lingual speech representations (XLSR) model and feeds them to the HGFM for classification. We validate the framework on a multilingual collection of datasets comprising IEMOCAP (English), EmoDB (German), TESS (English), SAVEE (English), EMA (English), and EMOVO (Italian). Experimental results show that features extracted by the upstream model achieve an average weighted accuracy (WA) of 70.6% and unweighted accuracy (UA) of 73.4% on the downstream task, outperforming both hand-crafted features and other upstream structures. We also compare our results with state-of-the-art and alternative methods to validate our framework, and we evaluate the performance of the structure in terms of F1-score.
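The upstream/downstream pipeline described above can be sketched as follows. This is a minimal, illustrative sketch only: the random-projection "upstream" stands in for the pre-trained XLSR model, and the pooling-plus-linear "downstream" stands in for the HGFM classifier, so the sketch stays self-contained; all function names and dimensions here are assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Upstream (stand-in for XLSR): map raw audio to frame-level
# representations. In the actual framework this would be a pre-trained
# cross-lingual wav2vec 2.0 (XLSR) model; here a fixed random projection
# of 25 ms frames (hop 10 ms at 16 kHz) keeps the sketch dependency-free.
def upstream_features(waveform, frame_len=400, hop=160, dim=512):
    frames = np.stack([waveform[i:i + frame_len]
                       for i in range(0, len(waveform) - frame_len + 1, hop)])
    proj = rng.standard_normal((frame_len, dim)) * 0.01
    return frames @ proj                      # shape: (num_frames, dim)

# --- Downstream (stand-in for HGFM): pool frame-level features into a
# single utterance-level vector, then apply a linear classifier. The real
# HGFM uses recurrent frame- and utterance-level stages; mean+std pooling
# preserves the same coarse-grained/fine-grained idea in a few lines.
def classify(features, n_classes=4):
    utterance = np.concatenate([features.mean(axis=0), features.std(axis=0)])
    w = rng.standard_normal((utterance.shape[0], n_classes)) * 0.01
    return int(np.argmax(utterance @ w))      # predicted emotion index

audio = rng.standard_normal(16000)            # 1 s of fake audio at 16 kHz
feats = upstream_features(audio)              # frame-level representations
label = classify(feats)                       # utterance-level emotion label
```

In the real framework the upstream weights come from self-supervised pre-training on multilingual speech, which is what allows a single feature extractor to transfer across the English, German, and Italian corpora listed above.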