A Novel end-to-end Speech Emotion Recognition Network with Stacked Transformer Layers

Resource Type: Conference
Authors: Wang, Xianfeng; Wang, Min; Qi, Wenbo; Su, Wanqi; Wang, Xiangqian; Zhou, Huan
Source: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6289-6293, Jun 2021
Subject: Bioengineering
Communication, Networking and Broadcast Technologies
Computing and Processing
Signal Processing and Analysis
Emotion recognition
Art
Conferences
Pipelines
Speech recognition
Signal processing
Feature extraction
Speech emotion recognition
end-to-end
stacked transformer layers
Language
ISSN: 2379-190X

Online Access

Full Text (IEEE)

Abstract

Speech emotion recognition (SER) aims to automatically recognize the emotional category of a given speech utterance. The performance of an SER system relies heavily on the effectiveness of the global representation expressed at the utterance level. To extract such a global feature effectively, most recent SER architectures adopt a pipeline with two key modules: feature extraction and aggregation. Although various module designs have brought impressive progress, SER is still a challenging task. In contrast with those previous works, we propose a novel strategy for global SER feature extraction that applies an additional enhancement module on top of the current SER pipeline. To verify its effect, an end-to-end SER architecture is proposed in which multiple stacked transformer layers are explored to enhance the aggregated global feature. This architecture is evaluated on IEMOCAP, and the results strongly substantiate the effectiveness of our proposal. In terms of weighted accuracy on four emotion categories, our proposed SER system outperforms prior art by a large margin, a relative improvement of 20%. Our code and the pre-trained SER models are made publicly available.
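
The abstract reports its result as weighted accuracy and as a relative improvement, without defining either. The sketch below illustrates how these quantities are commonly computed in SER work, under the assumption that weighted accuracy means the overall fraction of correctly classified utterances and that unweighted accuracy means the mean of per-class recalls. The function names, the emotion labels, and all numbers are hypothetical and are not taken from the paper.

```python
from collections import Counter


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correctly classified utterances / all utterances."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def relative_improvement(new, baseline):
    """Improvement expressed as a fraction of the baseline score."""
    return (new - baseline) / baseline


# Hypothetical labels for eight utterances (not data from the paper).
y_true = ["angry", "angry", "happy", "neutral",
          "neutral", "neutral", "sad", "sad"]
y_pred = ["angry", "happy", "happy", "neutral",
          "neutral", "sad", "sad", "sad"]

wa = weighted_accuracy(y_true, y_pred)    # 6 of 8 utterances correct
ua = unweighted_accuracy(y_true, y_pred)  # mean of 1/2, 1/1, 2/3, 2/2

# Hypothetical scores: a baseline at 0.60 and a new system at 0.72.
gain = relative_improvement(0.72, 0.60)

print(f"WA={wa:.3f} UA={ua:.3f} relative gain={gain:.1%}")
```

With these made-up scores, an absolute gain of 12 points over a baseline of 0.60 corresponds to a relative improvement of about 20%, which shows why a relative figure should be read together with the baseline it refers to.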
