Sequence-to-sequence (seq2seq) learning has greatly improved text-to-speech (TTS) synthesis performance, but deploying it effectively on resource-constrained devices remains challenging because seq2seq models are typically computationally expensive and memory-intensive. To achieve fast inference and a small model size while maintaining high-quality speech, we propose FCL-taco2, a Fast, Controllable and Lightweight (FCL) TTS model based on Tacotron2. FCL-taco2 adopts a novel semi-autoregressive (SAR) mode for phoneme-level parallel mel-spectrogram generation conditioned on prosody features, leading to faster inference and higher prosody controllability than Tacotron2. In addition, knowledge distillation (KD) is leveraged to compress a relatively large FCL-taco2 model into a compact version with only minor loss of speech quality. Experimental results on English (EN) and Chinese (CN) datasets show that the compact version of FCL-taco2 achieves speech quality comparable to Tacotron2, while having a 4.8× smaller footprint and average inference speedups of 17.7× and 18.5× for the EN and CN experiments, respectively. Moreover, execution on mobile devices shows that the proposed model achieves faster-than-real-time speech synthesis. Our code and audio samples are publicly released.
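To make the SAR decoding scheme mentioned above concrete, the following PyTorch sketch generates mel frames autoregressively *within* each phoneme while decoding all phonemes of an utterance in parallel along the batch axis, so the sequential loop length is bounded by the longest phoneme rather than the whole utterance. This is a minimal illustration under stated assumptions, not the released implementation; all names here (`SARPhonemeDecoder`, `phoneme_ctx`, `durations`) are hypothetical.

```python
import torch
import torch.nn as nn

class SARPhonemeDecoder(nn.Module):
    """Minimal sketch of semi-autoregressive, phoneme-parallel mel decoding."""

    def __init__(self, enc_dim=256, mel_dim=80, hidden=512):
        super().__init__()
        # One recurrent cell shared across phonemes; each phoneme keeps its
        # own hidden state, so phonemes stack on the batch dimension.
        self.cell = nn.GRUCell(enc_dim + mel_dim, hidden)
        self.proj = nn.Linear(hidden, mel_dim)

    def forward(self, phoneme_ctx, durations):
        # phoneme_ctx: (num_phonemes, enc_dim) encoder outputs assumed to be
        #   fused with prosody features (e.g., duration/pitch/energy).
        # durations: (num_phonemes,) number of mel frames per phoneme.
        n, device = phoneme_ctx.size(0), phoneme_ctx.device
        h = torch.zeros(n, self.cell.hidden_size, device=device)
        frame = torch.zeros(n, self.proj.out_features, device=device)  # go-frame
        outputs = []
        # Autoregressive over time steps, parallel over phonemes.
        for _ in range(int(durations.max())):
            h = self.cell(torch.cat([phoneme_ctx, frame], dim=-1), h)
            frame = self.proj(h)
            outputs.append(frame)
        mels = torch.stack(outputs, dim=1)  # (num_phonemes, T_max, mel_dim)
        # Keep the first durations[i] frames of each phoneme, then concatenate
        # phonemes in order to form the utterance-level mel-spectrogram.
        return torch.cat([mels[i, :int(d)] for i, d in enumerate(durations)], dim=0)

# Usage: 5 phonemes with per-phoneme frame counts summing to 20 frames.
decoder = SARPhonemeDecoder()
ctx = torch.randn(5, 256)
dur = torch.tensor([3, 5, 2, 4, 6])
mel = decoder(ctx, dur)  # shape: (20, 80)
```

Because the prosody features (here assumed folded into `phoneme_ctx`) condition each phoneme's decoding independently, they can be edited per phoneme at inference time, which is the source of the controllability claimed above.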