Multi-task learning of structured output layer bidirectional LSTMs for speech synthesis
- Resource Type
- Conference
- Authors
- Li, Runnan; Wu, Zhiyong; Liu, Xunying; Meng, Helen; Cai, Lianhong
- Source
- 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5510-5514, Mar. 2017
- Subject
- Signal Processing and Analysis
Speech
Hidden Markov models
Acoustics
Predictive models
Recurrent neural networks
Speech synthesis
Trajectory
text-to-speech
acoustic model
multi-task learning
structured output layer
deep bidirectional long short-term memory
- ISSN
- 2379-190X
Recurrent neural networks (RNNs) and their bidirectional long short-term memory (BLSTM) variants are powerful sequence modelling approaches. Their inherently strong ability to capture long-range temporal dependencies allows BLSTM-RNN speech synthesis systems to produce higher-quality and smoother speech trajectories than conventional deep neural networks (DNNs). In this paper, we improve the conventional BLSTM-RNN based approach by introducing a multi-task learned structured output layer in which spectral parameter targets are conditioned on the predicted pitch parameters. Both objective and subjective experimental results demonstrate the effectiveness of the proposed technique.
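The core idea of the structured output layer can be sketched as follows: one head predicts pitch (F0) parameters from the BLSTM hidden states, and the spectral head then takes the pitch prediction as an extra input, so the two tasks are trained jointly with a combined loss. The sketch below is a minimal illustration, not the paper's implementation: the BLSTM outputs are replaced by random frame-level features, all dimensions and weight names (`W_f0`, `W_sp`, `W_c`) are hypothetical, and the loss is an unweighted sum of per-stream MSEs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: frames, BLSTM hidden size (forward+backward concatenated),
# pitch target dim, spectral target dim -- all illustrative choices.
T, H = 10, 8
D_F0, D_SP = 1, 5

h = rng.standard_normal((T, H))          # stand-in for per-frame BLSTM outputs

# Pitch head: predicted directly from the hidden states.
W_f0 = rng.standard_normal((H, D_F0)) * 0.1
y_f0 = h @ W_f0                          # shape (T, D_F0)

# Structured output layer: the spectral head is conditioned on the pitch
# prediction by feeding y_f0 in alongside the hidden states.
W_sp = rng.standard_normal((H, D_SP)) * 0.1
W_c = rng.standard_normal((D_F0, D_SP)) * 0.1
y_sp = h @ W_sp + y_f0 @ W_c             # shape (T, D_SP)

# Multi-task loss: sum of per-stream MSEs against reference targets,
# so gradients from both tasks update the shared BLSTM parameters.
t_f0 = rng.standard_normal((T, D_F0))
t_sp = rng.standard_normal((T, D_SP))
loss = np.mean((y_f0 - t_f0) ** 2) + np.mean((y_sp - t_sp) ** 2)
```

In a trained system the conditioning term `y_f0 @ W_c` lets the spectral prediction exploit the pitch trajectory, which is the dependency the structured output layer is designed to capture.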