A Comparative Study on End-to-End Speech Synthetic Units for Amdo-Tibetan Dialect
- Resource Type
- Conference
- Authors
- Li, Qian; Dan, Zhengjia; Zhao, Yue; Huang, Xin; Zhang, Xubei; Yang, Li
- Source
- 2022 5th International Conference on Pattern Recognition and Artificial Intelligence (PRAI), pp. 678-682, Aug. 2022
- Subject
- Computing and Processing
Signal Processing and Analysis
Vocoders
Buildings
Training data
Phonetics
Acoustics
Data models
Pattern recognition
speech synthesis unit
Amdo-Tibetan dialect
end-to-end model
Latin transliteration
- Language
- Abstract
This paper focuses on speech synthesis modeling for the Amdo-Tibetan dialect. We analyze the linguistic and phonetic characteristics of Amdo-Tibetan and propose using Latin letters as the synthetic unit for building the end-to-end speech synthesis model. Comparative experiments are designed and carried out to evaluate three types of synthetic units: Latin letters, Tibetan initials and finals, and Tibetan syllables. The experimental results show that the model using Latin letters transcribed from Tibetan characters by Wylie transliteration performs better than the models using the other two synthetic units. This paper also compares different acoustic features and different vocoders for Amdo-Tibetan speech synthesis. The experimental results show that speech synthesized using the mel-spectrogram as the acoustic feature and WaveNet as the vocoder has better clarity and naturalness.
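To make the three unit granularities concrete, the following minimal sketch tokenizes a Wylie-transliterated phrase into syllables, Latin letters, and a rough initial/final split. It is an illustration only, not the authors' front end: the function names, the `_` syllable-boundary marker, and the rule of splitting each syllable at its first vowel letter are all assumptions made here. The paper's initials and finals are presumably phonetic units of the Amdo dialect, which a purely orthographic split only approximates.

```python
from typing import List

# Hypothetical illustration; none of these names come from the paper.
VOWELS = set("aeiou")  # vowel letters used in Wylie transliteration


def to_syllables(wylie: str) -> List[str]:
    """Syllable units: Wylie separates syllables with spaces."""
    return wylie.split()


def to_letters(wylie: str) -> List[str]:
    """Latin-letter units; '_' is an assumed syllable-boundary symbol."""
    return list("_".join(wylie.split()))


def to_initial_final(wylie: str) -> List[str]:
    """Rough orthographic split of each syllable at its first vowel letter.

    This is an assumption for illustration, not a phonetic analysis.
    """
    units = []
    for syl in wylie.split():
        # Index of the first vowel letter, or end of syllable if none.
        idx = next((i for i, ch in enumerate(syl) if ch in VOWELS), len(syl))
        if syl[:idx]:
            units.append(syl[:idx])
        if syl[idx:]:
            units.append(syl[idx:])
    return units


text = "bkra shis bde legs"  # Wylie for a common Tibetan greeting

print(to_syllables(text))
print(to_letters(text))
print(to_initial_final(text))
```

The letter-level scheme yields the smallest symbol inventory of the three, since every unit is drawn from the Wylie alphabet plus the boundary marker, while the syllable-level scheme yields the largest.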