In today's highly interactive human-computer environment, speech synthesis is used in many scenarios, and the demands on the prosody of synthesized speech keep rising, so prosody-controllable models have become a research hotspot. Most current prosody-controllable models generate reference features with a separate neural network, but this approach requires training a more complex neural network model and needs reference audio to achieve explicit prosody control. This paper proposes a prosody-control solution based on an end-to-end acoustic model, addressing the inability of existing models to precisely control pitch at the word level. The proposed model includes a pitch-control module that obtains duration information from the MFA alignment tool and adjusts the pitch of each word using word-level pitch control values together with that duration information. The acoustic model is improved by introducing pitch control during the generation of acoustic features and, in combination with the decoder, generates more robust audio. In addition, to adjust the overall pitch of the audio, the pitch values of all frames are multiplied by a fixed coefficient. Furthermore, this paper also proposes a 48 kHz ultra-high-fidelity audio model, obtained by increasing the spectral parameter dimensions and the upsampling factor.
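The word-level and global pitch adjustments described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `apply_pitch_control` and the multiplicative form of the word-level control values are assumptions; it only shows how word-level scales could be expanded to frame level using the durations from alignment, then combined with a fixed global coefficient.

```python
import numpy as np

def apply_pitch_control(frame_pitch, word_durations, word_scales, global_scale=1.0):
    """Adjust frame-level pitch per word, then apply a global coefficient.

    frame_pitch:    per-frame pitch values (Hz); length must equal sum(word_durations)
    word_durations: number of frames per word (e.g. derived from MFA alignment)
    word_scales:    multiplicative pitch control value for each word (assumed form)
    global_scale:   fixed coefficient applied to all frames to shift overall pitch
    """
    frame_pitch = np.asarray(frame_pitch, dtype=float)
    if len(word_durations) != len(word_scales):
        raise ValueError("need one control value per word")
    if sum(word_durations) != len(frame_pitch):
        raise ValueError("durations must cover all frames")
    # Expand word-level control values to frame level using the durations.
    frame_scales = np.repeat(np.asarray(word_scales, dtype=float), word_durations)
    # Apply per-word control, then the fixed global coefficient.
    return frame_pitch * frame_scales * global_scale

# Example: raise the pitch of the first word by 50%, leave the second unchanged.
adjusted = apply_pitch_control(
    frame_pitch=[100.0, 100.0, 200.0, 200.0],
    word_durations=[2, 2],
    word_scales=[1.5, 1.0],
)
# → [150.0, 150.0, 200.0, 200.0]
```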