This paper proposes a Voice Conversion (VC) method from the Tibetan Amdo dialect to the Tibetan U-tsang dialect based on Generative Adversarial Networks (GANs). An inherent problem with the traditional VC framework is that the acoustic feature vectors output by the conversion model are over-smoothed, which degrades the quality of the converted speech. This is because, in the training phase of the acoustic model, a specific probability model is used to fit the data distribution, so the model treats a relatively averaged parameter output as optimal. Such over-smoothing of acoustic parameters arises whenever the analytical form of the model distribution is designed by hand. To overcome this problem, the VC framework proposed in this paper uses GANs as the modeling network of the acoustic model: a generator learns the data distribution directly, while a discriminator guides the training of the generator so that the distribution of generated samples approaches the distribution of the target speaker's data, thereby alleviating the over-smoothing of the converted speech spectrum. Experimental results show that the proposed method outperforms VC based on Deep Neural Networks (DNNs) in both the quality and the speaker similarity of the converted speech.
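The adversarial scheme described above can be illustrated with a minimal sketch. This is a toy example, not the paper's actual system: acoustic features are stood in for by 1-D values, the generator is a hypothetical affine map, the discriminator is a logistic regression, and the target mapping (2x + 1 plus noise) is invented for illustration. The point is only to show the alternating updates in which the discriminator separates real target features from generated ones, and the generator is trained (with the non-saturating loss) to fool it rather than to minimize a hand-designed distance.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample_batch(n=64):
    """Toy stand-in for parallel acoustic features: source x, target y."""
    x = rng.normal(0.0, 1.0, n)
    y = 2.0 * x + 1.0 + 0.1 * rng.normal(0.0, 1.0, n)  # invented target mapping
    return x, y

# Generator G(x) = a*x + b and discriminator D(y) = sigmoid(w*y + c),
# both deliberately tiny so the gradients can be written by hand.
a, b = 0.0, 0.0
w, c = 0.1, 0.0
lr = 0.05

for step in range(2000):
    x, y_real = sample_batch()
    y_fake = a * x + b

    # --- discriminator update: minimize BCE with real=1, fake=0 ---
    z_r, z_f = w * y_real + c, w * y_fake + c
    gz_r = sigmoid(z_r) - 1.0   # gradient of -log sigmoid(z_r) w.r.t. z_r
    gz_f = sigmoid(z_f)         # gradient of -log(1 - sigmoid(z_f)) w.r.t. z_f
    w -= lr * (np.mean(gz_r * y_real) + np.mean(gz_f * y_fake))
    c -= lr * (np.mean(gz_r) + np.mean(gz_f))

    # --- generator update: non-saturating loss -log D(G(x)) ---
    z_f = w * (a * x + b) + c
    gy = w * (sigmoid(z_f) - 1.0)   # back-propagated through the discriminator
    a -= lr * np.mean(gy * x)
    b -= lr * np.mean(gy)

print(a, b)   # generator parameters after adversarial training
```

Note that no explicit regression loss appears in the generator update: the generator is shaped only by the discriminator's feedback, which is what lets a GAN-based acoustic model avoid the averaged, over-smoothed outputs that a fixed probability model produces.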