Controllable Prosody Generation With Partial Inputs

Resource Type: Working Paper
Authors: Iliescu, Dan Andrei; Mohan, Devang Savita Ram; Teh, Tian Huey; Hodari, Zack
Subject: Electrical Engineering and Systems Science - Audio and Speech Processing
Computer Science - Artificial Intelligence
Computer Science - Computation and Language
Computer Science - Machine Learning

Abstract

We address the problem of human-in-the-loop control for generating prosody in the context of text-to-speech synthesis. Controlling prosody is challenging because existing generative models lack an efficient interface through which users can modify the output quickly and precisely. To solve this, we introduce a novel framework whereby the user provides partial inputs and the generative model generates the missing features. We propose a model that is specifically designed to encode partial prosodic features and output complete audio. We show empirically that our model displays two essential qualities of a human-in-the-loop control mechanism: efficiency and robustness. With even a very small number of input values (~4), our model enables users to improve the quality of the output significantly in terms of listener preference (4:1).
Comment: 5 pages
