Zero-shot singing voice synthesis (SVS), the task of synthesizing the singing voice of an arbitrary target singer, has gained increasing attention in the past few years. Several recently proposed systems have demonstrated promising results on this task. However, these systems require detailed frame-level musical features as the musical content. To address this issue, we propose a model that performs zero-shot SVS with only the musical score as the musical content condition. To facilitate model training, we build an acoustic encoder that extracts linguistic features from audio and train it with a lyrics transcription objective. The output of the acoustic encoder serves as an alternative to the musical score, allowing the SVS model to learn from weakly labeled data. Results show that the proposed method outperforms a baseline semi-supervised method in both subjective and objective tests.