Canonical Voice Conversion and Dual-Channel Processing for Improved Voice Privacy of Speech Recognition Data

Resource Type: Conference
Authors: Sharma, Dushyant; Nespoli, Francesco; Gong, Rong; Naylor, Patrick A.
Source: 2023 31st European Signal Processing Conference (EUSIPCO), pp. 66-70, Sep. 2023
Subject: Signal Processing and Analysis; Training; Data privacy; Privacy; Europe; Signal processing; Data augmentation; Data models; Language
ISSN: 2076-1465

Online Access

Full Text (IEEE)

Abstract

This paper addresses the need for enhancing the privacy of test data in a deployed automatic speech recognition (ASR) system so that what was said cannot be linked to who said it, a process we describe as acoustic de-identification. Existing techniques can be used to modify voice characteristics to make the speaker identity unrecognizable, but normally at the expense of ASR performance. We present a novel approach for improving ASR performance on acoustically de-identified voice data. Our method exploits a dual-channel input to a self-attention channel combinator front-end to an end-to-end ASR system, and data augmentation, where some amount of original speech data is used in model training. The voice data is de-identified by a zero-shot voice style transfer system to the voice of a registered, canonical speaker. We show that the proposed approach achieves a significant improvement in privacy, as demonstrated by a 10x increase in the equal error rate (EER) of an automatic speaker verification system, while also improving ASR accuracy, as demonstrated by an 18.3% relative reduction in word error rate (WER) compared with a single-channel baseline model when tested on acoustically de-identified speech.
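
The two figures quoted in the abstract are a ratio of equal error rates and a relative reduction in word error rate. The sketch below shows, under stated assumptions, how such quantities are commonly computed. It is not the authors' evaluation code: the function names `relative_reduction` and `equal_error_rate` and every score and rate used here are hypothetical, chosen only for illustration.

```python
def relative_reduction(baseline: float, proposed: float) -> float:
    """Relative reduction of an error rate, as a fraction of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - proposed) / baseline


def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Returns the mean of the false-acceptance and false-rejection rates at
    the threshold where the two rates are closest to each other.
    """
    best_gap, eer = float("inf"), None
    for t in sorted(set(target_scores) | set(impostor_scores)):
        # False rejection: a genuine (target) trial scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: an impostor trial scored at or above the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


# Hypothetical speaker-verification scores (not taken from the paper).
targets = [0.9, 0.8, 0.7, 0.4]
impostors = [0.6, 0.3, 0.2, 0.1]
eer = equal_error_rate(targets, impostors)

# Hypothetical WERs in percent (not taken from the paper).
wer_drop = relative_reduction(baseline=20.0, proposed=15.0)
```

In this framing, a higher EER on de-identified speech means the verification system confuses speakers more often, which is the privacy gain, while a lower WER means better recognition accuracy.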
