Contrastive Self-Supervised Speaker Embedding With Sequential Disentanglement

Resource Type: Periodical
Authors: Tu, Y.; Mak, M.; Chien, J.
Source: IEEE/ACM Transactions on Audio, Speech, and Language Processing (IEEE/ACM Trans. Audio Speech Lang. Process.), vol. 32, pp. 2704-2715, 2024
Subject: Signal Processing and Analysis
Computing and Processing
Communication, Networking and Broadcast Technologies
General Topics for Engineers
Training
Speech processing
Linguistics
Labeling
Acoustics
Faces
Electronic mail
Speaker verification
speaker embedding
contrastive learning
disentangled representation learning
variational autoencoder
Language
ISSN: 2329-9290, 2329-9304

Abstract

Contrastive self-supervised learning has been widely used in speaker embedding to address the labeling challenge. Contrastive speaker embedding assumes that the contrast between the positive and negative pairs of speech segments is attributed to speaker identity only. However, this assumption is incorrect because speech signals contain not only speaker identity but also linguistic content. In this paper, we propose a contrastive learning framework with sequential disentanglement to remove linguistic content by incorporating a disentangled sequential variational autoencoder (DSVAE) into the conventional contrastive learning framework. The DSVAE aims to disentangle speaker factors from content factors in an embedding space so that the speaker factors become the main contributor to the contrastive loss. Because content factors have been removed from contrastive learning, the resulting speaker embeddings will be content-invariant. The learned embeddings are also robust to language mismatch. It is shown that the proposed method consistently outperforms the conventional contrastive speaker embedding on the VoxCeleb1 and CN-Celeb datasets. This finding suggests that applying sequential disentanglement is beneficial to learning speaker-discriminative embeddings.
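The abstract's core argument is that content differences between two segments of the same speaker leak into a contrastive loss unless they are factored out. A toy computation makes this concrete. This is a minimal sketch, not the authors' DSVAE or training code: it assumes the disentangled speaker and content factors are already given as hand-picked toy vectors, and it uses a standard InfoNCE loss with cosine similarity and a temperature of 0.1 as illustrative choices. The names `info_nce`, `mean_pool`, and the factor variables are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_pool(frames: List[Vector]) -> Vector:
    """Average frame-level vectors over time into one segment-level vector."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def info_nce(anchors: List[Vector], positives: List[Vector], temperature: float = 0.1) -> float:
    """Mean InfoNCE loss over a batch (illustrative, not the paper's exact loss).

    Row i treats positives[i] as its positive and every positives[j], j != i,
    as a negative, using temperature-scaled cosine similarity as the logit.
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]
    return total / n


# Hypothetical factors for 2 speakers; each speaker contributes one anchor
# segment and one positive segment taken from a different utterance.
# Speaker factors (one static vector per segment) agree within a speaker.
anchor_spk = [[1.0, 0.0], [0.0, 1.0]]
positive_spk = [[1.0, 0.0], [0.0, 1.0]]
# Content factors (one vector per frame) differ because the words differ.
anchor_content = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]]
positive_content = [[[0.0, 1.0], [0.0, 1.0]], [[1.0, 0.0], [1.0, 0.0]]]

# Entangled baseline: each embedding concatenates speaker and pooled content factors.
anchor_mixed = [s + mean_pool(c) for s, c in zip(anchor_spk, anchor_content)]
positive_mixed = [s + mean_pool(c) for s, c in zip(positive_spk, positive_content)]

loss_mixed = info_nce(anchor_mixed, positive_mixed)
loss_speaker_only = info_nce(anchor_spk, positive_spk)
print(f"entangled: {loss_mixed:.4f}  speaker-only: {loss_speaker_only:.6f}")
```

In this toy batch the entangled loss is log 2 (about 0.693): the content mismatch makes each true positive look no more similar than the other speaker's segment. The speaker-only loss is about 4.5e-5, because the loss now sees only the factor it is meant to discriminate. Real DSVAE factors are learned and only approximately disentangled, so the gap in practice would be far less clean.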