Academic Papers

Acoustics-Text Dual-Modal Joint Representation Learning for Cover Song Identification

Resource Type: Conference
Authors: Gu, Yanmei; Li, Jing; Zhou, Jiayi; Wang, Zhiming; Zhu, Huijia
Source: 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp. 1-8, Dec. 2023
Subject: Signal Processing and Analysis; Representation learning; Training; Measurement; Conferences; Self-supervised learning; Data models; Task analysis; Cover Song Identification; dual-encoder architecture; multi-modal representation learning; joint training

Online Access: Full Text (IEEE)

Abstract

Cover Song Identification (CSI) is an important and challenging task in Music Information Retrieval (MIR). This paper investigates the multi-modal audio and text features of music and proposes two improvements to CSI model performance. First, our approach uses a dual-encoder architecture that learns joint embeddings of a piece's audio and its corresponding song title. Second, we propose a multi-modal representation learning strategy that jointly optimizes classification and metric learning losses in the audio modality and a contrastive learning loss in the audio-text modality. Experimental results show that our method efficiently learns more robust multi-modal representations of cover songs than a single audio encoder and achieves state-of-the-art results on CSI tasks.
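The abstract names three objectives that are optimized together: classification and metric learning on audio embeddings, plus audio-text contrastive learning. The NumPy sketch below only illustrates how such a joint loss can be assembled; it is not the authors' implementation. The abstract does not say which metric-learning or contrastive formulation is used, so batch-hard triplet loss, a CLIP-style symmetric InfoNCE loss, the margin and temperature values, and the weights `w_metric`/`w_contrast` are all assumptions, and every function name is hypothetical. The encoders themselves are out of scope; the functions take precomputed audio and title embeddings.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits):
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def classification_loss(audio_emb, weights, labels):
    """Softmax cross-entropy of a linear classifier over song (cover-group) ids."""
    logp = log_softmax(audio_emb @ weights)
    return -logp[np.arange(len(labels)), labels].mean()


def batch_hard_triplet_loss(audio_emb, labels, margin=0.3):
    """Assumed metric loss: hardest positive vs. hardest negative per anchor."""
    z = l2_normalize(audio_emb)
    # Euclidean distance between unit vectors: ||a - b|| = sqrt(2 - 2 a.b)
    dist = np.sqrt(np.maximum(2.0 - 2.0 * z @ z.T, 0.0))
    same = labels[:, None] == labels[None, :]
    eye = np.eye(len(labels), dtype=bool)
    pos = np.where(same & ~eye, dist, -np.inf).max(axis=1)  # farthest same-song clip
    neg = np.where(~same, dist, np.inf).min(axis=1)         # closest other-song clip
    valid = np.isfinite(pos) & np.isfinite(neg)  # anchors with both a positive and a negative
    if not valid.any():
        return 0.0
    return np.maximum(pos[valid] - neg[valid] + margin, 0.0).mean()


def audio_text_contrastive_loss(audio_emb, text_emb, temperature=0.07):
    """Assumed symmetric InfoNCE: the i-th audio clip should match the i-th title."""
    a, t = l2_normalize(audio_emb), l2_normalize(text_emb)
    logits = a @ t.T / temperature
    idx = np.arange(len(a))
    loss_a2t = -log_softmax(logits)[idx, idx].mean()    # audio -> text direction
    loss_t2a = -log_softmax(logits.T)[idx, idx].mean()  # text -> audio direction
    return 0.5 * (loss_a2t + loss_t2a)


def joint_loss(audio_emb, text_emb, cls_weights, labels, w_metric=1.0, w_contrast=1.0):
    """Hypothetical weighted sum of the three objectives named in the abstract."""
    return (classification_loss(audio_emb, cls_weights, labels)
            + w_metric * batch_hard_triplet_loss(audio_emb, labels)
            + w_contrast * audio_text_contrastive_loss(audio_emb, text_emb))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = np.array([0, 0, 1, 1, 2, 2])  # two versions of each of three songs
    audio = rng.normal(size=(6, 16))       # stand-in for audio encoder outputs
    text = rng.normal(size=(6, 16))        # stand-in for title encoder outputs
    cls_w = rng.normal(size=(16, 3))
    print(joint_loss(audio, text, cls_w, labels))
```

One design caveat: in this sketch only the diagonal audio-title pairs count as positives, so if a batch holds several versions of the same song sharing an identical title, those off-diagonal pairs are treated as negatives. A fuller implementation might mask such pairs; the abstract does not say how the paper handles this.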