Cross-Modal Learning for CTC-Based ASR: Leveraging CTC-Bertscore and Sequence-Level Training
- Resource Type
- Conference
- Authors
- Lee, Mun-Hak; Lee, Sang-Eon; Choi, Ji-Eun; Chang, Joon-Hyuk
- Source
- 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp. 1-8, Dec. 2023
- Subject
- Signal Processing and Analysis
- Training
- Conferences
- Machine learning
- Brain modeling
- Linear programming
- Data models
- Biological neural networks
- Speech recognition
- Connectionist temporal classification
- BERT
- Cross-modal learning
- Language
Because neural networks readily overfit the training set, neural-network-based speech recognition models are vulnerable to shifts in the prior data distribution and to unseen words. Studies have therefore sought to overcome this problem by using language models trained on relatively easy-to-obtain unpaired corpora. In this paper, we present a new training method that uses BERT to improve the performance of a connectionist temporal classification (CTC)-based ASR model. The proposed method follows a cross-modal learning scenario and induces the CTC model to better embed contextual information through an auxiliary objective function that operates at the sequence level. We applied the proposed method to fine-tune a pre-trained wav2vec 2.0 model with CTC loss and confirmed that it improves the generalization performance of the ASR model.
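To make the abstract's idea concrete, the following is a minimal sketch of a BERTScore-style sequence-level score and its use as an auxiliary term alongside CTC loss. It assumes contextual token embeddings (e.g. from BERT) are already available as arrays; the function names `bertscore_f1` and `joint_loss` and the weight `lam` are illustrative, not taken from the paper, and the paper's actual CTC-BERTScore formulation may differ.

```python
import numpy as np

def bertscore_f1(cand: np.ndarray, ref: np.ndarray) -> float:
    """BERTScore-style F1 between two token-embedding sequences.

    cand: (m, d) contextual embeddings of the hypothesis tokens
    ref:  (n, d) contextual embeddings of the reference tokens
    """
    # Normalize rows so that dot products are cosine similarities.
    c = cand / np.linalg.norm(cand, axis=1, keepdims=True)
    r = ref / np.linalg.norm(ref, axis=1, keepdims=True)
    sim = c @ r.T  # (m, n) pairwise cosine similarity matrix

    # Greedy matching: each candidate token takes its best reference
    # match (precision) and vice versa (recall), then combine as F1.
    precision = sim.max(axis=1).mean()
    recall = sim.max(axis=0).mean()
    return float(2 * precision * recall / (precision + recall))

def joint_loss(ctc_loss: float, cand_emb: np.ndarray,
               ref_emb: np.ndarray, lam: float = 0.5) -> float:
    """Hypothetical combined objective: CTC loss plus a sequence-level
    penalty that grows as the BERTScore-style similarity drops."""
    return ctc_loss + lam * (1.0 - bertscore_f1(cand_emb, ref_emb))
```

In a real training loop this auxiliary term would be computed on differentiable embeddings of the CTC model's hypothesis so that its gradient encourages contextually plausible transcriptions; the NumPy version here only illustrates the scoring arithmetic.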