The end-to-end (E2E) approach to automatic speech recognition (ASR) is a simplified and elegant approach in which a single deep neural network model directly converts the acoustic feature sequence to the text sequence. The current approach to end-to-end ASR uses the neural network model (trained with a sequence loss) along with an external character/word-based language model (LM) in a decoding pass to output the text sequence. In this work, we propose a new objective function for end-to-end ASR training in which the LM score is explicitly introduced into the attention model loss function without any additional training parameters. In this manner, the neural network is made LM aware, which simplifies the model training process. We also propose to incorporate an attention-based sequence summary feature in the ASR model, which allows the system to be speaker aware. With several E2E ASR experiments on the TED-LIUM, WSJ and LibriSpeech datasets, we show that the proposed speaker and LM aware training improves ASR performance significantly over state-of-the-art E2E approaches. We achieve the best published results reported on the WSJ dataset.
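One plausible form of such an LM-aware objective can be sketched as follows (this is an illustrative assumption: the log-linear combination, the interpolation weight $\lambda$, and the conditioning structure are not specified in the abstract):

\[
\mathcal{L}(\theta) \;=\; -\sum_{t=1}^{T} \Big[ \log p_{\mathrm{att}}\!\left(y_t \mid y_{<t}, \mathbf{x}; \theta\right) \;+\; \lambda \, \log p_{\mathrm{LM}}\!\left(y_t \mid y_{<t}\right) \Big],
\]

where $p_{\mathrm{att}}$ is the attention-based ASR model with parameters $\theta$, $\mathbf{x}$ is the acoustic feature sequence, and $p_{\mathrm{LM}}$ is a fixed, pretrained language model. Because $p_{\mathrm{LM}}$ is frozen, the objective adds no trainable parameters, consistent with the abstract's claim.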