Compared with traditional multi-stage speech recognition systems, speech recognition models based on deep neural networks offer promising accuracy and speed and can convert speech to text end to end. However, such models are difficult to train in practice, which limits their achievable performance. The method proposed in this paper combines the advantages of both approaches by dividing the recognition task into two stages, each handled by a deep neural network: the speech sequence is first converted into a phoneme sequence, and the phoneme sequence is then converted into a character sequence. By training each stage with its own loss function and dataset, the recognition process can be controlled more finely to achieve higher recognition accuracy. On the widely used open-source AiShell-1 Mandarin speech dataset, the convolution-based acoustic model of the first stage achieves a phoneme error rate of 1.90%, and the language model of the second stage, based on a Bi-LSTM with self-attention, achieves a character accuracy of 99.4%. Finally, the complete model, with only 15M parameters, reaches a speech recognition character error rate (CER) as low as 4.15%, achieving state-of-the-art accuracy.
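
The two-stage pipeline described above can be sketched as the composition of two independently trained models. The sketch below is purely illustrative: the lookup tables stand in for the paper's convolutional acoustic model and Bi-LSTM/self-attention language model, and all frame and phoneme names are invented placeholders, not the authors' implementation.

```python
def acoustic_model(speech_frames):
    """Stage 1 (stand-in): map speech frames to a phoneme sequence.
    In the paper this is a convolutional network trained on AiShell-1
    with its own loss; here a toy lookup table takes its place."""
    frame_to_phoneme = {"f0": "n", "f1": "i3", "f2": "h", "f3": "ao3"}
    return [frame_to_phoneme[f] for f in speech_frames]


def language_model(phonemes):
    """Stage 2 (stand-in): map the phoneme sequence to characters.
    The paper uses a Bi-LSTM with self-attention; here a toy
    syllable-to-character dictionary takes its place."""
    syllable_to_char = {("n", "i3"): "你", ("h", "ao3"): "好"}
    chars = []
    for i in range(0, len(phonemes), 2):  # pair initial + final per syllable
        chars.append(syllable_to_char[tuple(phonemes[i:i + 2])])
    return "".join(chars)


def recognize(speech_frames):
    # End-to-end recognition = stage 1 followed by stage 2.
    return language_model(acoustic_model(speech_frames))


print(recognize(["f0", "f1", "f2", "f3"]))  # → 你好
```

The design point the sketch captures is that the two stages communicate only through the phoneme sequence, so each can be trained, evaluated, and replaced independently.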