With around 1.5 billion people worldwide living with hearing impairment, enabling communication between hearing people and people with hearing or speech impairment is essential to building a barrier-free society. Multi-modal learning offers a promising artificial-intelligence channel for this purpose. In this article, we present an end-to-end Chinese lip-reading recognition system based on multi-modal fusion that performs Chinese lip translation to facilitate communication with hearing-impaired individuals. Our system adopts the End-to-end Audio-visual feature fusion Lip-reading Recognition Architecture (EALRA): the front-end extracts features with a CNN backbone based on a tuned MobileNet0.25, and the encoder back-end models the fused features with a Conformer, a convolution-augmented self-attention encoder. We conducted the empirical study on the largest Chinese Mandarin Lip-Reading dataset (CMLR), using character error rate (CER) as the performance metric for Chinese lip recognition. Our experiments show that EALRA achieves a CER of 8.0, on average 23.74% lower than the CERs of other lip-recognition models, indicating that EALRA fuses image and audio features more effectively.
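To make the audio-visual fusion pipeline concrete, the following is a minimal sketch of an EALRA-style model, not the authors' implementation: the front-ends, feature dimensions, vocabulary size, and the assumption that video and audio frames are pre-aligned are all illustrative, and torchaudio's generic Conformer stands in for the paper's encoder back-end.

```python
import torch
import torch.nn as nn
from torchaudio.models import Conformer


class AVLipReader(nn.Module):
    """Sketch: visual + audio front-ends, feature fusion, Conformer encoder."""

    def __init__(self, vocab_size: int = 4000, dim: int = 256):
        super().__init__()
        # Visual front-end: a small 3D-conv stem over grayscale lip crops
        # (a stand-in for the tuned MobileNet0.25 backbone in the paper).
        self.visual_frontend = nn.Sequential(
            nn.Conv3d(1, 64, kernel_size=(5, 7, 7), stride=(1, 2, 2), padding=(2, 3, 3)),
            nn.BatchNorm3d(64),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),  # pool spatial dims, keep time
        )
        self.visual_proj = nn.Linear(64, dim)
        # Audio front-end: project 80-dim log-mel filterbank frames.
        self.audio_proj = nn.Linear(80, dim)
        # Fusion: concatenate per-frame features and project back to `dim`.
        self.fusion = nn.Linear(2 * dim, dim)
        # Encoder back-end: Conformer (self-attention + convolution blocks).
        self.encoder = Conformer(
            input_dim=dim, num_heads=4, ffn_dim=1024,
            num_layers=6, depthwise_conv_kernel_size=31,
        )
        # Per-frame character logits, e.g. for a CTC objective.
        self.classifier = nn.Linear(dim, vocab_size)

    def forward(self, video, audio, lengths):
        # video: (B, 1, T, H, W) lip crops; audio: (B, T, 80) filterbanks,
        # assumed aligned to the same frame rate T.
        v = self.visual_frontend(video)                # (B, 64, T, 1, 1)
        v = v.squeeze(-1).squeeze(-1).transpose(1, 2)  # (B, T, 64)
        v = self.visual_proj(v)                        # (B, T, dim)
        a = self.audio_proj(audio)                     # (B, T, dim)
        fused = self.fusion(torch.cat([v, a], dim=-1))
        out, out_lengths = self.encoder(fused, lengths)
        return self.classifier(out), out_lengths


model = AVLipReader()
video = torch.randn(2, 1, 75, 88, 88)  # 2 clips, 75 frames of 88x88 lip crops
audio = torch.randn(2, 75, 80)         # matching 80-dim filterbank frames
logits, out_lengths = model(video, audio, torch.tensor([75, 75]))
print(logits.shape)                    # torch.Size([2, 75, 4000])
```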
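For reference, CER is the standard character-level edit-distance metric (this definition is general, not specific to the paper):

$$\mathrm{CER} = \frac{S + D + I}{N}$$

where $S$, $D$, and $I$ are the numbers of character substitutions, deletions, and insertions relative to the reference transcript, and $N$ is the number of characters in the reference.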