Improving ASR Performance with OCR Through Using Word Frequency Difference
- Resource Type
- Conference
- Authors
- Jung, Kyudan; Bae, Seungmin; Kim, Nam Joon; Ryu, Hyun Gon; Lee, Hyuk-Jae
- Source
- 2024 International Conference on Electronics, Information, and Communication (ICEIC), pp. 1-4, Jan. 2024
- Subject
- Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Fields, Waves and Electromagnetics
Photonics and Electrooptics
Terminology
Text recognition
Error analysis
Optical character recognition
Frequency conversion
Real-time systems
Grammar
Automatic Speech Recognition
Optical Character Recognition
English Word Frequency
Lecture Specific Terminology
- ISSN
- 2767-7699
Interest in conversational artificial intelligence (AI) has grown recently, and automatic speech recognition (ASR) is being actively researched to facilitate interaction between humans and machines. This paper proposes a system that enhances ASR performance. The proposed method accumulates, in real time, images captured from lecture videos every 30 seconds. The frequency ratios between the text data from the captured images and text data calculated offline from over 333K are then used to improve ASR performance. In experiments, the word error rate (WER) decreased by up to 0.68% compared with using the traditional ASR alone. In particular, the recognition rate for specialized terms frequently used in lectures improved by 64%.
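The frequency-ratio idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `floor` value for unseen words, the `threshold`, and the toy `general_freq` table are all assumptions made for illustration. The sketch compares how often a word appears in OCR text from lecture slides with how often it appears in a general English frequency table, and flags words whose ratio is unusually high as likely lecture-specific terms.

```python
import re
from collections import Counter


def relative_frequencies(text):
    """Return each word's share of all words in `text` (lowercased)."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def frequency_ratios(ocr_text, general_freq, floor=1e-7):
    """Ratio of OCR-text frequency to general-English frequency per word.

    `floor` (an assumed value) stands in for words missing from the
    general table so that the division is always defined.
    """
    ocr_freq = relative_frequencies(ocr_text)
    return {
        w: f / max(general_freq.get(w, 0.0), floor)
        for w, f in ocr_freq.items()
    }


def lecture_terms(ocr_text, general_freq, threshold=100.0):
    """Words whose ratio meets `threshold` (assumed), highest ratio first."""
    ratios = frequency_ratios(ocr_text, general_freq)
    flagged = [w for w, r in ratios.items() if r >= threshold]
    return sorted(flagged, key=lambda w: -ratios[w])


# Toy general-English frequencies (hypothetical numbers, not real data).
general_freq = {"the": 0.05, "of": 0.03, "is": 0.01, "gradient": 1e-5}
slide_text = "The gradient of the loss is the gradient"

print(lecture_terms(slide_text, general_freq))  # ['loss', 'gradient']
```

In a full system, the flagged terms could be used to bias or rescore ASR hypotheses toward words that appear on the slides. How the paper actually applies the ratios is not stated in the abstract, so that step is left out here.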