Outside-knowledge Visual Question Answering (VQA) is a challenging and promising task with broad applications. It requires models to accumulate external knowledge, acquire cross-modal scene understanding, and develop reasoning capabilities. A good VQA system can serve as the "eyes" of visually impaired people, enabling them to perceive the world around them, for example by reading road signs, identifying directions, and recognizing objects. However, most existing VQA systems are formulated as classifiers over a fixed answer vocabulary, which limits their ability to handle answers never encountered in the training set. Generative VQA systems, in contrast, are naturally better suited to real-world scenarios, yet research on such methods is still in its early stages. Furthermore, real-world visual question answering often requires extensive outside knowledge, whereas existing methods typically rely on explicit knowledge retrieved from fixed knowledge bases, which often fails to provide sufficient coverage. To address these two issues, this paper proposes a novel VQA system built on encoder-decoder generative models that fuses implicit multimodal knowledge with implicit textual knowledge. An absolute improvement of at least 5.69% over a range of baselines on the OK-VQA dataset verifies the effectiveness of the proposed method.
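To make the generative formulation concrete, the sketch below shows one plausible way such a system could serialize the two implicit knowledge sources alongside the question and let an encoder-decoder model generate a free-form answer. This is a minimal illustration, not the authors' released code: the T5 backbone, the prompt format, and the example caption and knowledge strings are all assumptions, standing in for whatever multimodal and textual knowledge modules the paper actually uses.

```python
# Minimal sketch of generative, knowledge-fused VQA (illustrative only).
# Assumptions: a caption stands in for implicit multimodal knowledge and a
# short statement from a pretrained language model stands in for implicit
# textual knowledge; the paper's actual fusion mechanism may differ.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

def generate_answer(question: str, caption: str, textual_knowledge: str) -> str:
    # Serialize both knowledge sources with the question so the encoder
    # attends over them jointly; the decoder then generates an open-ended
    # answer rather than selecting from a fixed label set.
    prompt = (
        f"question: {question} "
        f"image context: {caption} "
        f"knowledge: {textual_knowledge}"
    )
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_new_tokens=10, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Hypothetical usage: both context strings would come from upstream models
# (e.g., a captioner and a large language model) in a full system.
print(generate_answer(
    question="What season do these trees indicate?",
    caption="bare trees line a snow-covered street",
    textual_knowledge="trees lose their leaves in winter",
))
```

Because the decoder produces tokens rather than a class index, such a system can emit answers that never appeared in the training label set, which is the property the abstract contrasts against classification-style VQA.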