The input to a Visual Question Answering (VQA) system is an image and a textual question about the content of that image. The system must understand and process the image in light of the question and retrieve the answer from the image. Since its emergence, VQA has spanned two domains, Natural Language Processing (NLP) and Computer Vision (CV), and is a typical multimodal learning task. When answering questions, people routinely draw on a variety of high-level semantic information, for example about color, type, quantity, or purpose. Although this information is critical for answering VQA questions, it is not directly available from the input data. Therefore, in this paper we feed the high-level semantic information of question intention into the model as external knowledge, where it influences the multimodal information interaction and guides the selection of the most appropriate features. The visual question answering method based on question intention designed in this paper was evaluated on the open VQAv2 dataset, and its accuracy exceeds that of the baseline model.
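To make the idea of intention-conditioned feature selection concrete, the following is a minimal sketch, not the paper's actual architecture: a hypothetical module in which an external question-intention embedding gates the fused image-question features before answer classification. All dimensions, layer choices, and names (e.g. `IntentionGatedFusion`, the 3,129-way answer head commonly used for VQAv2) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class IntentionGatedFusion(nn.Module):
    """Hypothetical sketch: fuse image and question features, then use an
    intention embedding (external knowledge about the question's intent,
    e.g. color / type / quantity / purpose) to gate which fused features
    are passed to the answer classifier."""

    def __init__(self, img_dim=2048, q_dim=768, intent_dim=64,
                 hidden=512, num_answers=3129):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, hidden)
        self.q_proj = nn.Linear(q_dim, hidden)
        # Gate derived from the intention embedding: values in (0, 1)
        # that scale each fused feature, i.e. intention-driven selection.
        self.gate = nn.Sequential(nn.Linear(intent_dim, hidden), nn.Sigmoid())
        self.classifier = nn.Linear(hidden, num_answers)

    def forward(self, img_feat, q_feat, intent_emb):
        # Simple multiplicative fusion of the two modalities.
        fused = torch.tanh(self.img_proj(img_feat)) * torch.tanh(self.q_proj(q_feat))
        # Intention gate re-weights the fused representation.
        gated = fused * self.gate(intent_emb)
        return self.classifier(gated)

# Usage with random tensors standing in for real features.
model = IntentionGatedFusion()
img = torch.randn(4, 2048)       # e.g. pooled CNN/region image features
qst = torch.randn(4, 768)        # e.g. a BERT-style question embedding
intent = torch.randn(4, 64)      # external question-intention embedding
logits = model(img, qst, intent) # (4, 3129) answer scores
```

The design choice illustrated here is that the intention signal does not add new content to the fused representation; it only modulates which multimodal features survive, which is one plausible way to realize the "influence the interaction and select features" behavior described above.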