A Multi-Layer Attention Network for Visual Commonsense Reasoning

Resource Type: Conference
Authors: Zhang, Wenqi; Gao, Yongchao; Qian, Heng; Lyu, Hongli
Source: 2022 5th International Conference on Data Science and Information Technology (DSIT), pp. 1-6, Jul 2022
Subject: General Topics for Engineers; Visualization; Correlation; Natural languages; Buildings; Data science; Task analysis; Information technology; visual commonsense reasoning; attention; multimodal; self-attention

Abstract

Visual Commonsense Reasoning (VCR) is a challenging multimodal task that combines images and natural language for reasoning and spans several research fields, including vision, cognition, and reasoning. Existing VCR methods focus on global attention or rely on pre-trained models, but they pay little attention to the local features of the visual and language modalities. This paper proposes a multi-layer attention network for the VCR task, consisting of an intra-modal attention module and an inter-modal attention module. The intra-modal attention module complements the important features of the visual and language modalities with fine-grained visual attention to improve the relevance between vision and language. The inter-modal attention module captures the internal dependencies between vision and language. Finally, the two modules are integrated into an end-to-end reasoning framework. Experiments on the large-scale VCR dataset show that the proposed method improves performance on the VCR task and demonstrate its effectiveness on three subtasks.
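
The abstract does not specify how the two attention modules are built, so the following is only a minimal sketch of the general idea, assuming plain scaled dot-product attention. It is not the authors' architecture. The function names, the toy feature vectors, and the mapping of "intra-modal" to attention within one modality and "inter-modal" to attention across modalities are all illustrative assumptions.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Generic scaled dot-product attention over lists of vectors.

    Each query produces a weighted average of the value vectors, with
    weights given by softmax(query . key / sqrt(d)).
    """
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys
        ]
        weights = softmax(scores)
        out.append(
            [
                sum(w * v[j] for w, v in zip(weights, values))
                for j in range(len(values[0]))
            ]
        )
    return out


# Hypothetical toy features: 3 image regions and 2 text tokens, 2-D each.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tokens = [[0.5, 0.5], [1.0, 0.0]]

# Assumed reading of "intra-modal": regions attend to other regions.
intra_visual = attention(regions, regions, regions)

# Assumed reading of "inter-modal": text tokens attend to image regions.
inter_text_to_visual = attention(tokens, regions, regions)
```

In this sketch both calls use the same function and differ only in where the queries, keys, and values come from. A real model would add learned projections, multiple heads, and stacked layers, none of which are described in the abstract.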
