Image captioning, a prominent topic in current artificial intelligence research, enables a computer to interpret image content and generate a corresponding textual description. Although advanced methods extract and fuse rich features for image encoding and build reliable transformer-based networks for cross-modal prediction, image captioning still faces challenges such as redundant and time-consuming features and incomplete information in the generated sentences. To improve the representations learned by the deep networks in the captioning pipeline, we design a novel visual encoding structure that achieves local cross-modal alignment, and its features are further employed for global semantic alignment in our proposed captioning model. Our method has been evaluated on the standard image captioning benchmark and achieves outstanding performance.
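
To make the two alignment notions concrete, the following minimal sketch illustrates one common way such scores can be computed: local cross-modal alignment as word-to-region similarity and global semantic alignment as similarity between pooled image and sentence representations. This is an illustrative assumption only; the feature dimensions, pooling choices, and similarity function here are not taken from the proposed model.

```python
# Illustrative sketch only -- the encoder, decoder, and alignment objectives in the
# proposed model may differ; dimensions, pooling, and cosine similarity are assumptions.
import numpy as np

def cosine(a, b):
    # Pairwise cosine similarity between rows of a and rows of b.
    return (a @ b.T) / (np.linalg.norm(a, axis=-1, keepdims=True)
                        * np.linalg.norm(b, axis=-1, keepdims=True).T + 1e-8)

rng = np.random.default_rng(0)
regions = rng.standard_normal((36, 512))   # 36 visual region features (assumed dim 512)
words   = rng.standard_normal((12, 512))   # 12 word embeddings of a caption (assumed dim 512)

# Local cross-modal alignment: match each caption word to its most similar image region.
local_sim = cosine(words, regions)          # (12, 36) word-region similarity matrix
local_score = local_sim.max(axis=1).mean()  # average best-match similarity per word

# Global semantic alignment: pooled image feature vs. pooled sentence feature.
img_global  = regions.mean(axis=0, keepdims=True)
sent_global = words.mean(axis=0, keepdims=True)
global_score = cosine(sent_global, img_global).item()

print(f"local alignment score:  {local_score:.3f}")
print(f"global alignment score: {global_score:.3f}")
```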