The key to effective image captioning lies in extracting rich semantic information from the image. However, most existing approaches rely on pre-trained classification models or object detectors for this extraction, which may not fully capture the semantic relationships within the image and can limit captioning performance. To address this issue, we propose a Visual-Linguistic Co-Understanding Network (VLCU-Net) for image captioning, built on the Transformer architecture. Our approach integrates semantic word ranking with richer image semantic understanding in a single framework. Specifically, we first query sentences related to the semantic content of each image and extract semantic words from them with a text-image understanding extractor; in parallel, we infer additional words related to these semantic words. All of the resulting words are fed into a semantic word sorter, which arranges them in a linguistically natural order. Finally, we combine the ordered semantic word sequences with image features to generate captions. Extensive experiments on the COCO benchmark show that our approach outperforms state-of-the-art methods on both automatic metrics and human evaluation.
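To make the described pipeline concrete, the sketch below outlines one plausible way its stages (grounding candidate semantic words in the image, sorting them into a linguistic order, and decoding captions from the fused representation) could be wired together in PyTorch. All module names, dimensions, and the fusion strategy are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of a VLCU-Net-style pipeline. Every component here
# (extractor, sorter, fusion) is a stand-in assumption for illustration.
import torch
import torch.nn as nn

class VLCUNetSketch(nn.Module):
    def __init__(self, vocab_size=10000, d_model=512, n_heads=8, n_layers=3):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, d_model)
        # Hypothetical "text-image understanding extractor": cross-attention
        # from candidate semantic words to image region features.
        self.extractor = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Hypothetical "semantic word sorter": a Transformer encoder that
        # contextualizes the grounded words into an ordered sequence.
        sorter_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.sorter = nn.TransformerEncoder(sorter_layer, n_layers)
        # Caption decoder attending over image features + sorted word sequence.
        dec_layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, n_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, image_feats, candidate_word_ids, caption_ids):
        # image_feats:        (B, R, d_model) image region features
        # candidate_word_ids: (B, W) extracted + inferred semantic word ids
        # caption_ids:        (B, T) shifted caption tokens (teacher forcing)
        words = self.word_embed(candidate_word_ids)
        # Ground the candidate words in the image (extraction step).
        grounded, _ = self.extractor(words, image_feats, image_feats)
        # Arrange the grounded words into an ordered semantic sequence.
        ordered = self.sorter(grounded)
        # Fuse ordered semantic words with image features as decoder memory.
        memory = torch.cat([image_feats, ordered], dim=1)
        tgt = self.word_embed(caption_ids)
        # Causal target mask omitted for brevity; training would require it.
        hidden = self.decoder(tgt, memory)
        return self.out(hidden)  # (B, T, vocab_size) caption logits

model = VLCUNetSketch()
logits = model(torch.randn(2, 36, 512),
               torch.randint(0, 10000, (2, 12)),
               torch.randint(0, 10000, (2, 20)))
print(logits.shape)  # torch.Size([2, 20, 10000])
```

The design choice being illustrated is the two-stage treatment of semantic words: they are first grounded against visual features and only then reordered, so the decoder receives a word sequence that is both image-aware and linguistically arranged.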