Aiming at the problems of forgetting and insufficient utilization of image text information in image captioning methods, a Scene Graph-aware Cross-modal interaction Network (SGC-Net) was proposed. Firstly, a scene graph was used as the visual feature of the image, and a Graph Convolutional Network (GCN) was used for feature fusion, so that the visual and textual features of the image lay in the same feature space. Secondly, the text sequences generated by the model were stored, and the corresponding position information was added to them as the textual features of the image, thereby solving the problem of text feature loss caused by the single-layer Long Short-Term Memory (LSTM) network. Finally, a self-attention mechanism was used to extract the important image information and text information and fuse them, thereby addressing the over-reliance on image information and the insufficient use of text information. Experimental results on the Flickr30K and MS-COCO (MicroSoft Common Objects in COntext) datasets show that, compared with Sub-GC, SGC-Net improves BLEU1 (BiLingual Evaluation Understudy with 1-gram), BLEU4 (BiLingual Evaluation Understudy with 4-grams), METEOR (Metric for Evaluation of Translation with Explicit ORdering), ROUGE (Recall-Oriented Understudy for Gisting Evaluation) and SPICE (Semantic Propositional Image Caption Evaluation) by 1.1, 0.9, 0.3, 0.7 and 0.4 on Flickr30K, and by 0.3, 0.1, 0.3, 0.5 and 0.6 on MS-COCO, respectively. These results indicate that SGC-Net can effectively improve image captioning performance and the fluency of the generated descriptions.
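The three stages summarized above (GCN fusion over the scene graph, position-encoded textual features, and self-attention fusion of the two modalities) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the feature dimension, the random scene-graph adjacency, the single-head attention with Q = K = V, and the sinusoidal positional encoding are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # feature dimension (illustrative assumption)

def gcn_layer(A, X, W):
    # One GCN layer: ReLU(D^-1/2 (A+I) D^-1/2 X W),
    # i.e. symmetric-normalized adjacency with self-loops.
    A_tilde = A + np.eye(A.shape[0])
    deg = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(deg ** -0.5)
    return np.maximum(D_inv_sqrt @ A_tilde @ D_inv_sqrt @ X @ W, 0.0)

def positional_encoding(n, d):
    # Sinusoidal position information for the generated token sequence.
    pos = np.arange(n)[:, None]
    i = np.arange(d)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

def self_attention(X):
    # Single-head scaled dot-product self-attention (Q = K = V = X),
    # with a numerically stable row-wise softmax.
    scores = X @ X.T / np.sqrt(X.shape[1])
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ X

# Scene graph with 5 nodes (objects/relations) and a random
# symmetric adjacency; node features play the role of visual features.
A = (rng.random((5, 5)) > 0.5).astype(float)
A = np.maximum(A, A.T)
X_vis = rng.standard_normal((5, d))
W = rng.standard_normal((d, d)) * 0.1
vis_feats = gcn_layer(A, X_vis, W)

# Previously generated caption tokens (3 decoding steps) stored as
# textual features, with position information added.
X_txt = rng.standard_normal((3, d)) + positional_encoding(3, d)

# Cross-modal fusion: self-attention over the concatenated
# visual and textual features.
fused = self_attention(np.concatenate([vis_feats, X_txt], axis=0))
print(fused.shape)  # one fused vector per visual node and text token
```

The sketch only shows the data flow; in the actual model the attention and GCN weights would be learned jointly with the LSTM decoder rather than drawn at random.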