Image caption generation poses a challenge due to the intricate visual content and nuanced semantic details of images. This research introduces an approach to image captioning that integrates Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks: a CNN extracts visual features, and an LSTM generates the descriptive caption. To further improve performance, an attention mechanism is incorporated, allowing the model to focus on the relevant visual features at each step of caption generation. The model is evaluated on the standard Flickr8k benchmark using BLEU, METEOR, and CIDEr scores, and demonstrates improved performance relative to existing methods. Beyond captioning, the proposed system extends to image retrieval, image description, and other multimedia scenarios requiring robust image analysis and natural language processing.

The goal of this study is to leverage the combined capabilities of CNNs and LSTMs to improve the generation of descriptive captions. By merging the strength of CNNs in image feature extraction with the sequential understanding and context-modeling abilities of LSTMs, the aim is to produce accurate, contextually relevant captions that better capture the nuances and details of an image. In doing so, this research seeks to improve the quality and richness of generated captions and to advance the state of the art in image captioning for artificial intelligence and computer vision applications.
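To make the described architecture concrete, the following is a minimal PyTorch sketch of a CNN encoder feeding an attention-equipped LSTM decoder. The ResNet-50 backbone, additive (Bahdanau-style) attention, layer sizes, and teacher-forcing decoding are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of a CNN-LSTM captioner with additive attention (assumed configuration).
import torch
import torch.nn as nn
import torchvision.models as models


class EncoderCNN(nn.Module):
    """Extracts a grid of spatial features from an image with a pretrained CNN."""

    def __init__(self):
        super().__init__()
        resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        # Drop the average-pool and classification head; keep spatial feature maps.
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])

    def forward(self, images):                            # (B, 3, 224, 224)
        feats = self.backbone(images)                     # (B, 2048, 7, 7)
        B, C, H, W = feats.shape
        return feats.view(B, C, H * W).permute(0, 2, 1)   # (B, 49, 2048)


class AdditiveAttention(nn.Module):
    """Scores each image region against the decoder state; returns a context vector."""

    def __init__(self, feat_dim, hidden_dim, attn_dim):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, attn_dim)
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, feats, hidden):
        # feats: (B, 49, feat_dim); hidden: (B, hidden_dim)
        e = self.score(torch.tanh(self.feat_proj(feats) +
                                  self.hidden_proj(hidden).unsqueeze(1)))  # (B, 49, 1)
        alpha = torch.softmax(e, dim=1)                   # weights over image regions
        context = (alpha * feats).sum(dim=1)              # (B, feat_dim)
        return context, alpha.squeeze(-1)


class DecoderLSTM(nn.Module):
    """Generates the caption one token at a time, attending to image regions."""

    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512,
                 feat_dim=2048, attn_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.attention = AdditiveAttention(feat_dim, hidden_dim, attn_dim)
        self.lstm = nn.LSTMCell(embed_dim + feat_dim, hidden_dim)
        self.fc = nn.Linear(hidden_dim, vocab_size)
        self.init_h = nn.Linear(feat_dim, hidden_dim)
        self.init_c = nn.Linear(feat_dim, hidden_dim)

    def forward(self, feats, captions):
        # feats: (B, 49, feat_dim); captions: (B, T) token ids (teacher forcing)
        B, T = captions.shape
        h = self.init_h(feats.mean(dim=1))                # init state from mean feature
        c = self.init_c(feats.mean(dim=1))
        logits = []
        for t in range(T - 1):
            context, _ = self.attention(feats, h)         # focus on relevant regions
            x = torch.cat([self.embed(captions[:, t]), context], dim=1)
            h, c = self.lstm(x, (h, c))
            logits.append(self.fc(h))
        return torch.stack(logits, dim=1)                 # (B, T-1, vocab_size)
```

In a setup like this, training would compare the stacked logits against the shifted targets `captions[:, 1:]` with cross-entropy loss; at inference, tokens are generated greedily or with beam search, feeding each predicted token back into the decoder.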