Referring video object segmentation (RVOS) aims to segment the object in a video that is referred to by a text description. The core of RVOS lies in the cross-modal alignment between vision and text. To improve performance, most previous RVOS methods are devoted to exploring more sophisticated visual cues. However, these methods fail to fully exploit the inherent structured properties of text and rely only on coarse text features for vision-text interaction. In this paper, we propose FTVR, a Fine-grained Text-Video approach for Referring video object segmentation that exploits fine-grained text information. We introduce an LLM to split the text sentence into distinct functional phrases and propose three novel modules to enhance cross-modal alignment. Concretely, we design a dynamic-aware perception module to handle motion-related phrases, followed by a global-aware attention module that fuses the resulting motion information. To handle entity-related phrases, FTVR also introduces an entity-aware augmentation module to highlight entity information. Extensive experiments on four popular benchmarks demonstrate the effectiveness of our method.