Joint video moment retrieval and highlight detection aims to find the moments in a video that are relevant to a natural-language query, together with the video's highlight clips. It is an emerging task, although its constituent problems have been studied separately for some time. Current methods use transformers for cross-modal interaction, which, despite strong performance, incurs a large cost in parameters and computation. To address this problem, we present a cross-modal attention mechanism that captures related features from different modalities with few parameters. Building on it, we propose a lightweight multi-modal interaction model (MIM) that solves video moment retrieval and highlight detection jointly. While greatly reducing the number of parameters, our method achieves competitive performance and faster convergence than previous methods. Extensive experiments on four datasets demonstrate the effectiveness of our method.
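To make the idea of cross-modal attention concrete, below is a minimal NumPy sketch of a generic single-head cross-attention in which video-clip features attend to text-token features through small shared projections. All names, dimensions, and the low-rank projection size `d_k` are illustrative assumptions, not the paper's actual architecture; they only show how a small projection dimension keeps the parameter count low.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(video, text, W_q, W_k, W_v):
    """Generic cross-attention sketch (illustrative, not the paper's model).

    video: (T, d) clip features; text: (L, d) token features.
    Video clips (queries) attend to text tokens (keys/values).
    """
    q = video @ W_q                            # (T, d_k)
    k = text @ W_k                             # (L, d_k)
    v = text @ W_v                             # (L, d_k)
    scores = q @ k.T / np.sqrt(W_q.shape[1])   # (T, L) similarity
    attn = softmax(scores, axis=-1)            # each clip's weights over tokens
    return attn @ v                            # (T, d_k) text-conditioned clip features

rng = np.random.default_rng(0)
d, d_k = 32, 8  # small shared dim d_k keeps the projection parameters few
video = rng.standard_normal((10, d))
text = rng.standard_normal((5, d))
W_q, W_k, W_v = (rng.standard_normal((d, d_k)) * 0.1 for _ in range(3))
out = cross_modal_attention(video, text, W_q, W_k, W_v)
print(out.shape)  # (10, 8)
```

With projections of size `d x d_k` rather than `d x d`, the attention block uses `3 * d * d_k` parameters instead of `3 * d^2`, which is the kind of saving a lightweight interaction module targets.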