Violence detection is an essential yet challenging problem in computer vision. Most existing works analyze only a single modality and therefore cannot exploit the complementary information available when multiple modalities are present. We propose a two-stage multi-modal information fusion method for violence detection: 1) the first stage adopts a multiple instance learning strategy to refine video-level hard labels into clip-level soft labels, and 2) the second stage fuses the modalities with a multi-modal attention fusion module, trained under the supervision of the soft labels generated in the first stage. Extensive experiments on the XD-Violence dataset show that our method outperforms state-of-the-art methods.
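The first-stage label refinement can be illustrated with a minimal sketch. This is not the paper's exact scheme: it assumes a common top-k multiple instance learning heuristic in which, for a video with a positive (violent) hard label, only the k highest-scoring clips keep their predicted scores as soft labels, while clips of a negative video are all labeled non-violent. The function name and the top-k rule are illustrative assumptions.

```python
def mil_soft_labels(clip_scores, video_label, k=2):
    """Refine a video-level hard label into clip-level soft labels
    (a hedged sketch of a top-k MIL heuristic, not the paper's method).

    clip_scores: per-clip violence scores in [0, 1] from a base model.
    video_label: video-level hard label, 0 (non-violent) or 1 (violent).
    """
    # Negative video: every clip is certainly non-violent.
    if video_label == 0:
        return [0.0 for _ in clip_scores]
    # Positive video: keep the scores of the k highest-scoring clips
    # (the likely violent instances) and suppress the rest.
    order = sorted(range(len(clip_scores)),
                   key=lambda i: clip_scores[i], reverse=True)
    keep = set(order[:k])
    return [clip_scores[i] if i in keep else 0.0
            for i in range(len(clip_scores))]


# Example: a violent video with four clips; the two strongest clips
# retain their scores as soft labels.
print(mil_soft_labels([0.9, 0.1, 0.8, 0.2], video_label=1, k=2))
# → [0.9, 0.0, 0.8, 0.0]
```

The resulting soft labels then serve as clip-level supervision for the second-stage fusion network.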