MAAS: Multi-modal Assignation for Active Speaker Detection
- Resource Type
- Conference
- Authors
- Alcazar, Juan Leon; Heilbron, Fabian Caba; Thabet, Ali K.; Ghanem, Bernard
- Source
- 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 265-274, Oct. 2021
- Subject
- Computing and Processing; Visualization; Computer vision; Benchmark testing; Feature extraction; Data structures; Task analysis; Vision + other modalities; Video analysis and understanding
- Language
- ISSN
- 2380-7504
Active speaker detection requires a mindful integration of multi-modal cues. Current methods focus on modeling and fusing short-term audio-visual features for individual speakers, often at the frame level. We present a novel approach to active speaker detection that directly addresses the multi-modal nature of the problem and provides a straightforward strategy, in which independent visual features (speakers) in the scene are assigned to a previously detected speech event. Our experiments show that a small graph data structure built from local information can approximate an instantaneous audio-visual assignment problem. Moreover, the temporal extension of this initial graph achieves a new state-of-the-art performance on the AVA-ActiveSpeaker dataset with a mAP of 88.8%.
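To build intuition for the assignment idea in the abstract, the following is a minimal toy sketch: each detected face contributes a visual feature node connected to a single audio node, and the speech event is assigned to the most compatible face by cosine similarity. This is a hedged illustration only; the function name, the 4-d features, and the similarity choice are invented for the example and do not reproduce the MAAS graph network itself.

```python
import numpy as np

def assign_speech_event(audio_feat, speaker_feats):
    """Assign a detected speech event to the candidate speaker whose
    visual feature is most similar (cosine similarity) to the audio
    feature. Toy stand-in for an instantaneous audio-visual assignment."""
    audio = audio_feat / np.linalg.norm(audio_feat)
    scores = []
    for v in speaker_feats:
        v = v / np.linalg.norm(v)
        scores.append(float(audio @ v))  # cosine similarity edge weight
    return int(np.argmax(scores)), scores

# Made-up 4-d features for two candidate speakers in one frame.
audio = np.array([1.0, 0.0, 0.5, 0.0])
faces = [
    np.array([0.9, 0.1, 0.4, 0.0]),  # speaking face: similar to the audio
    np.array([0.0, 1.0, 0.0, 0.2]),  # silent face: dissimilar
]
idx, scores = assign_speech_event(audio, faces)
print(idx)  # index of the face assigned to the speech event
```

In the paper's setting this local, per-frame graph is then extended across time; the toy version above only captures the instantaneous assignment step.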