Identify, Locate and Separate: Audio-Visual Object Extraction in Large Video Collections Using Weak Supervision
- Resource Type
- Conference
- Authors
- Parekh, Sanjeel; Ozerov, Alexey; Essid, Slim; Duong, Ngoc Q. K.; Perez, Patrick; Richard, Gael
- Source
- 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp. 268-272, Oct. 2019
- Subject
- Signal Processing and Analysis
- Training
- Visualization
- Instruments
- Signal processing
- Proposals
- Noise measurement
- Videos
- Audio-visual event detection
- Source separation
- Non-negative matrix factorization
- Multiple instance learning
- ISSN
- 1947-1629
We tackle the problem of audio-visual scene analysis for weakly labeled data. To this end, we build on our previous audio-visual representation learning framework to perform object classification in noisy acoustic environments, and we extend it with an audio source enhancement capability. This is made possible by a novel use of non-negative matrix factorization (NMF) for the audio modality. Our approach is founded on the multiple instance learning (MIL) paradigm. We demonstrate its effectiveness through experiments on a challenging dataset of musical instrument performance videos. We also show encouraging visual object localization results.
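The sketch below illustrates the two ingredients the abstract names: NMF applied to an audio spectrogram, and MIL, in which a whole video (the "bag") carries a label while its parts (the "instances") do not. It is a generic illustration under stated assumptions and does not reproduce the authors' pipeline. The NMF uses the standard Lee-Seung multiplicative updates for the Euclidean cost. Each NMF component is turned into a candidate audio "instance" through a soft (Wiener-like) mask. Instance scores are aggregated by max pooling, which is a simple stand-in and not necessarily the aggregation used in the paper. The function names, the toy spectrogram, and the per-component scores are all hypothetical.

```python
import numpy as np


def nmf(V, k, n_iter=200, eps=1e-10, seed=0):
    """Factorize a non-negative matrix V (freq x time) as W @ H.

    Uses Lee-Seung multiplicative updates for the Euclidean cost
    (an assumption; other NMF cost functions are equally common).
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, k)) + eps  # spectral patterns (freq x k)
    H = rng.random((k, T)) + eps  # activations over time (k x time)
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def component_spectrograms(V, W, H, eps=1e-10):
    """Split V into k non-negative parts with soft masks; the parts sum to ~V.

    Each part can be treated as one audio "instance" (hypothetical choice).
    """
    WH = W @ H + eps
    return np.stack(
        [np.outer(W[:, j], H[j]) / WH * V for j in range(W.shape[1])]
    )


def mil_bag_score(instance_scores):
    """Max-pool instance scores (k x classes) into bag scores (classes,).

    A bag is scored positive for a class if any instance is.
    """
    return instance_scores.max(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    V = rng.random((64, 100))  # toy "magnitude spectrogram", not real audio
    W, H = nmf(V, k=4)
    parts = component_spectrograms(V, W, H)
    scores = rng.random((4, 3))  # hypothetical per-component class scores
    print(mil_bag_score(scores).shape)  # (3,)
```

Because the soft masks sum to one at every time-frequency bin, keeping only the parts that score highly for a target class and summing them gives a simple enhanced spectrogram for that class. This is one plausible way to connect MIL scoring to source enhancement. It is not a description of the method evaluated in the paper.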