Multimodal Urban Scene Understanding
- Resource Type
- Authors
- Rajsuryan Singh
- Source
- Subject
- visual sound source localization
- urbansas
- self-supervised learning
- optical flow
- Language
- English
Early computational approaches for sound source localization, originating in robotics, were modeled after animal perception and relied on audiovisual synchrony and on spatial information inferred from multichannel audio. More recent deep learning-based methods focus on learning semantic audiovisual representations in a self-supervised manner and using them to localize sounding objects. By design, a majority of these approaches exclude the information carried by the temporal context that a video provides. While that is not a hurdle on widely used benchmark datasets, which are biased toward large single objects centered in the image, the methods fall short in more challenging scenarios such as urban traffic videos. This thesis explores methods to introduce temporal context into state-of-the-art methods for sound source localization in urban scenes, using optical flow as a means to encode motion information. An analysis of the strengths and weaknesses of our methods helps us better understand the problem of visual sound source localization and sheds new light on the characteristics of our dataset.
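To make the motion-encoding idea concrete, here is a minimal sketch of optical flow estimation between two frames. This is illustrative only: the abstract does not specify which flow algorithm the thesis uses, so the classic single-window Lucas-Kanade least-squares formulation is assumed here, applied to a synthetic pair of grayscale frames.

```python
import numpy as np

def lucas_kanade_flow(frame1, frame2):
    """Estimate a single global (u, v) motion vector between two grayscale
    frames via the Lucas-Kanade least-squares formulation (an assumed,
    illustrative choice -- not necessarily the method used in the thesis)."""
    f1 = frame1.astype(float)
    f2 = frame2.astype(float)
    Iy, Ix = np.gradient(f1)   # spatial image gradients (rows = y, cols = x)
    It = f2 - f1               # temporal gradient
    # Brightness constancy: Ix*u + Iy*v + It = 0, solved in least squares:
    # [[sum(Ix^2), sum(IxIy)], [sum(IxIy), sum(Iy^2)]] [u, v]^T = -[sum(IxIt), sum(IyIt)]^T
    A = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    b = -np.array([np.sum(Ix * It), np.sum(Iy * It)])
    u, v = np.linalg.solve(A, b)
    return u, v

# Synthetic check: a smooth blob translated one pixel to the right should
# yield a flow vector of roughly (u, v) = (1, 0).
x = np.linspace(-3, 3, 64)
X, Y = np.meshgrid(x, x)
frame1 = np.exp(-(X**2 + Y**2))
frame2 = np.roll(frame1, 1, axis=1)  # shift right by one pixel
u, v = lucas_kanade_flow(frame1, frame2)
```

In a sound source localization pipeline, dense per-pixel flow fields (rather than this single global vector) would typically be stacked with RGB frames as an extra input channel so the network can attend to moving, and hence likely sounding, objects.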