Self-Supervised Audio-Visual Representation Learning for in-the-wild Videos
- Resource Type
- Conference
- Authors
- Feng, Zishun; Tu, Ming; Xia, Rui; Wang, Yuxuan; Krishnamurthy, Ashok
- Source
- 2020 IEEE International Conference on Big Data (Big Data), pp. 5671-5672, Dec. 2020
- Subject
Communication, Networking and Broadcast Technologies
Computing and Processing
Engineering Profession
Geoscience
Signal Processing and Analysis
Training
Visualization
Conferences
Data visualization
Big Data
Task analysis
Videos
self-supervised learning
multimodal representation learning
large scale video understanding
- Language
English
- Abstract
Humans understand videos through both the visual and audio aspects of the data. In this work, we present a self-supervised cross-modal representation learning approach based on audio-visual correspondence (AVC) for videos in the wild. After the learning stage, we explore retrieval with the learned representations in both cross-modal and intra-modal settings. We evaluate our approach on the VGGSound dataset [1], where it achieves promising results.
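The abstract describes learning a shared audio-visual embedding space via AVC and then performing cross-modal and intra-modal retrieval with the learned representations. Below is a minimal sketch of that retrieval stage; the encoders are replaced by fixed random projections purely for illustration, and all names, dimensions, and dataset sizes are assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the learned encoders: in an AVC setup these would be deep
# networks trained to map corresponding audio and video clips close together
# in a shared space. Here we use random projections just to show the mechanics.
D_AUDIO, D_VIDEO, D_EMBED = 128, 512, 64
W_audio = rng.standard_normal((D_AUDIO, D_EMBED))
W_video = rng.standard_normal((D_VIDEO, D_EMBED))

def embed(x, W):
    """Project features into the shared space and L2-normalize."""
    z = x @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

# Toy "dataset": 10 clips, each with paired audio and visual features.
audio_feats = rng.standard_normal((10, D_AUDIO))
video_feats = rng.standard_normal((10, D_VIDEO))

za = embed(audio_feats, W_audio)   # audio embeddings, shape (10, 64)
zv = embed(video_feats, W_video)   # video embeddings, shape (10, 64)

# Cross-modal retrieval: rank all video clips against an audio query by
# cosine similarity (a dot product of unit vectors).
query = za[0]
cross_ranking = np.argsort(-(zv @ query))

# Intra-modal retrieval works the same way within a single modality;
# the query's own embedding trivially ranks first (cosine similarity 1).
intra_ranking = np.argsort(-(za @ query))
```

With trained encoders, the cross-modal ranking would place the video clip paired with the query audio near the top; with random projections the ranking is of course meaningless, which is exactly the gap the self-supervised AVC training closes.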