We propose and study a novel multi-temporal CNN architecture for end-to-end audio-scene classification (ASC) from the raw audio signal. Conventional CNNs use fixed-size kernels (whether for image or 1-D signal classification), which corresponds to applying a filter bank in which each filter has a fixed time-frequency resolution (i.e., a fixed-duration impulse response and a fixed-bandwidth frequency response) and, importantly, a single specific time-frequency trade-off. In contrast, to allow for multiple time-frequency resolutions, we use a multi-temporal CNN architecture with multiple kernel branches (up to 12), each of a different length. This yields multiple filter banks with different time-frequency resolutions that process the input raw audio signal and produce feature maps corresponding to different time-frequency trade-offs (e.g., ranging from very narrow-band to very wide-band spectrographic maps in fine steps of time-frequency resolution). Applied to end-to-end audio-scene classification, this architecture offers consistent and significant performance gains (e.g., 11-15% absolute accuracy improvement for the 12-branch multi-temporal case) over the conventional single-temporal CNN, and it also outperforms state-of-the-art results for this task.
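The core idea — parallel branches whose kernels have different lengths, so each branch realizes a different time-frequency trade-off on the raw waveform — can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function name, kernel lengths, filter counts, and random filter initialisation below are all illustrative assumptions; a trained network would learn the filter coefficients.

```python
import numpy as np

def multi_temporal_features(signal, kernel_lengths=(16, 64, 256),
                            filters_per_branch=4, seed=0):
    """Hypothetical sketch of a multi-temporal front end.

    Each branch is a filter bank whose kernel length sets its
    time-frequency trade-off: short kernels give fine time / coarse
    frequency resolution (wide-band maps), long kernels the reverse
    (narrow-band maps). Filters here are random stand-ins for
    learned CNN weights.
    """
    rng = np.random.default_rng(seed)
    branches = []
    for length in kernel_lengths:
        kernels = rng.standard_normal((filters_per_branch, length))
        # 'same'-mode convolution keeps the time axis aligned across branches
        fmap = np.stack([np.convolve(signal, k, mode="same") for k in kernels])
        # ReLU non-linearity, as is standard after a convolutional layer
        branches.append(np.maximum(fmap, 0.0))
    return branches  # one (filters_per_branch, len(signal)) map per branch

# Usage: three branches of feature maps from one raw-audio segment
x = np.random.default_rng(1).standard_normal(1000)
maps = multi_temporal_features(x)
```

In a full classifier, the per-branch feature maps would be passed through further convolutional and pooling layers and then merged (e.g., by concatenation) before the classification head.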