Transformer models across multiple domains such as natural language processing and speech form an unavoidable part of the tech stack of practitioners and researchers alike. Audio transformers that exploit representation learning to train on unlabeled speech have recently been used with much success for tasks ranging from speaker verification to discourse coherence. However, little is known about what these models learn and represent in their high-dimensional latent space. In this paper, we interpret two such recent state-of-the-art models, wav2vec2.0 and Mockingjay, on linguistic and acoustic features. We probe each of their layers to understand what they learn and, at the same time, draw comparisons between the two models. By comparing their performance across a wide variety of settings including native, non-native, read, and spontaneous speech, we also show the extent to which these models learn transferable features. Our results show that the models capture a wide range of characteristics, including audio, fluency, and suprasegmental pronunciation features, and even syntactic and semantic text-based characteristics. For each category of characteristics, we identify a learning pattern in each framework and conclude which model, and which of its layers, is better suited for extracting features of that category for downstream tasks.