Khmer Speech Translation Corpus of the Extraordinary Chambers in the Courts of Cambodia (ECCC)

Resource Type: Conference
Authors: Soky, Kak; Mimura, Masato; Kawahara, Tatsuya; Li, Sheng; Ding, Chenchen; Chu, Chenhui; Sam, Sethserey
Source: 2021 24th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA), pp. 122-127, Nov. 2021
Subject: Computing and Processing
Signal Processing and Analysis
TV
Databases
Machine translation
Speech processing
Task analysis
Automatic speech recognition
Khmer language
low-resource language
spoken language translation corpus
court dataset
Language
ISSN: 2472-7695

Abstract

Speech translation (ST) is a subject of rapidly increasing interest in speech processing research, as is apparent from the growing number of tools and corpora for the task. However, the lack of sufficient datasets remains the biggest challenge for under-resourced languages, because ST requires a large parallel corpus of speech, transcriptions, and translated text. In this work, we construct a large corpus from the Extraordinary Chambers in the Courts of Cambodia (ECCC), including simultaneous translation from Khmer into English and French. We also address the problem of Khmer sentence segmentation by performing bilingual sentence alignment from English to Khmer under a monotonic assumption. The corpus contains approximately 155 hours of speech and 1.7M words of text. We also report baseline results for automatic speech recognition (ASR), machine translation, and ST systems, which show reasonable performance.
