DCHT: Deep Complex Hybrid Transformer for Speech Enhancement
- Resource Type
- Conference
- Authors
- Li, Jialu; Li, Junhui; Wang, Pu; Zhang, Youshan
- Source
- 2023 Third International Conference on Digital Data Processing (DDP), pp. 117-122, Nov. 2023
- Subject
- Computing and Processing
Measurement
Neural networks
Speech enhancement
Transformers
Data processing
Data models
Spectrogram
complex deep neural network
speech enhancement
hybrid transformer
- Language
Most current deep learning-based approaches for speech enhancement operate only in the spectrogram or waveform domain. Although a cross-domain transformer combining waveform- and spectrogram-domain inputs has been proposed, its performance can be further improved. In this paper, we present a novel deep complex hybrid transformer that integrates spectrogram- and waveform-domain approaches to improve speech enhancement performance. The proposed model consists of two parts: a complex Swin-Unet in the spectrogram domain and a dual-path transformer network (DPTnet) in the waveform domain. We first construct a complex Swin-Unet in the spectrogram domain and perform speech enhancement on the complex audio spectrum. We then introduce an improved DPT by adding memory-compressed attention. Our model is capable of learning multi-domain features to reduce noise in the different domains in a complementary way. Experimental results on the BirdSoundsDenoising dataset and the VCTK+DEMAND dataset indicate that our method achieves better performance than state-of-the-art methods.
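The abstract does not specify how memory-compressed attention is added to the DPT. As a rough illustration only, the sketch below follows one common formulation of the idea, in which keys and values are shortened along the time axis by a strided convolution before standard scaled dot-product attention, so each query attends to fewer positions. The function names, the kernel size and stride of 3, and the random weights are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def compress(x, kernel, stride):
    """Strided 1-D convolution along the time axis (no padding).

    x:      (T, d_in) sequence of feature vectors
    kernel: (k, d_in, d_out) convolution weights
    Returns an array of shape ((T - k) // stride + 1, d_out).
    """
    k = kernel.shape[0]
    n_out = (x.shape[0] - k) // stride + 1
    out = np.empty((n_out, kernel.shape[2]))
    for i in range(n_out):
        window = x[i * stride : i * stride + k]          # (k, d_in)
        out[i] = np.einsum("kd,kde->e", window, kernel)  # sum over k and d_in
    return out


def memory_compressed_attention(x, w_q, w_k, w_v, kernel_k, kernel_v, stride=3):
    """Single-head attention with compressed keys and values (illustrative).

    Queries keep the full length T; keys and values are reduced to
    roughly T / stride positions, shrinking the attention matrix.
    """
    q = x @ w_q                                  # (T, d)
    k = compress(x @ w_k, kernel_k, stride)      # (T', d)
    v = compress(x @ w_v, kernel_v, stride)      # (T', d)
    scores = q @ k.T / np.sqrt(q.shape[-1])      # (T, T')
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


# Hypothetical toy input: 12 time steps, 4 features, random weights.
rng = np.random.default_rng(0)
T, d = 12, 4
x = rng.standard_normal((T, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
kernel_k = rng.standard_normal((3, d, d))
kernel_v = rng.standard_normal((3, d, d))

out, weights = memory_compressed_attention(x, w_q, w_k, w_v, kernel_k, kernel_v)
print(out.shape, weights.shape)
```

With 12 input steps and a kernel size and stride of 3, the keys and values are reduced to 4 positions, so the attention matrix has 12 x 4 entries instead of 12 x 12. That reduction in memory and compute is the motivation usually given for this kind of attention on long waveform-domain sequences.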