Deep learning approaches have significantly advanced fundamental tasks in natural language processing (NLP), such as text-to-speech and speech-to-text translation. In this study, we combine Inception V3 and LSTM (Long Short-Term Memory) neural networks to propose an ensemble technique for text-to-speech and speech-to-text translation. Inception V3 is a state-of-the-art convolutional neural network (CNN) architecture known for its ability to extract rich and meaningful features from images. By adapting Inception V3 for acoustic feature extraction from speech signals, we exploit its capability to capture information relevant to speech-to-text translation. We train the Inception V3 model on a large dataset of speech samples, allowing it to learn discriminative acoustic patterns. To handle the sequential nature of text-to-speech translation, we incorporate an LSTM, a kind of recurrent neural network (RNN), into our ensemble. LSTMs excel at capturing long-term dependencies in sequences, which makes them well suited to producing natural, cohesive-sounding speech from textual input. The LSTM model is trained on a parallel corpus of text and corresponding speech signals, leveraging the power of sequential modelling. We combine the predictions of the Inception V3 and LSTM models through an ensemble approach, in which the outputs of both models are averaged or fused to generate the final translations. This ensemble strategy exploits the complementary strengths of CNNs and RNNs, leading to improved translation accuracy and quality. Experiments conducted on benchmark datasets show that our ensemble method for text-to-speech and speech-to-text translation is effective: the ensemble achieves significant gains in accuracy and fluency over the individual models. Furthermore, our approach is robust across different languages and input variations. In conclusion, our proposed ensemble of Inception V3 and LSTM models presents a powerful solution for translation from text to speech and from speech to text. By leveraging the strengths of CNNs and RNNs, we achieve superior performance in accurately transcribing speech and generating natural-sounding speech from text.
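
To make the fusion step concrete, the following is a minimal PyTorch sketch of the averaging-based ensemble described above. The vocabulary size, feature dimensions, LSTM depth, and fusion weight are illustrative assumptions, not values reported here; the paper's own training pipeline is not shown.

```python
import torch
import torch.nn as nn
import torchvision.models as tvm

NUM_CLASSES = 40  # hypothetical output vocabulary size (assumption)

# CNN branch: Inception V3 adapted to spectrogram "images".
# Inception V3 expects 3-channel 299x299 inputs, so mel-spectrograms
# would be resized/tiled to that shape before being fed in.
cnn = tvm.inception_v3(weights=None, aux_logits=False, num_classes=NUM_CLASSES)
cnn.eval()

# RNN branch: an LSTM over per-frame acoustic features.
class LSTMBranch(nn.Module):
    def __init__(self, n_feats=80, hidden=256, n_classes=NUM_CLASSES):
        super().__init__()
        self.lstm = nn.LSTM(n_feats, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                # x: (batch, time, n_feats)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])  # classify from the final time step

rnn = LSTMBranch()
rnn.eval()

# Ensemble fusion: average the softmax outputs of both branches.
# alpha=0.5 gives the plain average mentioned in the abstract; other
# weightings are one common "fused" alternative.
def ensemble_predict(spectrogram_img, frame_feats, alpha=0.5):
    with torch.no_grad():
        p_cnn = torch.softmax(cnn(spectrogram_img), dim=-1)
        p_rnn = torch.softmax(rnn(frame_feats), dim=-1)
        fused = alpha * p_cnn + (1.0 - alpha) * p_rnn
    return fused.argmax(dim=-1)

# Example with random stand-in inputs.
img = torch.randn(1, 3, 299, 299)   # a spectrogram rendered as an image
feats = torch.randn(1, 120, 80)     # 120 frames of 80-dim acoustic features
print(ensemble_predict(img, feats))
```

Averaging class probabilities rather than raw logits keeps the two branches on a comparable scale; in practice the fusion weight would be tuned on a validation set.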