The Procedure of Adapting the Design Parameters of the Convolutional Neural Network During the Speaker's Emotions Recognition
- Resource Type
- Conference
- Authors
- Tereikovskyi, Ihor; Mussiraliyeva, Shynar; Tereikovska, Liudmyla; Chernyshev, Denys; Nyussupov, Adlet; Abaiuly, Yerulan
- Source
- 2022 International Conference on Smart Information Systems and Technologies (SIST), pp. 1-6, Apr. 2022
- Subject
- Communication, Networking and Broadcast Technologies
Computing and Processing
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Training
Emotion recognition
Technological innovation
Neural networks
Semantics
Information security
Structural engineering
recognition of emotions
speech signal
mel-frequency cepstral coefficient
speaker
neural network model
convolutional neural network
adaptation procedure
information security
- Language
This article addresses the problem of improving speaker emotion recognition systems by applying modern neural network models based on convolutional neural networks. The difficulties of such an application are identified and linked to the need to adapt the convolutional neural network to the conditions of the given recognition problem. A procedure for adapting the design parameters of the network is proposed; it involves determining the parameters of the input and output fields, the type, the values of the structural parameters, and the mathematical basis of the convolutional neural network. The procedure is based on representing the mel-frequency cepstral coefficients of every quasi-stationary fragment of a fixed-size voice signal as a monochrome square image. The experiments showed that the proposed procedure makes it possible to build convolutional neural networks whose speaker emotion recognition accuracy is at the level of the best modern recognition systems. Further research will focus on developing a neural network method for recognizing emotions when the speaker voices free text. This will help to identify and prevent possible threats arising from fake audio and video information, and applying these technologies can raise the overall level of information security.
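The abstract does not give the details of the projection step, so the following is only a minimal sketch of one way a fixed-size voice signal could be turned into a monochrome square image of mel-frequency cepstral coefficients. Everything specific here is an assumption made for illustration and is not taken from the paper: the 16 kHz sampling rate, the 25 ms frames with a 10 ms hop standing in for the quasi-stationary fragments, the 32 filters and 32 coefficients, the min-max scaling to 8-bit gray levels, and the function names `mel_filterbank`, `mfcc_frames` and `mfcc_to_square_image`.

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular mel filters, shape (n_filters, n_fft // 2 + 1)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def mfcc_frames(signal, sample_rate=16000, frame_len=400, hop=160,
                n_fft=512, n_filters=32, n_coeffs=32):
    """Return one row of cepstral coefficients per short frame."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    fbank = mel_filterbank(n_filters, n_fft, sample_rate)
    log_mel = np.log(power @ fbank.T + 1e-10)  # small offset avoids log(0)
    # Unnormalized DCT-II basis over the mel bands.
    basis = np.cos(np.pi * np.outer(np.arange(n_coeffs),
                                    np.arange(n_filters) + 0.5) / n_filters)
    return log_mel @ basis.T                   # shape (n_frames, n_coeffs)

def mfcc_to_square_image(mfcc):
    """Min-max scale a square MFCC matrix to an 8-bit grayscale image."""
    if mfcc.shape[0] != mfcc.shape[1]:
        raise ValueError("expected as many frames as coefficients")
    lo, hi = mfcc.min(), mfcc.max()
    scaled = (mfcc - lo) / (hi - lo) if hi > lo else np.zeros_like(mfcc)
    # Transpose so that rows are coefficients and columns are time frames.
    return np.round(255.0 * scaled).astype(np.uint8).T

# Usage: 5360 samples give exactly 32 frames (400 + 31 * 160 = 5360).
rng = np.random.default_rng(0)
fragment = rng.standard_normal(5360)           # placeholder for real speech
image = mfcc_to_square_image(mfcc_frames(fragment))
print(image.shape, image.dtype)                # (32, 32) uint8
```

Under these assumptions, choosing the signal length so that the number of frames equals the number of coefficients yields a square single-channel array, which is the kind of input a two-dimensional convolutional network accepts directly. The input field size the authors actually use may differ.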