Huge processing power, big data, and deep-learning algorithms have made major technical advances possible. Technology is developing to analyse, predict, and meet our unmet requirements. Speech is the primary means of human communication, and when captured by a microphone sensor it can be used to support human-computer interaction (HCI). Quantifying emotion recognition from speech signals acquired by these sensors is an emerging field of HCI research, with applications in virtual reality, healthcare, emergency call centres, behaviour assessment, and human-robot interaction. Our main contributions in this study are to: (i) improve speech emotion recognition (SER) accuracy beyond the state of the art; and (ii) lower the computing cost of the proposed SER model. Speech emotion recognition is the process of identifying a speaker's emotions from their speech signal. The three essential steps in the emotion recognition process are feature extraction, feature selection, and classification. The primary objective of this work is to improve a system's ability to recognise speech emotions by using different feature extraction techniques. The work focuses on pre-processing of the recorded audio samples, where filters are used to eliminate noise from the speech samples. Global discriminative features are learned in fully connected layers, while local hidden patterns are learned in convolutional layers, which use strided convolutions to down-sample the feature maps in place of a pooling layer. Speech emotion classification is performed using a SoftMax classifier. The results show the value and effectiveness of the proposed SER approach and demonstrate its applicability to practical purposes.
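The pipeline described above (a noise-reducing pre-processing filter, strided convolutions that down-sample feature maps instead of pooling, and a SoftMax classifier) can be sketched in miniature with NumPy. The specific filter (pre-emphasis), kernel, stride, and layer sizes below are illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np

def pre_emphasis(signal, alpha=0.97):
    """A common high-pass pre-emphasis filter; one simple way to
    suppress low-frequency noise before feature extraction
    (assumed here -- the paper does not name its filters)."""
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def strided_conv1d(x, kernel, stride=2):
    """1-D convolution with stride > 1: the output is shorter than
    the input, so the layer down-samples the feature map directly,
    standing in for a pooling layer."""
    k = len(kernel)
    out_len = (len(x) - k) // stride + 1
    return np.array([np.dot(x[i * stride:i * stride + k], kernel)
                     for i in range(out_len)])

def softmax(z):
    """SoftMax over class logits, shifted for numeric stability."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

# Toy end-to-end pass over one synthetic "speech frame".
rng = np.random.default_rng(0)
frame = rng.standard_normal(64)               # stand-in audio frame
clean = pre_emphasis(frame)                   # noise-suppressing pre-processing
feat = strided_conv1d(clean, np.ones(4) / 4)  # local features, down-sampled
logits = feat[:4]                             # stand-in for a fully connected layer
probs = softmax(logits)                       # per-emotion class probabilities
```

The sketch only illustrates the structural idea that a stride-2 convolution halves the temporal resolution while learning local patterns, so no separate pooling stage is needed; a real SER model would stack several such layers and train the kernels and fully connected weights.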