Emotional awareness has been an active research topic in Human-Computer Interaction in recent years. Computers have become an integral part of human life, and users need human-like interaction to communicate with them effectively. Many researchers have therefore investigated emotion recognition and classification using a variety of sources. For recognizing emotions in video expressions, we propose a machine learning and deep learning-based approach, namely CNLSTM. This hybrid approach analyzes the video data and performs emotion recognition: the CNN extracts features from the individual frames of the video, and the LSTM models the temporal dynamics across frames. To validate the proposed approach, three datasets, namely CK+, JAFFE, and Aff-wild2, have been used, and several experiments have been performed. The experimental results show that the proposed approach achieves high accuracy on the angry class of the CK+ dataset. The proposed model also shows better results on imbalanced data.
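The pipeline described above (per-frame spatial features from a CNN, followed by an LSTM over the frame sequence) can be sketched in miniature. This is a minimal NumPy illustration of the architecture's shape, not the authors' implementation: the "CNN" is reduced to a single 3x3 convolution with ReLU and global average pooling, the layer sizes and the 7-class emotion set are illustrative assumptions, and all weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def frame_features(frame, kernels):
    """Stand-in for the CNN branch: one 3x3 convolution per filter,
    ReLU, then global average pooling -> one feature vector per frame."""
    H, W = frame.shape
    feats = np.empty(len(kernels))
    for k, kern in enumerate(kernels):
        acts = np.array([[np.sum(frame[i:i + 3, j:j + 3] * kern)
                          for j in range(W - 2)] for i in range(H - 2)])
        feats[k] = np.maximum(acts, 0.0).mean()
    return feats

def lstm_step(x, h, c, Wx, Wh, b):
    """One LSTM step over a per-frame feature vector (temporal branch)."""
    d = h.size
    z = Wx @ x + Wh @ h + b
    i, f = sigmoid(z[:d]), sigmoid(z[d:2 * d])          # input / forget gates
    o, g = sigmoid(z[2 * d:3 * d]), np.tanh(z[3 * d:])  # output gate / candidate
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

# Toy setup: 8 frames of 16x16 grayscale video, 4 conv filters,
# 8 hidden LSTM units, 7 emotion classes (an assumed label set).
n_frames, n_filters, hidden, n_classes = 8, 4, 8, 7
video = rng.random((n_frames, 16, 16))
kernels = rng.standard_normal((n_filters, 3, 3)) * 0.1
Wx = rng.standard_normal((4 * hidden, n_filters)) * 0.1
Wh = rng.standard_normal((4 * hidden, hidden)) * 0.1
b = np.zeros(4 * hidden)
Wout = rng.standard_normal((n_classes, hidden)) * 0.1

# CNN features frame by frame, LSTM carries state across frames.
h, c = np.zeros(hidden), np.zeros(hidden)
for frame in video:
    h, c = lstm_step(frame_features(frame, kernels), h, c, Wx, Wh, b)

# Classify the final hidden state with a softmax over emotion classes.
logits = Wout @ h
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(probs.round(3))
```

In practice a trained model would replace the single convolution with a deep CNN and learn all weights end-to-end; the sketch only shows how spatial features per frame feed the recurrent temporal model.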