This paper addresses the problem of facial expression recognition, which comprises two stages: (1) face detection and (2) emotion recognition. For the first stage, an MTCNN (Multi-task Cascaded Convolutional Network) is employed to accurately detect the boundaries of the face with minimal residual margins. The second stage leverages a ShuffleNet V2 architecture, which can trade off accuracy against inference speed according to the users' requirements. The experimental results show that the proposed model outperforms the state of the art on the FER 2013 dataset provided by Kaggle.
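The two-stage structure can be illustrated with a minimal sketch. It is not the authors' implementation: `detect_faces` and `classify` are hypothetical callables standing in for the MTCNN detector and the ShuffleNet V2 classifier, and the names `crop_face` and `recognize_emotions` are assumptions introduced only for illustration. The label order follows the usual FER 2013 class indexing.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[int]]          # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), with x2 and y2 exclusive

# FER 2013 class indices 0..6, in the dataset's usual order.
EMOTIONS = ("angry", "disgust", "fear", "happy", "sad", "surprise", "neutral")


def crop_face(image: Image, box: Box) -> Image:
    """Crop the image to the box, clamping the box to the image bounds."""
    height, width = len(image), len(image[0])
    x1, y1, x2, y2 = box
    x1, y1 = max(0, x1), max(0, y1)
    x2, y2 = min(width, x2), min(height, y2)
    return [row[x1:x2] for row in image[y1:y2]]


def recognize_emotions(
    image: Image,
    detect_faces: Callable[[Image], Sequence[Box]],    # stage 1 (hypothetical stand-in for MTCNN)
    classify: Callable[[Image], Sequence[float]],      # stage 2 (hypothetical stand-in for ShuffleNet V2)
) -> List[Tuple[Box, str]]:
    """Run detection, then classify each cropped face into one of seven emotions."""
    results = []
    for box in detect_faces(image):
        face = crop_face(image, box)
        if not face or not face[0]:
            continue  # box lies entirely outside the image
        scores = classify(face)  # one score per entry in EMOTIONS
        best = max(range(len(EMOTIONS)), key=lambda i: scores[i])
        results.append((box, EMOTIONS[best]))
    return results
```

Keeping the two stages behind plain callables means either one can be swapped, for example a smaller or larger classifier variant, without touching the cropping logic.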