Facial expression recognition is an area of growing interest in computer vision research. Balancing the learning of subtle facial features against memory consumption remains a persistent challenge, making models hard to deploy on portable mobile devices with limited hardware resources. To address this issue, we propose a spatial attention module on a lightweight backbone, named the cross-direction attention network (CDAN). The proposed model achieves finer-grained feature learning with lower memory consumption through cross-learning in both the vertical and horizontal directions, without complicated hand-crafted feature engineering, contributing an innovative spatial attention approach for future research in this field. Experimental results show that the proposed model achieves state-of-the-art performance on the AffectNet dataset.
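The cross-direction idea above can be sketched as pooling a feature map along each spatial axis and combining the two directional gates into a full spatial attention map. The abstract does not specify the exact operations, so the function name, the use of average pooling, and the sigmoid gating below are illustrative assumptions, not the paper's actual CDAN module:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cross_direction_attention(x):
    """Hypothetical sketch of cross-direction spatial attention.

    x: feature map of shape (C, H, W). Descriptors are pooled along
    each spatial axis, gated with a sigmoid, and broadcast-multiplied
    so every position is reweighted by its row and column context.
    """
    # Horizontal descriptor: average over the width axis -> (C, H, 1)
    h_desc = x.mean(axis=2, keepdims=True)
    # Vertical descriptor: average over the height axis -> (C, 1, W)
    w_desc = x.mean(axis=1, keepdims=True)
    # Broadcasting the two gates yields a (C, H, W) attention map
    attn = sigmoid(h_desc) * sigmoid(w_desc)
    return x * attn

# Toy usage: attention preserves the feature-map shape
feat = np.random.rand(8, 4, 4).astype(np.float32)
out = cross_direction_attention(feat)
print(out.shape)  # (8, 4, 4)
```

Because each gate lies in (0, 1), the module only rescales activations and adds no extra spatial resolution, which is consistent with a memory-light design.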