Few-shot facial expression recognition (FER) aims to make the most of a small number of labeled samples, so it is important to use the given training data efficiently. However, existing few-shot FER methods depend too heavily on the datasets they were trained on, which makes it difficult for their performance to generalize. To address this problem, we propose a Channel Selective Relation Network, which uses a channel selection module and spatial data construction to learn optimal features. By comparing the original sample features with the averaged feature, our method helps the network suppress irrelevant information and focus on essential information. Furthermore, our network efficiently learns dominant facial expression features in local patches, such as the eyes and lips. Compared to the current state-of-the-art method, the average accuracy on the RAFDB, FER2013, SFEW, and AFEW datasets improves by 3.5%, 4.44%, 5.58%, and 2.31%, respectively.
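The idea of comparing sample features with an averaged feature can be illustrated with a minimal NumPy sketch. It is not the network's actual channel selection module: the function name `select_channels`, the `keep_ratio` parameter, the use of global average pooling, and the rule of keeping the channels that deviate least from the average are all assumptions made for illustration.

```python
import numpy as np


def select_channels(features, keep_ratio=0.5):
    """Hypothetical channel selection (illustrative assumption, not the paper's module).

    features: array of shape (n_samples, channels, height, width).
    Keeps the channels whose pooled responses deviate least from the
    feature averaged over all samples, and zeroes out the rest.
    """
    n, c, h, w = features.shape
    # Global average pooling: one descriptor per sample and channel, shape (n, c).
    desc = features.mean(axis=(2, 3))
    # Averaged feature over all samples, shape (c,).
    mean_desc = desc.mean(axis=0)
    # Mean absolute deviation of each channel from the averaged feature, shape (c,).
    deviation = np.abs(desc - mean_desc).mean(axis=0)
    # Number of channels to keep (at least one).
    k = max(1, int(round(c * keep_ratio)))
    # Indices of the k channels with the smallest deviation.
    keep = np.argsort(deviation)[:k]
    mask = np.zeros(c, dtype=features.dtype)
    mask[keep] = 1.0
    # Broadcast the channel mask over samples and spatial positions.
    return features * mask[None, :, None, None], mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((5, 8, 4, 4))
    selected, mask = select_channels(feats, keep_ratio=0.5)
    print(selected.shape, int(mask.sum()))
```

In this sketch the mask is binary and shared by all samples. A learned, soft weighting per channel would be an equally plausible reading of the description above.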