With the rise in societal pressures, depression and anxiety have become increasingly prominent mental health conditions affecting people’s lives. To improve the automatic detection of these disorders, we developed an experimental framework called Voluntary Facial Expression Mimicry (VFEM) and used it to build the VFEM Dataset, which supports related research. We then introduce LI-FPN, a network designed specifically for the automatic identification of depression and anxiety disorders. LI-FPN comprises two core components: the Learning and Imitation Module (LIM) and the Spatio-temporal Feature Pyramid Network (STFPN). The LIM uses sequence features to extract features comprehensively through learning and imitation steps. The STFPN focuses on outliers in multi-scale features for further screening. Compared with traditional attention methods, LI-FPN is better suited to sequence data features and small-sample datasets. Trained on the VFEM Dataset, LI-FPN achieves accuracies of 0.850 for depression detection, 0.835 for anxiety detection, and 0.786 for detection of co-occurring depression and anxiety. LI-FPN also achieves state-of-the-art (SOTA) results on the AVEC2014 dataset. The source code for LI-FPN is available at https://github.com/muzixingyun/LI-FPN