Detecting the pupil in images is critical in human-machine interaction and biomedical computing applications, and the task can be formulated as an image segmentation problem. Recently developed deep learning models offer a variety of novel approaches to pupil segmentation. However, preparing and annotating pupil image datasets is labor-intensive and time-consuming, and the resulting shortage of labeled samples limits the performance of deep learning models. In this work, we use a masked image modeling mechanism to learn latent representations from a limited number of samples, which substantially aids the training of deep models. Further, we propose a novel pupil segmentation model based on the recently proposed Swin Transformer to validate the effectiveness of the masking mechanism. In comparison experiments on the LPW dataset, the proposed framework achieves better pupil segmentation performance than other related deep learning models. The proposed framework is a promising solution for pupil segmentation and detection in small-sample learning applications.
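The masked image modeling idea can be illustrated with a minimal sketch. The abstract does not specify the masking scheme, so this assumes a SimMIM-style setup: the image is split into square patches, a random fraction of patches is hidden, and a reconstruction loss is computed only on the hidden pixels. The helper names (`random_patch_mask`, `apply_mask`, `masked_l1_loss`), the patch size, the mask ratio, the zero fill value, and the choice of an L1 loss are illustrative assumptions, not the authors' implementation.

```python
import random


def random_patch_mask(height, width, patch, mask_ratio, seed=None):
    """Hypothetical helper: return a per-pixel 0/1 mask (1 = masked).

    The image is divided into non-overlapping square patches, and
    round(mask_ratio * num_patches) of them are chosen uniformly at random.
    """
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    rng = random.Random(seed)
    rows, cols = height // patch, width // patch
    num_patches = rows * cols
    num_masked = round(mask_ratio * num_patches)
    masked_ids = set(rng.sample(range(num_patches), num_masked))
    mask = [[0] * width for _ in range(height)]
    for pid in masked_ids:
        r, c = divmod(pid, cols)  # patch index -> (patch row, patch column)
        for y in range(r * patch, (r + 1) * patch):
            for x in range(c * patch, (c + 1) * patch):
                mask[y][x] = 1
    return mask


def apply_mask(image, mask, fill=0.0):
    """Replace masked pixels with a constant (assumed stand-in for a learnable mask token)."""
    return [[fill if m else p for p, m in zip(row, mrow)]
            for row, mrow in zip(image, mask)]


def masked_l1_loss(pred, target, mask):
    """Mean absolute reconstruction error over masked pixels only."""
    total, count = 0.0, 0
    for prow, trow, mrow in zip(pred, target, mask):
        for p, t, m in zip(prow, trow, mrow):
            if m:
                total += abs(p - t)
                count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    # Toy 8x8 "image" with 4x4 patches: 4 patches, half of them masked.
    img = [[float(y * 8 + x) for x in range(8)] for y in range(8)]
    m = random_patch_mask(8, 8, patch=4, mask_ratio=0.5, seed=0)
    corrupted = apply_mask(img, m)
    # In real pretraining, a network (e.g. a Swin Transformer encoder with a
    # light decoder) would predict the image from `corrupted`; here the
    # corrupted input itself serves as a trivial "prediction".
    print(masked_l1_loss(corrupted, img, m))
```

Computing the loss only on hidden patches forces the model to infer missing content from visible context, which is what lets such pretraining extract representations from unlabeled or scarce images before fine-tuning on the segmentation labels.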