Confocal Laser Endomicroscopy (CLE) has shown great advantages in the diagnosis of gastrointestinal diseases. To address the time-consuming manual classification of CLE video information frames and the shortage of labeled data, we propose a Spatial-Temporal Fusion Pseudo-Labeling (STFPL) method based on semi-supervised learning. First, classification networks trained on limited labeled data generate predictions for unlabeled images and selected videos. Second, the image and video predictions are fused to obtain pseudo-labels. Third, the unlabeled loss, computed from the predictions and the pseudo-labels, is combined with the loss on labeled data to update the classification networks. Experimental results demonstrate that STFPL outperforms other semi-supervised algorithms on the CLE video dataset. In addition, STFPL matches the effectiveness of supervised classification on the Nerthus dataset, a dataset for evaluating the quality of intestinal cleaning.
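
The three steps of the method can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the fusion rule, the pseudo-label selection criterion, or the loss weighting. So the weighted average (`alpha`), the confidence threshold (`threshold`), the loss weight (`lam`), and all function names below are assumptions chosen only for illustration. Class probabilities are plain Python lists standing in for the outputs of the image (spatial) and video (temporal) classification networks.

```python
import math


def fuse_predictions(image_probs, video_probs, alpha=0.5):
    """Fuse image and video class probabilities.

    Assumption: a weighted average; the source does not state the fusion rule.
    """
    return [alpha * p + (1 - alpha) * q for p, q in zip(image_probs, video_probs)]


def pseudo_label(fused_probs, threshold=0.9):
    """Return (label, confidence); label is None below the assumed threshold."""
    conf = max(fused_probs)
    label = fused_probs.index(conf)
    return (label, conf) if conf >= threshold else (None, conf)


def cross_entropy(probs, label):
    """Cross-entropy of one prediction against a hard label."""
    return -math.log(max(probs[label], 1e-12))


def combined_loss(labeled, unlabeled, alpha=0.5, threshold=0.9, lam=1.0):
    """Supervised loss plus a weighted pseudo-label loss.

    labeled:   list of (probs, true_label)
    unlabeled: list of (image_probs, video_probs)
    """
    sup = sum(cross_entropy(p, y) for p, y in labeled) / max(len(labeled), 1)

    unsup_terms = []
    for img_p, vid_p in unlabeled:
        fused = fuse_predictions(img_p, vid_p, alpha)
        label, _ = pseudo_label(fused, threshold)
        if label is not None:
            # Both networks are pushed toward the fused pseudo-label.
            unsup_terms.append(cross_entropy(img_p, label) + cross_entropy(vid_p, label))
    # Samples without a confident pseudo-label contribute zero loss.
    unsup = sum(unsup_terms) / max(len(unlabeled), 1)

    return sup + lam * unsup
```

In this sketch, unlabeled samples whose fused confidence falls below the threshold contribute nothing to the unlabeled loss. This is a common choice in pseudo-labeling but is not confirmed by the source.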