Most current whole slide image (WSI) classification models rely on manual pixel-level annotations, which must be produced by domain experts and are delicate and time-consuming to create. To overcome this problem, we propose combining self-supervised learning with multiple instance learning so that large WSI datasets can be handled using only the reported diagnoses as labels. A key challenge in WSI classification is learning good image representations, an area where self-supervised learning has shown tremendous potential. In our study, we use the self-supervised network Bootstrap Your Own Latent (BYOL) as the pre-trained network; it can be trained on unlabeled data and learns deep domain-specific features. We evaluated the proposed framework at scale on a uterine cervical dataset of 3,063 whole slide images (720 GB). Our results show that combining a self-supervised learning model with a multiple instance learning model can match or exceed the performance of previous methods.
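To make the weak-label setting concrete, the following is a minimal sketch of how multiple instance learning can turn per-tile features into a single slide-level prediction when only the slide's diagnosis is known. It is an illustration under stated assumptions, not the framework's actual implementation: the max-pooling aggregation, the linear tile classifier, and the names `tile_scores` and `slide_prediction` are all hypothetical, and the toy 3-dimensional vectors stand in for embeddings that a frozen pre-trained encoder (such as BYOL) would produce.

```python
import math
from typing import List, Sequence, Tuple


def tile_scores(
    embeddings: Sequence[Sequence[float]],
    weights: Sequence[float],
    bias: float,
) -> List[float]:
    """Score each tile embedding with a logistic (sigmoid) linear classifier.

    Assumption: a linear tile classifier; the source does not specify one.
    """
    scores = []
    for emb in embeddings:
        logit = sum(w * x for w, x in zip(weights, emb)) + bias
        scores.append(1.0 / (1.0 + math.exp(-logit)))
    return scores


def slide_prediction(
    embeddings: Sequence[Sequence[float]],
    weights: Sequence[float],
    bias: float,
    threshold: float = 0.5,
) -> Tuple[float, bool]:
    """Aggregate tile scores into one slide-level score by max pooling.

    Under the standard MIL assumption, a slide (bag) is positive if at
    least one of its tiles (instances) is positive, so the highest tile
    score is taken as the slide score.
    """
    slide_score = max(tile_scores(embeddings, weights, bias))
    return slide_score, slide_score >= threshold


# Toy embeddings standing in for encoder outputs (hypothetical values).
slide_a = [[0.1, -0.2, 0.0], [2.0, 1.5, 0.5], [0.0, 0.1, -0.3]]
slide_b = [[0.1, -0.2, 0.0], [0.0, 0.1, -0.3]]
weights, bias = [1.0, 1.0, 1.0], -2.0

print(slide_prediction(slide_a, weights, bias))
print(slide_prediction(slide_b, weights, bias))
```

In this sketch only the slide-level label would be needed for training, since the loss would be computed on the pooled score rather than on individual tiles; other aggregation choices (for example, attention-based pooling) fit the same interface.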