Optical coherence tomography (OCT) is a three-dimensional tomographic imaging technique that has recently been applied to cervical lesions in gynecology and has demonstrated diagnostic performance superior to colposcopy in clinical use. However, most gynecologists are unfamiliar with this new imaging technique and need lengthy specialized training to interpret the images accurately, so there is a strong need for efficient computer-aided diagnostic systems. We study a deep learning model based on self-supervised learning that mainly takes a contrastive learning approach. It combines an attention-transfer feature extraction method with a model named multi-layer feature refinement extraction with contrastive learning (FRCL), and uses a CNN as the backbone, to improve the accuracy of feature extraction. Our dataset consists of OCT images of the uterine cervix from 733 patients in China. We compare the classification accuracy of the proposed model with existing state-of-the-art supervised networks and CNN-based self-supervised networks and find that it is higher, with an AUC of 0.9789±0.0098 for binary classification, a specificity of 93.44±5.13%, and a sensitivity of 91.38±2.62%. In testing, the model performed on par with the average diagnosis of medical experts. On an external dataset, the model also remained stable and achieved good results in feature extraction and lesion identification.
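For readers less familiar with the reported metrics, the following minimal sketch shows how sensitivity and specificity are computed from binary predictions. The function name `sensitivity_specificity`, the label convention (1 = lesion-positive, 0 = negative), and the toy labels are assumptions made purely for illustration; they are not taken from the study's data or code.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Assumed convention (illustrative only): 1 = positive, 0 = negative.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # fraction of positives correctly detected
    specificity = tn / (tn + fp)  # fraction of negatives correctly rejected
    return sensitivity, specificity


# Toy example (hypothetical labels, not study data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.4f}, specificity={spec:.4f}")
```

Sensitivity measures how many true lesions are detected, while specificity measures how many normal cases are correctly ruled out; reporting both alongside the AUC guards against a classifier that looks accurate only because one class dominates.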