Face recognition for surveillance remains a complex challenge due to the disparity between low-resolution (LR) face images captured by surveillance cameras and the typically high-resolution (HR) face images in databases. To address this cross-resolution face recognition problem, we propose a two-stage dual-resolution face network to learn more robust resolution-invariant representations. In the first stage, we pre-train the proposed dual-resolution face network using solely HR images. Our network utilizes a two-branch structure and introduces bilateral connections to fuse the high- and low-resolution features extracted by two branches, respectively. In the second stage, we introduce the triplet loss as the fine-tuning loss function and design a training strategy that combines the triplet loss with competence-based curriculum learning. According to the competence function, the pre-trained model can train first from easy sample sets and gradually progress to more challenging ones. Our method achieves a remarkable face verification accuracy of 99.25% on the native cross-quality dataset SCFace and 99.71% on the high-quality dataset LFW. Moreover, our method also enhances the face verification accuracy on the native low-quality dataset.