Crack detection plays an important role in the maintenance and protection of the steel box girders of bridges. However, because cracks occupy only an extremely small region of the high-resolution images captured under actual conditions, existing methods cannot process such images effectively. To solve this problem, this paper proposes a novel three-stage method based on deep learning and morphological operations. The training and test sets used in this paper consist of 360 images (4928 × 3264 pixels) captured inside steel box girders. The first stage of the proposed method converts the high-resolution images into sub-images using a patch-based method and locates the regions containing cracks with a CBAM ResNet-50 model, achieving a recall of 0.95 on the test set. The second stage uses an Attention U-Net model to extract the accurate geometric edges of the cracks within the regions located in the first stage; the segmentation model in this stage attains an IoU of 0.48. In the third stage, wrongly predicted isolated points are removed from the segmentation results through a dilation operation and an outlier elimination algorithm, which raises the IoU on the test set to 0.70. Ablation experiments are conducted to optimize the parameters and further improve the accuracy of the proposed method. The results show that: (1) the best patch size for the sub-images is 1024 × 1024; (2) CBAM ResNet-50 and Attention U-Net achieve the best results in the first and second stages, respectively; and (3) pre-training the models of the first two stages improves the IoU by 2.9%. Overall, the proposed method provides an effective approach for detecting small cracks in high-resolution images of steel box girders.
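The non-network parts of the pipeline, namely the patch splitting of the first stage, the post-processing of the third stage, and the IoU metric, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes non-overlapping 1024 × 1024 tiles with zero padding at the image border, a 3 × 3 square structuring element for dilation, and a hypothetical minimum-area rule (`min_area`) standing in for the outlier elimination algorithm, whose exact form is not given here. The CBAM ResNet-50 classifier and the Attention U-Net segmenter are omitted, and the function names `split_into_patches`, `dilate`, `remove_isolated_points`, and `iou` are hypothetical.

```python
from collections import deque

import numpy as np


def split_into_patches(image, patch=1024):
    """Stage 1 (assumed): tile an image into non-overlapping patch x patch sub-images.
    The border is zero-padded up to a multiple of `patch` (assumption)."""
    h, w = image.shape[:2]
    pad = [(0, -h % patch), (0, -w % patch)] + [(0, 0)] * (image.ndim - 2)
    padded = np.pad(image, pad)
    tiles = []
    for y in range(0, padded.shape[0], patch):
        for x in range(0, padded.shape[1], patch):
            # Keep the tile origin so predictions can be stitched back later.
            tiles.append(((y, x), padded[y:y + patch, x:x + patch]))
    return tiles


def dilate(mask, iterations=1):
    """Binary dilation with a 3 x 3 square structuring element."""
    out = mask.astype(bool)
    h, w = out.shape
    for _ in range(iterations):
        p = np.pad(out, 1)  # pads with False
        acc = np.zeros_like(out)
        for dy in range(3):
            for dx in range(3):
                acc |= p[dy:dy + h, dx:dx + w]
        out = acc
    return out


def remove_isolated_points(pred, dilate_iters=2, min_area=50):
    """Stage 3 (assumed): dilate so nearby crack fragments merge, drop dilated
    8-connected components smaller than the hypothetical `min_area`, and return
    the ORIGINAL prediction restricted to the surviving components."""
    pred = pred.astype(bool)
    grown = dilate(pred, dilate_iters)
    h, w = grown.shape
    seen = np.zeros_like(grown)
    keep = np.zeros_like(grown)
    for sy, sx in zip(*np.nonzero(grown)):
        if seen[sy, sx]:
            continue
        # Breadth-first search over one connected component.
        comp, queue = [], deque([(sy, sx)])
        seen[sy, sx] = True
        while queue:
            y, x = queue.popleft()
            comp.append((y, x))
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and grown[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
        if len(comp) >= min_area:
            ys, xs = zip(*comp)
            keep[list(ys), list(xs)] = True
    return pred & keep


def iou(pred, target):
    """Intersection over union of two binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    union = np.logical_or(pred, target).sum()
    return 1.0 if union == 0 else np.logical_and(pred, target).sum() / union
```

With a 1024 patch size, a 3264 × 4928 array is padded to 4096 × 5120 and yields 20 sub-images. Dilation is used here only to decide which components survive; returning `pred & keep` rather than the dilated mask avoids inflating the predicted crack width, which would otherwise lower the IoU.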