Blurry backgrounds, character occlusion, and low contrast against the background in cast pipe character images degrade detection, so existing methods cannot meet the accuracy and real-time requirements of industrial scenes. To address these issues, this paper proposes an improved algorithm based on Cascade Mask R-CNN. First, a Squeeze-and-Excitation (SE) channel attention module is embedded into the ResNet structure. A bottleneck of two fully connected layers produces normalized weights that are applied to each feature channel, enhancing the representational ability of the network. Second, a bottom-up structure is introduced after the Feature Pyramid Network (FPN) to propagate the strong localization information of lower-level features to the higher-level semantic features. Third, an adaptive pooling layer is added to assign the candidate regions generated by the Region Proposal Network (RPN) to feature maps of different scales. In addition, skip connections are introduced between the lowest-level and highest-level features of each stage to reduce the number of model parameters while retaining global representation capability. The improved algorithm is evaluated on a cast pipe dataset and achieves a mean average precision (mAP) of 98.7%, which is 2.7% higher than that of Cascade Mask R-CNN, indicating that it outperforms the original algorithm in detection accuracy.
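
To make the channel attention step concrete, the following is a minimal sketch of an SE block in plain Python. It is an illustration under stated assumptions, not the implementation used in this work: the function name `se_block`, the toy sizes, and the reduction ratio of 2 are hypothetical, bias terms are omitted, and the "normalized weights" are assumed to come from a sigmoid gate as in the standard SE design.

```python
import math
import random


def se_block(features, w1, w2):
    """Apply a Squeeze-and-Excitation gate to a feature tensor.

    features: list of C channels, each an H x W list of lists.
    w1: (C // r) x C weights of the first (reducing) fully connected layer.
    w2: C x (C // r) weights of the second (expanding) fully connected layer.
    Biases are omitted for brevity (an assumption of this sketch).
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in features
    ]
    # Excitation, layer 1: reduce C -> C // r, followed by ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, layer 2: expand C // r -> C, followed by a sigmoid,
    # so every channel weight lies strictly between 0 and 1.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's weight.
    scaled = [[[g * v for v in row] for row in ch] for ch, g in zip(features, gates)]
    return scaled, gates


# Toy usage: C = 4 channels of 2 x 2 features, reduction ratio r = 2 (assumed).
random.seed(0)
C, R, H, W = 4, 2, 2, 2
features = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C // R)]
w2 = [[random.uniform(-1, 1) for _ in range(C // R)] for _ in range(C)]
scaled, gates = se_block(features, w1, w2)
print([round(g, 3) for g in gates])
```

The two fully connected layers form the bottleneck: the first compresses the per-channel statistics and the second restores one weight per channel. In a real network the weights would be learned and the block would sit inside each residual unit; here they are random and serve only to show the data flow.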