Image-processing-based water level detectors have promising practical value in intelligent agriculture and early waterlogging alerts. However, water level recognition based on image processing is challenged by varying illumination, shooting angle, and sediment contamination. In addition, reflections from the water surface make it difficult to accurately extract the water level ruler (WLR) at the water surface. This paper proposes a novel dual-attention CornerNet for WLR image extraction and a CTransformer for WLR sequence recognition. First, a dual-attention mechanism is introduced to capture global information and thereby better predict the semantic segmentation feature maps and corner information. Then, an asymmetric-convolution ResNet-50 is used to extract multi-local information, so that characters whose sizes vary with the shooting angle of the WLR can be recognized effectively. The design of vision backbones based on self-attention has recently become an active topic. In this work, an improved CTransformer is designed to retain sufficient global context information and to extract more discriminative features for sequence recognition via multi-head self-attention. Evaluation on our in-house dataset shows that the proposed framework achieves an F-score of 91.37 in the detection stage and an accuracy of 95.37% in the recognition stage, measured as error within 0.3 cm of the human estimate. The proposed method is also evaluated on several benchmarks. Experimental results demonstrate that the proposed method outperforms existing methods. [ABSTRACT FROM AUTHOR]
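The two figures reported above (a detection F-score and a recognition accuracy within a 0.3 cm tolerance) can be illustrated with a minimal sketch. The abstract does not give the exact metric definitions, so this assumes the F-score is the usual harmonic mean of precision and recall, and that a reading counts as correct when its absolute difference from the human estimate is at most 0.3 cm. The function names `f_score` and `accuracy_within_tolerance` and the sample values are hypothetical and are not taken from the paper.

```python
def f_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall (assumed F1 definition)."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


def accuracy_within_tolerance(predicted_cm, reference_cm, tolerance_cm=0.3):
    """Fraction of readings whose absolute error is at most tolerance_cm.

    reference_cm holds the human-estimated water levels (assumed ground truth).
    """
    if len(predicted_cm) != len(reference_cm):
        raise ValueError("predicted and reference lists must have equal length")
    if not predicted_cm:
        return 0.0
    hits = sum(
        abs(pred - ref) <= tolerance_cm
        for pred, ref in zip(predicted_cm, reference_cm)
    )
    return hits / len(predicted_cm)


# Illustrative values only; these are not data from the paper.
detection_f = f_score(true_positives=8, false_positives=2, false_negatives=2)
recognition_acc = accuracy_within_tolerance(
    predicted_cm=[10.0, 12.5, 7.2, 30.0],
    reference_cm=[10.2, 12.5, 8.0, 30.25],
)
```

Under these assumptions, the reported 91.37 would correspond to `100 * f_score(...)` over the detection results, and the reported 95.37% to `100 * accuracy_within_tolerance(...)` over the recognized readings.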