Scene text recognition is commonly treated as a sequence labeling problem. In this task, the alignment between the scene text image and the output text is monotonic: a character that appears later in the text corresponds to an image region that also appears later. However, existing global attention-based methods attend to too much irrelevant information, which leads to alignment drift. In contrast, local attention selects only the subset of the feature representation most relevant to the current character. In this paper, we explore the local attention mechanism as a replacement for global attention in decoding. To this end, we revise several variants of local attention methods and provide a comprehensive comparison, which has so far been missing from the scene text recognition literature. Specifically, we introduce two Heuristic approaches for Local Attention (HLA) and demonstrate that monotonic alignment improves performance significantly. Evaluations on the benchmarks show that the local attention methods outperform existing global attention methods.
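To make the contrast between global and local attention concrete, the following minimal sketch compares the two on a toy score matrix. It is an illustration under stated assumptions, not the HLA method itself: the function names, the forward-only window of fixed `width`, and the rule that the next window starts at the previously attended position are all hypothetical choices made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_attention(scores):
    """Global attention: every encoder position receives some weight."""
    return softmax(scores)


def local_attention(scores, start, width):
    """Local attention over the window [start, start + width).

    Positions outside the window receive a weight of exactly 0.
    The forward-only window is an assumption of this sketch.
    """
    n = len(scores)
    hi = min(n, start + width)
    weights = [0.0] * n
    weights[start:hi] = softmax(scores[start:hi])
    return weights


def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def decode_positions_local(score_rows, width):
    """Hypothetical monotonic heuristic: the window for the next step
    starts at the position attended to in the current step, so the
    attended positions can never move backwards."""
    start = 0
    attended = []
    for scores in score_rows:
        best = argmax(local_attention(scores, start, width))
        attended.append(best)
        start = best
    return attended


# Toy attention scores: 3 decoding steps over 6 encoder positions.
# Each row contains a high-scoring distractor far from the true region.
rows = [
    [2.0, 0.1, 0.0, 0.0, 0.0, 3.0],
    [0.0, 0.2, 2.5, 0.0, 0.0, 3.0],
    [3.0, 0.0, 0.1, 0.3, 2.0, 0.0],
]

global_positions = [argmax(global_attention(r)) for r in rows]
local_positions = decode_positions_local(rows, width=3)

print(global_positions)  # [5, 5, 0]  (jumps to the distractors, then backwards)
print(local_positions)   # [0, 2, 4]  (advances monotonically)
```

In this toy case, global attention follows the distractor scores and its attended position drifts, whereas the windowed variant ignores positions outside the window and moves forward step by step.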