In recent years, Handwritten Text Recognition (HTR) has attracted widespread attention owing to its many applications. HTR is the task of extracting handwritten text from an image and converting it into a digital form that machines can process. Nevertheless, because of the large variability in personal writing styles and the diverse properties of handwritten characters across languages, HTR remains a challenging open research problem, and its robustness and adaptability still require improvement. Existing approaches to HTR are typically systems based on Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) trained with the Connectionist Temporal Classification (CTC) objective function. More recently, many attention-based sequence-to-sequence (Seq2Seq) approaches have been proposed for the HTR task. Seq2Seq approaches are more flexible, well suited to the sequential nature of text, and can employ different attention mechanisms to focus on the most relevant features of the input. In this paper, we provide an extensive comparison of current Deep Learning approaches to the task of HTR. We also outline the problems that currently limit the effectiveness of these approaches.
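To make the CTC-based pipeline mentioned above concrete, the following minimal Python sketch shows the greedy (best-path) CTC decoding step: per-timestep label predictions from a CNN+RNN model are collapsed by removing repeats and blanks. The alphabet, blank index, and frame sequence here are illustrative assumptions, not taken from any particular system.

```python
BLANK = 0  # CTC reserves one extra class as the "blank" symbol

def ctc_greedy_decode(frame_labels, blank=BLANK):
    """Best-path CTC decoding: merge repeated labels, then drop blanks."""
    decoded = []
    prev = None
    for label in frame_labels:
        # A label is emitted only when it differs from the previous frame
        # and is not the blank symbol.
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded

# Hypothetical per-frame argmax output over the alphabet {0: blank, 1: 'h', 2: 'i'}
frames = [1, 1, 0, 2, 2]
alphabet = {1: "h", 2: "i"}
print("".join(alphabet[c] for c in ctc_greedy_decode(frames)))  # -> hi
```

In a full system, the frame labels would be the argmax of the softmax outputs produced by the recurrent layers at each horizontal position of the image; beam-search decoding (optionally with a language model) usually replaces this greedy step in practice.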