Time Series Classification (TSC) is a crucial area in machine learning. Although Deep Neural Networks (DNNs) have achieved relatively good results in this area, classifying time-series data remains a major challenge. The difficulty stems from the nature of time series, which involve high data volume and unfavorable elements such as noise, inconsistency, and missing values. Shallow approaches address these challenges using temporal discretization. Deep learning (DL) models offer significant advantages, but they do not apply temporal discretization in an end-to-end manner. This paper develops three end-to-end DL models, namely FCN-DISC, LSTM-FCN-DISC, and ALSTM-FCN-DISC, to integrate the benefits of temporal discretization and deep network architecture. By embedding temporal discretization in the network architecture, the proposed models attempt to select the values of the input time series that contribute most to model training. These models use two loss functions, one to construct a discretized time series and one to optimize the network weights. To this end, a new loss function for discretization is introduced in addition to the cross-entropy loss. Experiments on univariate TSC datasets demonstrate that the proposed models outperform state-of-the-art methods in most cases.
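For readers unfamiliar with temporal discretization, the following is a minimal sketch of one classical, fixed scheme (equal-width binning), of the kind a shallow pipeline might apply as a preprocessing step. It is not the learned, end-to-end discretization of the proposed models, whose loss function is not specified here. The function name `equal_width_discretize` and the parameter `n_bins` are hypothetical and chosen only for illustration.

```python
from typing import List


def equal_width_discretize(series: List[float], n_bins: int) -> List[int]:
    """Map each value to a bin index in [0, n_bins - 1] using equal-width bins.

    Illustrative assumption: bins span the observed [min, max] range of the
    series. This is a generic preprocessing step, not the paper's method.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    if not series:
        return []
    lo, hi = min(series), max(series)
    if hi == lo:
        # Constant series: every value falls in the first bin.
        return [0] * len(series)
    width = (hi - lo) / n_bins
    # The maximum value would otherwise land in bin n_bins, so clamp it
    # to the last valid bin index.
    return [min(int((x - lo) / width), n_bins - 1) for x in series]


# Usage: a short univariate series mapped onto three symbols.
series = [0.1, 0.4, 0.35, 0.8, 1.0, 0.05]
symbols = equal_width_discretize(series, n_bins=3)
print(symbols)
```

A fixed scheme like this chooses its bin boundaries independently of the classifier, which is the limitation that motivates embedding discretization in the network and training it jointly with the classification objective.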