In this study, a new three-stage approach consisting of clustering, simulation, and optimization is proposed for simulating groundwater level (GWL) in an arid region of eastern Iran. In the first stage, K-means clustering was used to divide the study aquifer into five clusters based on precipitation, water recharge, water discharge, transmissivity, ground elevation, and water table. In the second stage, GWL in each cluster was simulated with an artificial neural network (ANN) fed with various input patterns built from the following variables: water level in the previous month, aquifer discharge, aquifer recharge, evaporation, temperature, and precipitation. In the third stage, two advanced optimization methods, particle swarm optimization (PSO) and the whale optimization algorithm (WOA), were used to optimize the ANN results. The most suitable input pattern differed among clusters and models. A pattern comprising water level in the previous month, aquifer discharge, aquifer recharge, and precipitation was identified as the best model for every cluster except cluster 3. Validation with root mean squared error (RMSE), mean absolute percentage error (MAPE), and Nash–Sutcliffe efficiency (NSE) gave RMSE = 0.01, NSE = 0.97, and MAPE = 0.13 for the first cluster; RMSE = 0.011, NSE = 0.99, and MAPE = 0.22 for the second cluster; RMSE = 0.003, NSE = 0.99, and MAPE = 0.30 for the fourth cluster; and RMSE = 0.001, NSE = 0.98, and MAPE = 0.05 for the fifth cluster. For the third cluster, a pattern comprising water level in the previous month, aquifer discharge, and aquifer recharge was identified as the best model, with RMSE = 0.006, NSE = 0.99, and MAPE = 0.05. Based on these results, the ANN–PSO model was selected for three clusters and the ANN–WOA model for the remaining two.
In general, this study showed that optimization algorithms can improve the simulation accuracy of the ANN, and that the effectiveness of each optimization method depends on the cluster. The proposed approach can be extended to other aquifers that cover a relatively large area and have limited data availability.
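
The three validation metrics reported above can be illustrated with a minimal Python sketch. It assumes the standard textbook definitions of RMSE, MAPE, and NSE; the abstract does not state the exact formulas used, in particular whether MAPE is expressed as a percentage or a fraction, so the percentage form here is an assumption. The function names and the sample values are hypothetical and are not taken from the study.

```python
import math


def rmse(observed, simulated):
    """Root mean squared error between observed and simulated series."""
    n = len(observed)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / n)


def mape(observed, simulated):
    """Mean absolute percentage error, in percent (assumed convention).

    Observed values must be non-zero.
    """
    n = len(observed)
    return 100.0 * sum(abs((o - s) / o) for o, s in zip(observed, simulated)) / n


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


if __name__ == "__main__":
    # Hypothetical monthly groundwater levels, for illustration only.
    observed = [1.0, 2.0, 3.0]
    simulated = [2.0, 2.0, 2.0]
    print(rmse(observed, simulated), mape(observed, simulated), nse(observed, simulated))
```

Under these definitions, lower RMSE and MAPE and an NSE closer to 1 indicate a better fit, which is the sense in which the per-cluster values above are compared.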