In this research, we present a Q-learning-based energy management system (DEQEMS) that makes decisions using unique states and intuitive actions while maintaining a high degree of interpretability. The experimental results show that DEQEMS reduces the number of days required for convergence to 633 and achieves a mean absolute error (MAE) of supply distribution of 6.7%. These figures correspond to reductions of 63% and 71%, respectively, compared with the conventional system, and of 34% and 21%, respectively, compared with a state-of-the-art system. The results demonstrate not only the usefulness and feasibility of DEQEMS but also its resilience, with strong and consistent performance under a wide range of conditions.