Short-Term Load Forecasting (STLF) aims to predict the future power load of a single household, supporting a range of downstream home-management applications. Deep learning models have recently become popular for STLF because of their ability to extract nonlinear patterns from time-series data; however, their growing complexity hinders deployment in local household settings. To address this, we introduce a knowledge distillation-based approach to STLF that transfers knowledge from a large model to a smaller one without compromising forecast accuracy. The proposed method employs a teacher-student architecture for knowledge distillation, in which a sophisticated ‘teacher’ network conveys its forecasting insights to a compact ‘student’ network, enabling use on local devices with limited resources. We conducted experiments and comparative studies to validate this approach, demonstrating that it preserves forecast accuracy while remaining feasible to deploy.
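As a rough illustration of the teacher-student idea described above, the sketch below shows one common way a distillation objective can be formed for a regression task such as load forecasting: the student's training loss blends its error against the ground-truth load with its error against the teacher's forecast. The function names, the weighting factor `alpha`, and the toy forecast values are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two forecast arrays."""
    return float(np.mean((np.asarray(a) - np.asarray(b)) ** 2))

def distillation_loss(student_pred, teacher_pred, target, alpha=0.5):
    """Blend the task loss (student vs. ground truth) with the
    distillation loss (student vs. teacher forecast).

    alpha controls the trade-off; both terms are plain MSE here,
    a common choice when distilling regression models."""
    task = mse(student_pred, target)
    distill = mse(student_pred, teacher_pred)
    return alpha * task + (1 - alpha) * distill

# Toy example: three hourly load forecasts (e.g., in kW).
student = [1.0, 2.0, 3.0]
teacher = [1.1, 1.9, 3.2]
truth   = [1.2, 2.1, 2.9]
print(round(distillation_loss(student, teacher, truth), 4))  # → 0.02
```

In practice this loss would be minimized by gradient descent over the student's parameters, with the teacher's weights frozen; the blending weight is typically tuned on a validation set.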