The rise of smart grid technologies has ushered in a new era of two-way communication between customers and power companies, making effective Demand Response Management possible. This approach benefits both utilities and consumers by improving grid operations and reducing costs. As smart grid energy consumption monitoring becomes more prevalent, the number of networked sensors and the volume of data they generate continue to increase. However, the sheer volume and complexity of this data pose significant challenges for embedded platforms, which have limited memory, computational capacity, and power. Any discrepancy between energy supply and demand can drive up costs for service providers and customers and potentially cause system failures. It is therefore imperative to develop effective methodologies for analyzing these data to optimize smart grid energy consumption monitoring. To address these challenges, this paper investigates several machine learning strategies. Using a publicly available dataset from five separate aggregators, we apply four well-known techniques: Transformer, LSTM, BiLSTM, and Prophet. The results demonstrate that the Transformer-based framework outperforms the other three algorithms in short-term load forecasting for the aggregators' demand response.