As migrating applications to the cloud becomes the mainstream approach to deployment, application runtime management increasingly depends on large-scale workload prediction. However, existing large-scale workload forecasting models focus on improving accuracy and ignore the models' storage footprint, training time, and testing time, which leads to substantial overhead. This paper therefore proposes a large-scale workload forecasting model for containers. First, based on workload value features and waveform features, we propose a feature-enhanced workload similarity calculation algorithm that groups containers with similar workload patterns in real time by analyzing both the historical and the recent similarity of workloads across containers. Second, we use the Transformer as the base model, design the position encoding and attention mask according to the real-time workload similarity relationships, and train the forecasting model in parallel using the multi-head self-attention mechanism, which balances prediction accuracy against model overhead. Finally, we validate the overall advantages of our model in accuracy and overhead on public datasets and verify the effectiveness of each component through ablation experiments.
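To make the grouping and masking idea concrete, the following is a minimal sketch and not the algorithm proposed in this paper. It assumes a mean-level distance as the value feature, Pearson correlation as the waveform feature, an equal-weight blend of historical and recent similarity, and a greedy threshold grouping. All function names, weights (`alpha`, `beta`), the `recent` window length, and the `threshold` are hypothetical choices for illustration.

```python
import numpy as np

def value_similarity(a, b):
    # Assumed value feature: closeness of mean workload levels, in (0, 1].
    return 1.0 / (1.0 + abs(a.mean() - b.mean()))

def waveform_similarity(a, b):
    # Assumed waveform feature: Pearson correlation rescaled to [0, 1].
    if a.std() == 0 or b.std() == 0:
        return 0.0
    return (np.corrcoef(a, b)[0, 1] + 1.0) / 2.0

def combined_similarity(a, b, recent=8, alpha=0.5, beta=0.5):
    # Blend value and waveform similarity (alpha), then blend the
    # full-history score with the score on the most recent window (beta).
    def sim(x, y):
        return alpha * value_similarity(x, y) + (1 - alpha) * waveform_similarity(x, y)
    return beta * sim(a, b) + (1 - beta) * sim(a[-recent:], b[-recent:])

def group_containers(series, threshold=0.8):
    # Greedy grouping (an assumption): each container joins the first group
    # whose representative is similar enough, otherwise it starts a new group.
    representatives, labels = [], []
    for i, s in enumerate(series):
        for g, r in enumerate(representatives):
            if combined_similarity(s, series[r]) >= threshold:
                labels.append(g)
                break
        else:
            representatives.append(i)
            labels.append(len(representatives) - 1)
    return np.array(labels)

def attention_mask(labels):
    # Boolean mask: True where container i may attend to container j,
    # i.e. where both belong to the same similarity group.
    return labels[:, None] == labels[None, :]

# Usage: three synthetic container workloads (hypothetical data).
t = np.arange(32, dtype=float)
workloads = [np.sin(t), np.sin(t) + 0.05, -np.sin(t) + 5.0]
labels = group_containers(workloads)
mask = attention_mask(labels)
```

In this sketch the first two workloads share a group and the third is isolated, so the mask lets only the first two containers attend to each other. Under these assumptions, such a mask could be passed to a self-attention layer so that one model serves many containers while attention stays within groups of similar workloads.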