Deep Reinforcement Learning (DRL) combines deep neural networks with reinforcement learning, training an agent to make decisions from environmental feedback. Expected value, as a powerful mathematical tool, is widely used to train DRL networks. However, the expected values often deviate from the actual values obtained from the environment, and these deviations accumulate within the DRL network over the course of training. The accumulated deviations slow training and degrade the network's stability. To address these issues, this paper proposes a new DRL training method called Punisher. The principle behind Punisher is to identify the bad actions taken by the DRL agent and correct only those actions during training. By focusing corrections on the bad actions, Punisher aims to improve the overall performance and stability of the DRL network. The experimental results demonstrate that Punisher achieves excellent performance, faster training, and greater network stability, making it a promising approach for efficiently training DRL agents across a variety of applications. The experiments presented in the main content of this paper are available on GitHub at https://github.com/Jimmyoungyi/Punisher.
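The "correct only the bad actions" idea can be illustrated with a minimal sketch. This is an illustrative interpretation of the abstract, not the paper's actual implementation: the function names (`select_bad_actions`, `corrective_loss`) and the use of a simple expected-vs-actual value comparison with a `threshold` parameter are assumptions introduced here.

```python
# Hypothetical sketch of correcting only "bad actions": steps whose actual
# return fell short of the expected value. Names and the threshold test are
# illustrative assumptions, not taken from the paper.

def select_bad_actions(expected_values, actual_returns, threshold=0.0):
    """Return indices of steps where the expected value exceeds the actual
    return by more than `threshold` -- the 'bad actions' singled out for
    correction."""
    return [i for i, (v, g) in enumerate(zip(expected_values, actual_returns))
            if v - g > threshold]

def corrective_loss(expected_values, actual_returns, bad_idx):
    """Mean squared deviation computed only over the flagged steps, so the
    update targets the bad actions instead of every transition."""
    if not bad_idx:
        return 0.0
    return sum((expected_values[i] - actual_returns[i]) ** 2
               for i in bad_idx) / len(bad_idx)

# Toy trajectory: per-step expected values vs. actual returns.
expected = [1.0, 0.5, 2.0, 1.5]
actual   = [1.2, 0.1, 2.0, 0.5]
bad = select_bad_actions(expected, actual)      # steps 1 and 3 fall short
loss = corrective_loss(expected, actual, bad)   # averaged over bad steps only
```

In a full training loop, `loss` would back-propagate only through the flagged transitions, leaving the agent's good actions untouched.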