To address the slow training of the Panda robotic arm pick-and-place task in panda-gym, a third-party environment built on the Gym simulation framework, the TQC (Truncated Quantile Critics) algorithm is proposed for training. Compared with the DDPG and SAC algorithms, TQC trains significantly faster. The algorithm also performs well under a sparse reward function: in this experiment the reward is 0 when the object is successfully grasped and negative otherwise. Such a sparse reward penalizes every unsuccessful action equally, so a policy update carries no information about which action was better; the policy therefore fails to improve, making it difficult to discover the successful, rewarded behavior through exploration.

To solve the Gym-Panda robotic arm grasping task with TQC, it is first necessary to understand the TQC algorithm and the fundamentals of the panda-gym robotic arm simulation environment. After that, the TQC, DDPG, and SAC algorithms are trained on the same task in the same environment, and the superiority of TQC is demonstrated by comparing their training curves.
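As background on the algorithm itself, TQC's central idea is to pool the quantile (atom) predictions of an ensemble of critics, sort them, and discard the largest few atoms per critic, which counteracts overestimation of the value target. The following is a minimal NumPy sketch of that truncation step only, not the paper's actual implementation; the function name, array shapes, and toy numbers are illustrative assumptions.

```python
import numpy as np

def truncated_target_atoms(critic_quantiles, drop_per_critic):
    """Pool quantile atoms from all critics, sort them, and drop the largest.

    critic_quantiles: array of shape (n_critics, n_quantiles) -- each critic's
        quantile estimates of the return for one (state, action) pair.
    drop_per_critic: d, the number of top atoms removed per critic, so
        d * n_critics atoms are removed from the pooled set in total.
    """
    n_critics, n_quantiles = critic_quantiles.shape
    pooled = np.sort(critic_quantiles.reshape(-1))      # pool and sort all N*M atoms
    keep = n_critics * n_quantiles - drop_per_critic * n_critics
    return pooled[:keep]                                # keep only the smallest atoms

# Toy usage: 2 critics with 5 quantiles each, dropping 1 atom per critic.
q = np.array([[1.0, 2.0, 3.0, 4.0, 9.0],
              [1.5, 2.5, 3.5, 4.5, 8.0]])
atoms = truncated_target_atoms(q, drop_per_critic=1)
target = atoms.mean()   # averaging the truncated atoms gives a pessimistic value target
```

In this example the two outlier atoms (9.0 and 8.0) are removed before averaging, so a single overestimating critic cannot inflate the target; tuning `drop_per_critic` controls how pessimistic the estimate is.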