Optimizing traffic control systems is of great significance for improving people's livelihoods and promoting economic development. To realize intelligent traffic control that goes beyond fixed timing schemes based on historical traffic flow and adjustment of the green signal ratio, two multi-agent reinforcement learning methods, MADDPG and QMIX, are applied to two four-intersection traffic networks: one with restricted traffic routes and one with the full set of routes. The total reward obtained by each algorithm is compared, and the stability of each method in the traffic control system is evaluated by the growth rate of the reward terms, the rate of decline of the penalty terms, and the convergence speed of the algorithm. In both groups of comparative experiments, MADDPG converges slightly more slowly than QMIX, but its total reward is significantly higher; its reward terms also grow faster and its penalty terms decline faster than those of QMIX. Comparing the two methods across the different road environments shows that MADDPG is more suitable for optimizing traffic control.
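As an illustration of the evaluation criteria named above, the following minimal Python sketch shows one way the growth rate of a reward (or penalty) curve and a convergence episode could be computed from per-episode logs. The function names, the moving-average window, and the 5% tolerance band are assumptions made for this example only; they are not the definitions used in the experiments.

```python
from statistics import mean


def moving_average(values, window):
    """Trailing moving average; early entries average over what is available."""
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        smoothed.append(mean(values[start:i + 1]))
    return smoothed


def mean_slope(values):
    """Average per-episode change between the first and the last value.

    Positive for a growing reward term, negative for a shrinking penalty term.
    """
    if len(values) < 2:
        return 0.0
    return (values[-1] - values[0]) / (len(values) - 1)


def convergence_episode(values, window=5, tolerance=0.05):
    """Index of the first episode from which the smoothed curve stays within
    a relative `tolerance` of its final value.

    This criterion is hypothetical; `values` must be non-empty.
    """
    smoothed = moving_average(values, window)
    final = smoothed[-1]
    band = tolerance * max(abs(final), 1e-9)
    last_outside = -1
    for i, v in enumerate(smoothed):
        if abs(v - final) > band:
            last_outside = i
    return last_outside + 1
```

Under these assumptions, a method would be judged to converge faster when `convergence_episode` returns a smaller index, and to improve faster when `mean_slope` is larger for reward terms or more negative for penalty terms.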