Reinforcement learning has attracted much attention in mechanical control, but it has rarely been applied to projects that require high precision and short control cycles. To achieve high-frequency, high-precision control of a complex multi-axis cooperative mechanism, this paper decouples and virtualizes the control system of a structure consisting of four valves. The controller infers commands and sends control signals at 500 Hz (a 2 ms control period) and controls the displacement precisely. The author modified the Proximal Policy Optimization (PPO) algorithm, improving its update rules and training environment with batch advantage normalization, learning-rate decay, a policy entropy reward, a speed-control reward, and added noise. In simulation, the resulting algorithm outperformed both PID control and ray-PPO.
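Several of the update-rule changes listed above are common PPO ingredients. The NumPy sketch below shows one standard way to combine batch advantage normalization, linear learning-rate decay, and a policy entropy bonus with PPO's clipped surrogate loss. It assumes a diagonal Gaussian policy, and the function names, coefficients (`clip_eps=0.2`, `ent_coef=0.01`), linear decay schedule, and the choice to add Gaussian noise to observations are illustrative assumptions, not the paper's implementation. It only computes loss values; gradients would come from an autodiff framework.

```python
import numpy as np

def normalize_advantages(adv, eps=1e-8):
    """Batch advantage normalization: zero mean, unit std within one batch."""
    return (adv - adv.mean()) / (adv.std() + eps)

def linear_lr(initial_lr, step, total_steps, final_lr=0.0):
    """Linearly decay the learning rate from initial_lr to final_lr (assumed schedule)."""
    frac = min(step / total_steps, 1.0)
    return initial_lr + frac * (final_lr - initial_lr)

def gaussian_entropy(log_std):
    """Entropy of a diagonal Gaussian policy, summed over action dimensions."""
    return np.sum(log_std + 0.5 * np.log(2.0 * np.pi * np.e))

def ppo_loss(new_logp, old_logp, adv, log_std, clip_eps=0.2, ent_coef=0.01):
    """Clipped PPO surrogate loss (to be minimized) minus an entropy bonus."""
    adv = normalize_advantages(adv)
    ratio = np.exp(new_logp - old_logp)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    policy_loss = -np.mean(np.minimum(unclipped, clipped))
    # Subtracting the entropy term rewards more exploratory policies.
    return policy_loss - ent_coef * gaussian_entropy(log_std)

def add_observation_noise(obs, std, rng):
    """Hypothetical: perturb observations with Gaussian noise during training."""
    return obs + rng.normal(0.0, std, size=obs.shape)
```

Normalizing advantages per batch keeps the scale of the policy gradient roughly constant across batches, while clipping the probability ratio to `[1 - clip_eps, 1 + clip_eps]` limits how far a single update can move the policy.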