Trajectory deviation arises during the motion and obstacle avoidance of a mechanical arm and should be corrected with an appropriate control algorithm so that the actual trajectory stays close to the ideal one. Trajectory planning and obstacle avoidance schemes based on an improved Q-learning algorithm are proposed, and the state vector set and the action set available in each state are constructed. A back-propagation (BP) neural network is used to improve the continuous approximation ability of the model, and the Q-function values are updated at every iteration. In path planning, the principles of minimum joint rotation angle and minimum spatial travel distance of the connecting rod are applied, so that trajectory deviation is minimized while obstacles are still avoided. Simulation results show that the proposed control algorithm converges quickly, plans better paths than the traditional planning scheme, and incurs the lowest migration cost.
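To make the iterative Q-value update concrete, the following is a minimal sketch of the standard tabular Q-learning rule, paired with a hypothetical step cost that penalizes joint rotation and link travel. It is not the improved algorithm described above: the BP network approximation is omitted, and the names `step_cost` and `q_update`, the weights `w_angle` and `w_dist`, the learning rate `alpha`, the discount `gamma`, and the arc-length proxy for travel distance are all assumptions introduced for illustration.

```python
def step_cost(delta_angles, link_lengths, w_angle=1.0, w_dist=1.0):
    """Hypothetical cost of one move (an assumption, not the paper's formula).

    rotation: sum of absolute joint angle changes, in radians.
    travel:   rough proxy for link travel, the arc length |d| * l that each
              link tip sweeps about its own joint.
    """
    rotation = sum(abs(d) for d in delta_angles)
    travel = sum(abs(d) * l for d, l in zip(delta_angles, link_lengths))
    return w_angle * rotation + w_dist * travel


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.5, gamma=0.9):
    """Standard tabular Q-learning update.

    q is a dict keyed by (state, action); missing entries count as 0.0.
    Returns the new value stored for (state, action).
    """
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


if __name__ == "__main__":
    # Illustrative two-joint move; the reward is the negative step cost,
    # so smaller rotations and shorter travel are preferred.
    actions = ["rotate_joint_1", "rotate_joint_2"]
    q_table = {}
    reward = -step_cost([0.1, -0.2], [1.0, 0.5])
    new_value = q_update(q_table, "s0", "rotate_joint_1", reward, "s1", actions)
    print(new_value)
```

Using the negative cost as the reward is one simple way to express the minimum-rotation and minimum-travel principles; an obstacle collision could be handled in the same sketch by assigning a large negative reward to the offending transition.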