Reinforcement learning algorithm usually only improves maneuver strategy by the strength and weakness of the Air combat situation, but ignores the basic air combat attack task, whether the missile hits the target or not, and the hybrid action space problem caused by discrete missile launch strategy and continuous maneuver strategy. In order to solve the problem, this paper designs a reinforcement learning method based on proximal policy optimization, In this method, two separate policy networks are used to solve the hybrid action space problem caused by the discrete missile launch action and the continuous maneuver action. Whether the missile hits the target is taken as the evaluation system, and the missile launch action and maneuver action are jointly modeled. Thus complete the air combat task from the situation occupation through maneuvering action to the missile launch action guiding the missile to destroy the target. Finally, the intelligence level of the generation strategy is verified by the simulation experiment of UAV 1 versus 1 air combat attack mission under different initial situations. The results show that the maneuvering strategy and missile launching strategy generated by this algorithm are reasonable and can complete the designed air combat task.