We investigated an effective method for scheduling exploration in a reinforcement learning algorithm, with the aim of controlling a flapping-wing unmanned aerial vehicle (UAV) that we developed. A Deep Q-Network (DQN) algorithm was used to determine the optimal gain parameters of a PID controller for the yaw angle of the airframe. Although this PID-DQN hybrid method can stabilize the yaw angle, we observed that the selected gain parameters tend to be biased toward values that are rated highly in the early stages of learning. In this study, we solved this problem by modifying the epsilon-greedy schedule used in DQN.
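For readers unfamiliar with epsilon-greedy scheduling, the following is a minimal sketch of the kind of schedule being modified. It is not the modified schedule proposed in this study, which the abstract does not specify. It assumes a conventional linear decay of epsilon and a DQN whose discrete actions index a set of candidate PID gain triples. The names `linear_epsilon`, `select_action`, `eps_start`, `eps_end`, `decay_steps`, and `GAIN_SETS`, and the gain values, are hypothetical and chosen only for illustration.

```python
import random

def linear_epsilon(step, eps_start=1.0, eps_end=0.05, decay_steps=1000):
    """Linearly anneal epsilon from eps_start to eps_end over decay_steps, then hold it.

    This is a generic baseline schedule (an assumption), not the paper's method.
    """
    if step >= decay_steps:
        return eps_end
    frac = step / decay_steps
    return eps_start + frac * (eps_end - eps_start)

def select_action(q_values, epsilon, rng):
    """Epsilon-greedy selection: a random action with probability epsilon, else argmax Q."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy choice; ties resolve to the lowest action index.
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Hypothetical discrete action set: each action picks one (Kp, Ki, Kd) triple
# for the yaw-angle PID controller.
GAIN_SETS = [(0.5, 0.01, 0.1), (1.0, 0.02, 0.2), (1.5, 0.05, 0.3)]

if __name__ == "__main__":
    rng = random.Random(0)
    q_values = [0.2, 0.9, 0.4]  # placeholder Q estimates, not measured data
    for step in (0, 500, 1000):
        eps = linear_epsilon(step)
        action = select_action(q_values, eps, rng)
        print(step, round(eps, 3), GAIN_SETS[action])
```

With a schedule like this, epsilon is high at the start and early greedy choices are mostly overridden by random ones. As epsilon falls, the agent increasingly commits to whichever gain set currently has the highest Q estimate. If that estimate was inflated early in training, the selected gains can stay biased toward it. Changing the shape of this schedule is one lever for reducing that bias.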