Feedback flow control has the potential to significantly improve the performance of fluid machinery. However, because the flow is nonlinear, it is difficult to construct a control law analytically. In previous studies, a feedback flow control system based on deep reinforcement learning was successfully applied to suppress flow separation over an airfoil in wind tunnel experiments. In this study, the effect of adding a penalty proportional to the applied voltage to the learning reward is investigated, and a control policy that achieves higher energy efficiency is obtained. The results show that varying the penalty magnitude changes the control policy obtained through learning. With a small penalty, the system suppresses flow separation by continuously applying a lower voltage. When the penalty is increased, in contrast, the system periodically switches the plasma actuator on and off to reduce power consumption.
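
The reward shaping described above can be illustrated with a minimal sketch. It assumes the voltage penalty is simply subtracted from a separation-related reward term. The function name, the argument names (`flow_reward`, `voltage`, `penalty_coeff`), and the linear form are illustrative assumptions, not the formulation used in the study.

```python
def penalized_reward(flow_reward: float, voltage: float, penalty_coeff: float) -> float:
    """Return a reward reduced by a penalty proportional to the applied voltage.

    All names here are hypothetical. `flow_reward` stands for whatever
    quantity rewards attached flow, `voltage` is the non-negative actuator
    voltage, and `penalty_coeff` sets the penalty magnitude.
    """
    if voltage < 0 or penalty_coeff < 0:
        raise ValueError("voltage and penalty_coeff must be non-negative")
    # Assumed form: reward = flow term - coefficient * applied voltage.
    return flow_reward - penalty_coeff * voltage


# Illustrative values only: the same flow reward costs more as voltage rises.
actuator_on = penalized_reward(flow_reward=1.0, voltage=2.0, penalty_coeff=0.25)
actuator_off = penalized_reward(flow_reward=1.0, voltage=0.0, penalty_coeff=0.25)
```

Under this assumed form, a larger `penalty_coeff` makes every interval with the actuator energized more costly, which is consistent with the reported tendency of the learned policy to switch the actuator off periodically when the penalty is increased.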