Exploration and exploitation are pivotal components of Q-learning, and a balance between the two is crucial for efficient Q-learning procedures. This paper considers Q-learning for the linear quadratic regulation (LQR) task in unknown systems. To avoid overly aggressive exploration or overly conservative exploitation, either of which can cause the LQR task to fail, e.g., through large system overshoot or turn-off effects, we propose a novel approach in which the two components are balanced adaptively. Specifically, we first account for the estimation error of the Q-function in the optimization, which restrains exploration that would otherwise be aggressive under the certainty-equivalence principle adopted in previous studies. Then, to balance exploration and exploitation, we quantify the two components by formulating two objective functions, one representing the interest of exploration and the other that of exploitation. We combine the two functions into a bi-objective optimization problem, which we solve via the bi-criterion method; the solution serves as the regulating signal with balanced exploration and exploitation for the LQR task. Numerical experiments demonstrate that the proposed approach yields robust and stable LQR for systems with significant uncertainty.
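As a minimal sketch of the bi-criterion step (the symbols $J_{\mathrm{exploit}}$, $J_{\mathrm{explore}}$, and the weight $\lambda$ are illustrative placeholders, not the paper's notation), a standard weighted-sum scalarization of the two objectives, both written as costs to be minimized, reads
\[
  u^{\star} \;=\; \arg\min_{u}\; \lambda\, J_{\mathrm{exploit}}(u) \;+\; (1-\lambda)\, J_{\mathrm{explore}}(u),
  \qquad \lambda \in [0,1],
\]
where sweeping $\lambda$ traces the Pareto front between regulation performance and exploration, and an adaptive choice of $\lambda$ would realize the balancing described above.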