A diagnostic policy, an efficient approach to fault diagnosis, builds a test sequence that achieves the required fault diagnosis and accurate failure localization at minimal test cost. Traditional methods for building diagnostic policies are ill-suited to unreliable, imperfect tests, which are widespread in real-world systems because of the uncertainty introduced by various external and internal interference. This paper therefore proposes a novel Q-learning-based method for building diagnostic policies that yields the long-run optimal policy for realistic systems. We formulate the diagnostic policy model within the reinforcement learning (RL) framework. The Q-learning algorithm then learns the probabilities of false alarm and missed detection from interaction with the environment rather than from partial statistical data. The optimal diagnostic policy under imperfect tests is obtained efficiently, taking both test cost and information gain into account. The proposed method is illustrated with a real-world application, and comparative results verify its effectiveness and feasibility.
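The RL formulation described above can be illustrated on a toy problem. The Python sketch below is not the authors' implementation: the dependency matrix `D`, the test costs `COST`, the miss and false-alarm probabilities `P_MISS` and `P_FALSE`, the ±10 terminal reward, and all helper names are hypothetical. A state records the outcome observed for each test (-1 if not yet run). An action either runs an unexecuted test, with reward equal to minus its cost, or ends the episode by declaring a fault. The agent never reads the error probabilities, so it has to absorb them from simulated interaction. For brevity, the sketch omits the information-gain term.

```python
import random
from collections import defaultdict

# Hypothetical toy system: D[f][t] == 1 if test t can detect fault f.
D = [
    [1, 0, 1],  # fault 0
    [0, 1, 1],  # fault 1
    [1, 1, 0],  # fault 2
]
COST = [1.0, 1.5, 0.5]       # assumed test costs
P_MISS, P_FALSE = 0.1, 0.05  # assumed imperfect-test probabilities, hidden from the agent
N_FAULTS, N_TESTS = len(D), len(COST)
REWARD_OK, REWARD_BAD = 10.0, -10.0  # assumed terminal rewards for a right/wrong diagnosis


def run_test(fault, t, rng):
    """Simulate an imperfect test: 1 = fail (fault indicated), 0 = pass."""
    if D[fault][t]:
        return 0 if rng.random() < P_MISS else 1   # missed detection
    return 1 if rng.random() < P_FALSE else 0      # false alarm


def actions(state):
    """Run any unexecuted test, or stop and declare a fault."""
    tests = [("test", t) for t in range(N_TESTS) if state[t] == -1]
    return tests + [("diag", f) for f in range(N_FAULTS)]


def q_learning(episodes=20000, alpha=0.1, gamma=1.0, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)  # Q[(state, action)], initialised to 0
    for _ in range(episodes):
        fault = rng.randrange(N_FAULTS)  # true fault, unknown to the agent
        state = (-1,) * N_TESTS
        while True:
            acts = actions(state)
            # Epsilon-greedy action selection.
            if rng.random() < eps:
                a = rng.choice(acts)
            else:
                a = max(acts, key=lambda x: Q[(state, x)])
            kind, idx = a
            if kind == "diag":
                # Terminal step: no bootstrapped successor value.
                r = REWARD_OK if idx == fault else REWARD_BAD
                Q[(state, a)] += alpha * (r - Q[(state, a)])
                break
            outcome = run_test(fault, idx, rng)
            nxt = state[:idx] + (outcome,) + state[idx + 1:]
            best_next = max(Q[(nxt, b)] for b in actions(nxt))
            # Standard Q-learning update with reward = -test cost.
            Q[(state, a)] += alpha * (-COST[idx] + gamma * best_next - Q[(state, a)])
            state = nxt
    return Q


def greedy_policy(Q, state):
    """Best action in a given state under the learned Q-table."""
    return max(actions(state), key=lambda x: Q[(state, x)])


if __name__ == "__main__":
    Q = q_learning()
    print("first action:", greedy_policy(Q, (-1,) * N_TESTS))
```

Tabular Q-learning is workable here only because the state space has at most 3^`N_TESTS` outcome vectors; a system with many tests would need function approximation or a more compact state representation.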