In clinical diagnosis, radiologists mainly analyze acquired medical images and write the corresponding diagnostic reports. To reduce their workload and improve efficiency while still producing high-quality reports, we propose a novel medical image captioning method based on conditional generative adversarial networks, in which a language evaluator is introduced to align language styles. The proposed method consists of three modules: a diagnostic report generator, a discriminator, and a language style evaluator. Specifically, the diagnostic report generator, which comprises feature extraction, attention, and report output modules, generates diagnostic reports that are intended to approximate real clinical reports. An RNN-based discriminator then distinguishes generated reports from real ones. In addition, the language style evaluator keeps the style of the generated reports consistent with that of real reports. We further employ a reinforcement learning mechanism to yield high-quality diagnostic reports and to overcome the discretization problem in adversarial training. Extensive experiments show that the proposed method achieves significant improvements over other methods on medical image captioning. Objectively, it achieves average gains of 10.15% in BLEU-1 and 13.06% in BLEU-2 on Open-i, and a similar trend holds on LGK. Subjectively, it also outperforms the compared methods in language style evaluation by clinical radiologists.
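The reinforcement learning step mentioned above can be illustrated with a minimal sketch of a REINFORCE-style surrogate loss, where the discriminator's score on a generated report serves as the reward driving the discrete report generator. All names here (`reinforce_loss`, `log_probs`, `reward`, `baseline`) are illustrative assumptions, not identifiers from the paper.

```python
import math

def reinforce_loss(log_probs, reward, baseline=0.0):
    """Policy-gradient surrogate loss for a discrete sequence:
    -(reward - baseline) * sum(log-probabilities of sampled tokens).
    Minimizing it raises the probability of reports the discriminator rewards."""
    advantage = reward - baseline
    return -advantage * sum(log_probs)

# Example: a generated report of three tokens with their log-probabilities,
# scored by the discriminator with a reward in [0, 1]; the baseline
# reduces the variance of the gradient estimate.
log_probs = [math.log(0.5), math.log(0.25), math.log(0.8)]
reward = 0.9
loss = reinforce_loss(log_probs, reward, baseline=0.5)
```

Because the reward is applied to the log-probabilities of already-sampled tokens rather than to the tokens themselves, the gradient bypasses the non-differentiable sampling step, which is how adversarial training sidesteps the discretization problem.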