In this paper, we set up a two-player zero-sum Markov game (TZMG) framework to train a safe driving policy network so that the worst-case intentions of neighboring vehicles are taken into account. Unlike conventional policy learning frameworks, the TZMG framework embeds the adversarial behavior of neighboring vehicles throughout the training process. Furthermore, a novel TZMG Q-learning algorithm based on the Wolpertinger policy is proposed, which scales to multiple adversarial neighboring vehicles. Finally, simulations and human-in-the-loop experiments are conducted to verify the effectiveness of the TZMG framework and the proposed algorithm. Compared with benchmark safety controllers in the literature, the proposed TZMG algorithm achieves a much lower collision rate when dealing with adversarial neighboring vehicles.
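For reference, a generic minimax Q-learning update for a two-player zero-sum Markov game with ego action $a$ and adversary action $b$ can be written as the following pure-strategy max-min sketch; this is only an assumed illustration of the general setting, not necessarily the exact update used by the proposed TZMG Q-learning algorithm:
\[
Q(s,a,b) \leftarrow (1-\alpha)\,Q(s,a,b) + \alpha\Big[r(s,a,b) + \gamma \max_{a'\in\mathcal{A}} \min_{b'\in\mathcal{B}} Q(s',a',b')\Big],
\]
where $\alpha$ is the learning rate and $\gamma$ is the discount factor. The full minimax-Q update instead evaluates the value of the matrix game over mixed strategies at $s'$, and the Wolpertinger policy addresses scalability by mapping a continuous proto-action to a small set of nearby discrete actions that are then scored by the critic.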