Jamming strategy optimization is an important task for a cognitive jammer in a dynamic electromagnetic environment. To jam a frequency-agile radar, this paper proposes a reinforcement learning method for optimal jamming strategy selection, where the jamming strategy includes the jamming frequency and the jamming pulse duration. The jammer's reward accounts for both the jamming effect, which is related to the jamming-to-signal ratio (JSR), and the jamming power consumption. An intercept receiver is assumed to be present to capture the radar signal and estimate the jamming effect. The model-free Q-learning algorithm, which can converge to the optimal policy with probability 1, is used to solve the problem. Simulation results show that the learned jamming strategy greatly improves the total reward over a pulse train, and that almost all pulses are jammed after several rounds of interaction with the radar.
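To make the formulation concrete, the following is a minimal sketch of tabular Q-learning for a jammer choosing a (frequency, duration) pair. It is an illustration under stated assumptions, not the paper's simulation setup. The radar model (a fixed cyclic hop over four frequencies), the reward values in `EFFECT` and `POWER_COST`, the state definition (the last radar frequency seen by the intercept receiver), and all function names are hypothetical. In particular, a real frequency-agile radar would hop far less predictably than this toy rule.

```python
import random

# All constants below are illustrative assumptions, not values from the paper.
N_FREQ = 4                 # hypothetical number of radar carrier frequencies
DURATIONS = (1, 2)         # hypothetical jamming pulse durations (arbitrary units)
EFFECT = {1: 0.6, 2: 1.0}  # assumed jamming effect when the frequency matches
POWER_COST = 0.1           # assumed power cost per unit of jamming duration
ACTIONS = [(f, d) for f in range(N_FREQ) for d in DURATIONS]


def radar_next(freq):
    """Toy hopping rule (an assumption): the radar cycles through its frequencies."""
    return (freq + 1) % N_FREQ


def reward(action, radar_freq):
    """Jamming effect if the jammer hits the radar frequency, minus power cost."""
    jam_freq, duration = action
    effect = EFFECT[duration] if jam_freq == radar_freq else 0.0
    return effect - POWER_COST * duration


def train(n_pulses=20000, alpha=0.2, gamma=0.5, epsilon=0.2, seed=0):
    """Run epsilon-greedy tabular Q-learning over a train of radar pulses."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_FREQ)]
    state = 0  # last radar frequency observed by the intercept receiver
    for _ in range(n_pulses):
        if rng.random() < epsilon:
            a = rng.randrange(len(ACTIONS))  # explore
        else:
            a = max(range(len(ACTIONS)), key=lambda i: q[state][i])  # exploit
        next_state = radar_next(state)
        r = reward(ACTIONS[a], next_state)
        # Standard Q-learning update toward r + gamma * max_a' Q(s', a').
        q[state][a] += alpha * (r + gamma * max(q[next_state]) - q[state][a])
        state = next_state
    return q


def greedy_policy(q):
    """Return the highest-valued (frequency, duration) action for each state."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: q[s][i])]
            for s in range(N_FREQ)]
```

Calling `greedy_policy(train())` returns, for each observed radar frequency, the (frequency, duration) pair the toy jammer prefers after training. The reward deliberately trades jamming effect against power cost, mirroring the two terms described above.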