This paper proposes a cognitive radar setup that learns the minimal sequence of Range-Doppler measurements required for accurate multi-target detection with adaptive parameters. This minimal measurement sequence is obtained through a novel reward definition in a Reinforcement Learning approach, so the cognitive radar learns to shorten its measurement time and thereby save energy. Based on Range-Doppler maps, the Reinforcement Learning agent adapts FMCW parameters such as bandwidth, sweep time, chirp repetition time and number of chirps to optimize target recognition in a three-target scenario. The agent is trained using Proximal Policy Optimization (PPO) in a simulated radar environment.
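The abstract does not spell out how the four adapted FMCW parameters trade off against resolution and measurement time. The following minimal sketch uses textbook FMCW relations to make that trade-off concrete. The class and function names, the 77 GHz carrier and the "transmit time as an energy proxy" (which assumes constant transmit power while sweeping) are all illustrative assumptions, not the paper's implementation or reward.

```python
from dataclasses import dataclass

C = 299_792_458.0  # speed of light in m/s


@dataclass
class FmcwParams:
    bandwidth_hz: float        # B: swept bandwidth of one chirp
    sweep_time_s: float        # T_sweep: duration of one frequency ramp
    chirp_rep_time_s: float    # T_c: time between chirp starts (must be >= T_sweep)
    num_chirps: int            # N: chirps per Range-Doppler frame
    carrier_hz: float = 77e9   # assumed carrier frequency (not stated in the abstract)


def derived_quantities(p: FmcwParams) -> dict:
    """Textbook FMCW quantities for one Range-Doppler frame (hypothetical helper)."""
    if p.chirp_rep_time_s < p.sweep_time_s:
        raise ValueError("chirp repetition time must be >= sweep time")
    wavelength = C / p.carrier_hz
    frame_time = p.num_chirps * p.chirp_rep_time_s  # total measurement time N * T_c
    return {
        "range_resolution_m": C / (2 * p.bandwidth_hz),               # c / (2B)
        "velocity_resolution_mps": wavelength / (2 * frame_time),     # lambda / (2 N T_c)
        "max_unambiguous_velocity_mps": wavelength / (4 * p.chirp_rep_time_s),  # lambda / (4 T_c)
        "frame_time_s": frame_time,
        # Transmitter-on time; proportional to energy only under the constant-power assumption
        "transmit_time_s": p.num_chirps * p.sweep_time_s,
    }
```

For example, `derived_quantities(FmcwParams(1e9, 50e-6, 60e-6, 128))` gives a range resolution of about 0.15 m and a frame time of 7.68 ms. Halving the number of chirps halves the frame time and transmit time but coarsens the velocity resolution, which is the kind of trade-off an agent minimizing measurement time has to balance against detection accuracy.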