In wireless communication networks that include unmanned aerial vehicles (UAVs), joint communication and radar (JCR), which uses a single waveform for both communication and sensing, has been considered. In JCR, the power allocated to the pilot and data parts of the waveform can be optimized with respect to communication and sensing performance metrics. Furthermore, to serve ground users effectively, the locations of the UAVs, which receive the transmit signal from a base station (BS) and forward it to ground users, should also be optimized. In multi-UAV environments, the optimization of signal power and UAV positions becomes too complicated to solve with a conventional optimization framework. Therefore, a reinforcement learning approach, namely multi-agent Q-learning, is adopted to optimize UAV-assisted JCR networks.
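
As a rough illustration of the multi-agent Q-learning idea mentioned above, the following minimal sketch gives each UAV its own tabular Q-function over a joint action of (movement, pilot-power fraction). It is only a sketch under stated assumptions: the independent-learner formulation, the class name `QAgent`, the action grid, the state encoding, and the hyperparameters are all hypothetical and are not taken from the text. The reward, which would combine the communication and sensing metrics, is left to the caller.

```python
import random
from collections import defaultdict

# Hypothetical action set (not from the source): four grid moves plus hover,
# crossed with three candidate fractions of power assigned to the pilot part.
MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0), (0, 0)]
PILOT_FRACTIONS = [0.1, 0.3, 0.5]
ACTIONS = [(m, p) for m in MOVES for p in PILOT_FRACTIONS]


class QAgent:
    """One UAV with its own Q-table (independent learner, an assumption)."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)  # maps (state, action_index) -> value
        self.rng = random.Random(seed)

    def act(self, state):
        # Epsilon-greedy: explore with probability epsilon, else exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(ACTIONS))
        return max(range(len(ACTIONS)), key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard Q-learning temporal-difference update.
        best_next = max(self.q[(next_state, a)] for a in range(len(ACTIONS)))
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```

In such a setup, each UAV would call `act` on its current state (for example, its grid position), apply the chosen move and pilot-power fraction, observe a reward, and then call `update`. Keeping one table per agent avoids the joint action space growing exponentially with the number of UAVs, at the cost of each agent treating the others as part of a non-stationary environment.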