In this letter, we consider the problem of maximizing spectrum reuse in an underlay cognitive radio (CR) system in which multiple secondary transmitters (STs) communicate with their respective secondary receivers (SRs) in a device-to-device (D2D) fashion, and channel state information (CSI) is not available at the STs. This problem cannot be solved with the conventional optimization techniques proposed in the literature, because those techniques require CSI to produce a solution. Hence, we propose a distributed deep reinforcement learning (DRL) framework that optimizes the resource allocation at each ST using only single-bit feedback from its respective SR. Simulations show that, for small rate requirements, the number of STs communicating successfully over the limited channels approaches the total number of STs in the system.