The Internet of Vehicles (IoV) has emerged as a promising technology for improving road safety. However, resource management is challenging in congested traffic environments, which can degrade both the energy efficiency (EE) and the spectrum efficiency (SE) of IoV networks. In this paper, we present a novel intelligent resource allocation approach based on deep reinforcement learning that maximizes a weighted composite efficiency incorporating the EE and SE metrics, subject to the latency and reliability constraints of vehicle-to-vehicle (V2V) users. We employ Thompson sampling with a double deep Q-network to transform the objective function. Moreover, we present a probability-based learning approach to meet the quality-of-service requirements and to increase the learning ability of the proposed model. The simulation results indicate that the proposed approach maximizes the composite efficiency while satisfying the latency and reliability constraints of V2V users.
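As a minimal illustration of the double deep Q-network component mentioned above, the sketch below computes the standard double-DQN bootstrap target: the online network selects the next action and the target network evaluates it. The function name, argument names, and discount factor are hypothetical and chosen for illustration only; the Thompson sampling step and the paper's actual state, action, and reward definitions are not reproduced here.

```python
from typing import Sequence


def double_dqn_target(
    reward: float,
    next_q_online: Sequence[float],
    next_q_target: Sequence[float],
    gamma: float = 0.99,  # assumed discount factor, not taken from the paper
    done: bool = False,
) -> float:
    """Return the double-DQN target for a single transition.

    next_q_online: Q-values of the next state from the online network.
    next_q_target: Q-values of the next state from the target network.
    """
    if done:
        # Terminal transition: no bootstrapping from the next state.
        return reward
    # Action selection uses the online network ...
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # ... while action evaluation uses the target network.
    return reward + gamma * next_q_target[best_action]


if __name__ == "__main__":
    target = double_dqn_target(
        reward=1.0,
        next_q_online=[0.2, 0.9, 0.5],
        next_q_target=[0.4, 0.3, 0.8],
        gamma=0.5,
    )
    print(target)
```

Decoupling selection from evaluation in this way is what distinguishes double DQN from the plain DQN target, which takes the maximum over the target network's own Q-values and tends to overestimate them.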