Underwater acoustic networks (UANs) suffer from low bandwidth, long propagation delay, high bit error rates, constrained energy, and "void areas", while traditional routing protocols consider only the current state of sensor nodes and lack the flexibility needed for comprehensive network control. To address these problems, a Layer and Reinforcement Learning based Routing Protocol for underwater sensor networks (LRLRP) is proposed. LRLRP uses packet header information to learn and update each node's layer and neighbor information. It then feeds the layer, residual energy, density, and Q-value of neighboring nodes, recorded in the neighbor table, into a reinforcement learning system to determine the next-hop forwarding node. Extensive experiments were conducted in the NS-3 network simulator to evaluate the performance of the LRLRP protocol. The simulation results demonstrate that LRLRP not only effectively mitigates the "void area" problem but also performs well in terms of energy consumption, end-to-end delay, and packet delivery ratio.
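
The Q-learning-based next-hop selection described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the field names (`layer`, `energy`, `density`, `q`), the reward weights, and the learning parameters are all assumptions chosen for demonstration.

```python
# Illustrative sketch of LRLRP-style next-hop selection via Q-learning.
# All constants and field names are assumptions, not taken from the paper.
ALPHA, GAMMA = 0.5, 0.8  # learning rate and discount factor (assumed values)

def reward(node, neighbor):
    """Hypothetical reward: favor neighbors in a lower layer (closer to the
    sink), with more residual energy, penalizing high neighbor density."""
    layer_gain = node["layer"] - neighbor["layer"]  # progress toward the sink
    return layer_gain + 0.5 * neighbor["energy"] - 0.1 * neighbor["density"]

def select_next_hop(node, neighbors):
    """Greedily pick the neighbor with the highest Q-value, then apply a
    standard Q-learning update toward the immediate reward plus the
    discounted Q-value of the chosen neighbor."""
    best = max(neighbors, key=lambda n: n["q"])
    best["q"] += ALPHA * (reward(node, best) + GAMMA * best["q"] - best["q"])
    return best

# Example: a node in layer 3 choosing among three neighbors.
node = {"layer": 3}
neighbors = [
    {"id": "A", "layer": 2, "energy": 0.9, "density": 4, "q": 0.2},
    {"id": "B", "layer": 2, "energy": 0.4, "density": 6, "q": 0.5},
    {"id": "C", "layer": 4, "energy": 0.8, "density": 3, "q": 0.1},
]
print(select_next_hop(node, neighbors)["id"])  # prints "B" (highest Q-value)
```

In a real protocol the Q-values and node state would be refreshed from packet headers as described in the abstract; here they are static dictionaries for clarity.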