With the continuous advancement of computer network communication technology, traditional wired networks and infrastructure-based wireless networks remain constrained by cables and base stations, and are unsuitable for some application scenarios. Mobile wireless communication methods have therefore attracted growing attention. Because of its dynamic topology and decentralized self-organization, a self-organizing network can form a temporary multi-hop mobile communication network from multiple wireless communication devices, which suits such scenarios well. At present, research on self-organizing networks focuses mainly on routing protocols, which fall into two main types: those based on network topology information and those based on location information. This paper designs a routing algorithm based on link reliability. The algorithm takes the link information of the self-organizing network into account, models each node's sending and receiving behavior as a Markov decision process, and uses Q-learning to solve it. The paper uses NS2 to simulate the network and compares the resulting performance indicators with those of traditional routing algorithms.
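To make the idea of solving a routing decision process with Q-learning concrete, the following is a minimal sketch and not the algorithm designed in this paper. The abstract does not give the state, action, or reward definitions, so everything here is an assumption for illustration: the state is the node currently holding the packet, the action is the choice of next hop, and the reward is the logarithm of a fixed per-link reliability, so that maximizing the undiscounted return maximizes the product of link reliabilities along the route. The names `train_q_routing`, `best_next_hop`, and `route`, the toy topology, and the reliability values are all hypothetical.

```python
import math
import random


def train_q_routing(links, dest, episodes=2000, alpha=0.5, gamma=1.0,
                    epsilon=0.2, seed=0):
    """Tabular Q-learning over next-hop choices (illustrative assumptions only).

    links: dict mapping node -> {neighbor: link reliability in (0, 1]}.
    Reward for using a link is log(reliability), which is <= 0, so the
    return of a route is the log of the product of its link reliabilities.
    """
    rng = random.Random(seed)
    # One Q-value per (node, neighbor) pair, initialized to 0.
    q = {u: {v: 0.0 for v in nbrs} for u, nbrs in links.items()}
    sources = [u for u in links if u != dest]
    hop_limit = 4 * len(links)  # assumed cap so an episode cannot loop forever

    for _ in range(episodes):
        node = rng.choice(sources)
        for _ in range(hop_limit):
            if node == dest:
                break
            neighbors = list(links[node])
            # Epsilon-greedy choice of the next hop.
            if rng.random() < epsilon:
                nxt = rng.choice(neighbors)
            else:
                nxt = max(neighbors, key=lambda v: q[node][v])
            reward = math.log(links[node][nxt])
            # The destination is terminal, so its future value is 0.
            future = 0.0 if nxt == dest else max(q[nxt].values())
            q[node][nxt] += alpha * (reward + gamma * future - q[node][nxt])
            node = nxt
    return q


def best_next_hop(q, node):
    """Greedy next hop for a node under the learned Q-table."""
    return max(q[node], key=q[node].get)


def route(q, src, dest, max_hops=16):
    """Follow greedy next hops from src; stops at dest or after max_hops."""
    path = [src]
    while path[-1] != dest and len(path) <= max_hops:
        path.append(best_next_hop(q, path[-1]))
    return path


# Hypothetical topology: the direct link A-D is unreliable (0.5), while the
# two-hop route A-B-D has end-to-end reliability 0.9 * 0.9 = 0.81.
links = {
    "A": {"B": 0.9, "D": 0.5},
    "B": {"A": 0.9, "D": 0.9},
    "D": {"A": 0.5, "B": 0.9},
}
q_table = train_q_routing(links, dest="D")
learned_path = route(q_table, "A", "D")
```

In this toy setting the learned policy prefers the two-hop route through B over the direct but less reliable link, which is the kind of behavior a reliability-aware route selection aims for. A real mobile network would need reliability estimates that change over time as nodes move, which this sketch deliberately leaves out.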