TLS, the most widely used security protocol in modern network communication, has many independent implementations. Despite continuous maintenance by their developers, these complex codebases remain prone to bugs and security vulnerabilities. The TLS handshake protocol is among the most sensitive and vulnerable protocols in the TLS protocol family, yet it has received limited attention in security testing research. Existing tools such as TLS-differ test the handshake protocol with black-box differential fuzzing. Because black-box fuzzing receives no feedback from the implementation under test, the test cases it generates achieve low coverage and discover discrepancies inefficiently. This paper focuses on the TLS handshake protocol and proposes a gray-box differential fuzzing strategy for discovering discrepancies between implementations during the handshake. We introduce the AR-SMAB model to guide seed selection and mutation in gray-box fuzzing. We implemented the approach in a testing tool and compared it with TLS-differ and two other validated gray-box differential fuzzing tools. Our experiments show that our tool discovers discrepancies more efficiently and achieves higher cumulative coverage, improving on the other tools by approximately 10% to 50%.
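As an illustration only, the general idea of differential fuzzing with a feedback loop can be sketched as follows. The two toy validators, the message layout (one type byte followed by a one-byte length field), and all function names are hypothetical and are not taken from the tool described in this paper. The check for previously unseen response combinations is a simple stand-in for real coverage feedback, and uniform random seed selection stands in for the AR-SMAB model, which is not reproduced here.

```python
import random


def impl_strict(msg: bytes) -> str:
    """Toy implementation A (hypothetical): the length field must match the body exactly."""
    if len(msg) < 2 or msg[1] != len(msg) - 2:
        return "alert:decode_error"
    return "ok"


def impl_lenient(msg: bytes) -> str:
    """Toy implementation B (hypothetical): tolerates trailing bytes after the declared body."""
    if len(msg) < 2 or msg[1] > len(msg) - 2:
        return "alert:decode_error"
    return "ok"


IMPLS = {"strict": impl_strict, "lenient": impl_lenient}


def differential_run(msg: bytes, impls) -> tuple:
    """Send the same input to every implementation and report whether responses differ."""
    responses = {name: handler(msg) for name, handler in impls.items()}
    return responses, len(set(responses.values())) > 1


def mutate(msg: bytes, rng: random.Random) -> bytes:
    """Apply one random byte-level mutation: overwrite, append, or delete."""
    data = bytearray(msg)
    choice = rng.randrange(3)
    if choice == 0 and data:
        data[rng.randrange(len(data))] = rng.randrange(256)
    elif choice == 1:
        data.append(rng.randrange(256))
    elif data:
        del data[rng.randrange(len(data))]
    return bytes(data)


def fuzz(seeds, impls, iterations: int = 500, rng_seed: int = 0) -> list:
    """Feedback-guided loop: keep inputs that trigger a new combination of responses."""
    rng = random.Random(rng_seed)
    queue = list(seeds)
    seen_behaviors = set()
    discrepancies = []
    for _ in range(iterations):
        # Assumption: uniform seed choice; a bandit model would weight this selection.
        candidate = mutate(rng.choice(queue), rng)
        responses, differs = differential_run(candidate, impls)
        behavior = tuple(sorted(responses.items()))
        if behavior not in seen_behaviors:
            # Novel behavior acts as the (simplified) gray-box feedback signal.
            seen_behaviors.add(behavior)
            queue.append(candidate)
        if differs:
            discrepancies.append((candidate, responses))
    return discrepancies


if __name__ == "__main__":
    found = fuzz([b"\x01\x00"], IMPLS)
    print(f"{len(found)} discrepancy-triggering inputs found")
```

In this sketch, an input such as a well-formed message followed by one extra byte is rejected by the strict validator and accepted by the lenient one, which is the kind of behavioral difference a differential fuzzer reports. A real gray-box tool would replace the response-based novelty check with instrumentation-derived coverage from each TLS implementation.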