In a personnel management knowledge graph, missing relations leave the graph incomplete, which degrades downstream applications. Although existing approaches can handle relational reasoning tasks, many neglect reasoning efficiency and the implicit semantic information carried by multi-hop relational paths. We therefore propose a multi-hop relational reasoning model based on deep reinforcement learning that infers paths carrying implicit semantic information. Reasoning is implemented with an optimized variant of the proximal policy optimization algorithm. A scheme for masking irrelevant actions makes the reasoning more efficient, and reward shaping makes the inferred multi-hop relational paths semantically interpretable. We construct a formal dataset containing a large number of triples to represent the personnel management knowledge graph. Compared with the baselines on MAP, MRR, Hits@k, and path frequency, the proposed model compares favorably on the personnel management knowledge graph. In addition, experimental results on the NELL-995 dataset show that the performance of the proposed model is comparable to that of mainstream methods.
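For readers unfamiliar with the ranking metrics named above, the following is a minimal sketch of how MRR and Hits@k are conventionally computed from the 1-based rank of the correct answer for each query. The function names and the sample ranks are illustrative assumptions and are not taken from the paper's evaluation code or results.

```python
from typing import Sequence


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all queries (ranks are 1-based)."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of queries whose correct answer is ranked in the top k."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks of the correct entity for four test queries.
example_ranks = [1, 2, 4, 10]
print(mean_reciprocal_rank(example_ranks))  # (1 + 1/2 + 1/4 + 1/10) / 4
print(hits_at_k(example_ranks, 3))          # two of four ranks are <= 3
```

Both metrics lie in the range 0 to 1, and higher values indicate that correct answers are ranked closer to the top.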