Most typical reinforcement learning algorithms help wireless devices choose security policies, such as movement strategies and communication policies, by exploring all possible state-action pairs, including risky policies that cause severe collisions or network disasters. In this paper, we design a safe reinforcement learning algorithm for safety-critical applications (e.g., intelligent transportation systems) that guides the learning agent away from risky policies. The algorithm uses a Q-network (i.e., a convolutional neural network or a deep neural network) to choose the policy, and it introduces a safety guide that modifies any chosen policy that would lead to a dangerous state. More specifically, the safety guide consists of a risk alarm module, which evaluates the immediate warning value corresponding to the risk of each state-action pair, and a G-network, which estimates the long-term risk value. A safety dock then modifies the chosen policy based on the sum of the long-term risk value and the long-term expected reward output by the Q-network. The algorithm also uses the immediate warning value to build a safe buffer and a risky buffer for updating the G-network, which ensures full exploration in the initial learning process. As a case study, we apply the designed algorithm to a cargo transportation system, where experimental results verify the effectiveness of our algorithm compared with the benchmark safe deep Q-network.
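The two safety-guide mechanisms described above can be illustrated with a minimal sketch. It assumes that the G-network outputs non-positive long-term risk values, so adding them to the Q-values penalizes risky actions, and that the safety dock simply replaces the chosen action with the action that maximizes the sum. The names `safety_dock` and `SplitReplay`, the warning threshold, and the half-and-half sampling rule are hypothetical choices for illustration, not the paper's exact design; the networks themselves are replaced by plain lists of values.

```python
import random
from collections import deque
from typing import List, Sequence, Tuple

# Hypothetical transition layout: (state, action, warning_value, next_state)
Transition = Tuple[int, int, float, int]


def safety_dock(q_values: Sequence[float], g_values: Sequence[float], chosen: int) -> int:
    """Keep the chosen action unless another action scores higher on Q + G.

    Assumption: g_values are non-positive long-term risk estimates from the
    G-network, so adding them to the Q-values lowers the score of risky actions.
    """
    scores = [q + g for q, g in zip(q_values, g_values)]
    best = max(range(len(scores)), key=scores.__getitem__)
    # Keep the agent's choice if it is (tied for) the best combined score.
    return chosen if scores[chosen] >= scores[best] else best


class SplitReplay:
    """Safe and risky replay buffers, split by the immediate warning value.

    The threshold is a hypothetical parameter: transitions whose warning value
    reaches it go to the risky buffer, all others to the safe buffer.
    """

    def __init__(self, capacity: int = 1000, threshold: float = 0.5) -> None:
        self.safe: deque = deque(maxlen=capacity)
        self.risky: deque = deque(maxlen=capacity)
        self.threshold = threshold

    def push(self, transition: Transition) -> None:
        warning = transition[2]
        target = self.risky if warning >= self.threshold else self.safe
        target.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Assumed rule: take up to half the minibatch from the risky buffer so
        # that rare risky transitions still reach the G-network update, then
        # fill the rest from the safe buffer.
        n_risky = min(batch_size // 2, len(self.risky))
        n_safe = min(batch_size - n_risky, len(self.safe))
        return random.sample(list(self.risky), n_risky) + random.sample(list(self.safe), n_safe)
```

For example, with Q-values `[1.0, 2.0, 1.5]` and risk values `[0.0, -1.5, -0.2]`, the combined scores are `[1.0, 0.5, 1.3]`, so `safety_dock(..., chosen=1)` redirects the agent from the greedy but risky action 1 to action 2. Splitting the replay memory keeps risky experience from being crowded out by the far more frequent safe transitions when the G-network is trained.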