With the ever-increasing stochastic and dynamic behavior observed in today's bulk power systems, securely and economically planning future operational scenarios that meet all reliability standards under uncertainty has become a challenging computational task, which typically involves searching for feasible, near-optimal solutions in a high-dimensional space via massive numerical simulations. This paper presents a novel approach to this goal by adopting a state-of-the-art reinforcement learning algorithm, Soft Actor-Critic (SAC). First, the optimization problem of finding feasible solutions under uncertainty is formulated as a Markov Decision Process (MDP). Second, a general and flexible framework is developed to train an SAC agent that adjusts generator active power outputs to search for feasible operating conditions. A software prototype is developed, and numerical studies conducted on planning cases from the SGCC Zhejiang Electric Power Company verify the effectiveness of the proposed approach.
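To make the MDP formulation and the SAC training framework concrete, the following is a minimal sketch, not the authors' implementation: a toy Gymnasium environment whose actions adjust generator active power outputs, trained with the Soft Actor-Critic implementation from Stable-Baselines3. The environment name `FeasibilitySearchEnv`, the generator count `n_gen`, and the power-balance-mismatch reward are all illustrative assumptions; in the paper's setting the reward would come from power flow and reliability-standard checks on the planning case.

```python
# Minimal sketch of the MDP described in the abstract (illustrative only).
# Assumptions: gymnasium-style env, stable-baselines3 SAC, and a toy
# power-balance mismatch standing in for the real feasibility criteria.
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class FeasibilitySearchEnv(gym.Env):
    """State: generator active power outputs plus an uncertain load level.
    Action: per-generator adjustment of active power output.
    Reward: negative power-balance mismatch (a proxy for feasibility)."""

    def __init__(self, n_gen: int = 5, p_max: float = 1.0):
        self.n_gen, self.p_max = n_gen, p_max
        # Action: bounded adjustment of each generator's output (p.u.).
        self.action_space = spaces.Box(-0.1, 0.1, shape=(n_gen,), dtype=np.float32)
        # Observation: generator outputs and the sampled total load.
        self.observation_space = spaces.Box(
            0.0, np.inf, shape=(n_gen + 1,), dtype=np.float32
        )

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.p = self.np_random.uniform(0.2, 0.8, self.n_gen).astype(np.float32)
        # Sample a new uncertain load scenario each episode.
        self.load = np.float32(self.np_random.uniform(0.4, 0.6) * self.n_gen)
        self.steps = 0
        return self._obs(), {}

    def step(self, action):
        self.p = np.clip(self.p + action, 0.0, self.p_max).astype(np.float32)
        mismatch = abs(float(self.p.sum()) - float(self.load))
        reward = -mismatch  # closer to balance = closer to feasible
        self.steps += 1
        terminated = mismatch < 1e-2   # a feasible operating condition found
        truncated = self.steps >= 50
        return self._obs(), reward, terminated, truncated, {}

    def _obs(self):
        return np.concatenate([self.p, [self.load]]).astype(np.float32)


if __name__ == "__main__":
    from stable_baselines3 import SAC

    env = FeasibilitySearchEnv()
    agent = SAC("MlpPolicy", env, verbose=0)
    agent.learn(total_timesteps=10_000)
```

In this toy setting the trained policy learns to drive total generation toward the sampled load; in the paper's framework, the same loop structure would instead invoke a power flow solver and reliability checks to score each adjusted operating condition.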