Learning robust policies for robotic systems operating in the presence of uncertainty is a challenging task. For safe navigation, motion planning must account not only for the natural stochasticity of the environment and vehicle dynamics, but also for the perception uncertainty associated with dynamic entities such as pedestrians. To this end, we construct an algorithm with built-in robustness to uncertainty: rather than following the standard approach of minimizing the expected cost of trajectories, it directly minimizes an upper confidence bound on that expected cost. Perception uncertainty is incorporated into the policy search framework by predicting each pedestrian’s intent belief and propagating their state distribution in time using closed-loop goal-directed dynamics. We train the policy in simulation and show that it can be transferred to an agile ground vehicle for successful autonomous navigation in the presence of pedestrians under perception uncertainty. We further show that this policy outperforms a policy that does not consider pedestrian intent and perception uncertainty.
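
To illustrate the difference between minimizing an expected cost and minimizing an upper confidence bound on it, the following is a minimal sketch only, not the algorithm used in this work. It assumes a simple normal-approximation bound (sample mean plus `z` standard errors) computed from sampled trajectory costs; the helper names `cost_upper_confidence_bound` and `rollout_costs`, the scalar `gain` parameter, and the toy cost model are all hypothetical and chosen purely for illustration.

```python
import math
import random
import statistics


def cost_upper_confidence_bound(costs, z=1.645):
    """Sample mean plus z standard errors of the mean.

    Assumption: a one-sided normal-approximation bound. The bound
    actually used for policy search may differ.
    """
    n = len(costs)
    if n < 2:
        raise ValueError("need at least two sampled trajectory costs")
    mean = statistics.fmean(costs)
    std_err = statistics.stdev(costs) / math.sqrt(n)
    return mean + z * std_err


def rollout_costs(gain, n_rollouts=200, seed=0):
    """Hypothetical toy simulator: one scalar cost per simulated rollout.

    The nominal cost is lowest at gain = 1.0, but the cost noise grows
    with the gain, so aggressive policies have less predictable costs.
    """
    rng = random.Random(seed)
    return [(gain - 1.0) ** 2 + rng.gauss(0.0, 0.1 + gain)
            for _ in range(n_rollouts)]


# Compare the two selection criteria over a few candidate policy parameters.
candidates = [0.25, 0.5, 0.75, 1.0]
best_by_mean = min(candidates,
                   key=lambda g: statistics.fmean(rollout_costs(g)))
best_by_ucb = min(candidates,
                  key=lambda g: cost_upper_confidence_bound(rollout_costs(g)))
print("selected by mean cost:", best_by_mean)
print("selected by upper bound:", best_by_ucb)
```

Because the bound adds a penalty proportional to the spread of the sampled costs, a criterion of this kind tends to prefer parameters whose cost is both low and consistent, which is the sense of robustness described above.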