Autonomous learning of robotic skills could allow general-purpose robots to acquire wide behavioral repertoires without extensive manual engineering. However, robotic skill learning methods typically make trade-offs to enable practical real-world learning, such as requiring manually designed policy or value function representations, initialization from human demonstrations, instrumentation of the training environment, or extremely long training times. We propose a new reinforcement learning algorithm that can train general-purpose neural network policies with minimal human engineering, while still allowing fast, efficient learning in stochastic environments. We build on the guided policy search (GPS) algorithm, which transforms the reinforcement learning problem into supervised learning from a computational teacher (without human demonstrations). Prior GPS methods require a consistent set of initial states to which the system must be reset after each episode. Our approach can instead handle random initial states, so it can be used even when deterministic resets are impossible. We compare our method to existing policy search algorithms in simulation, showing that it can train high-dimensional neural network policies with the same sample efficiency as prior GPS methods and can learn policies directly from image pixels. We also present real-world robot results showing that our method can learn manipulation policies with visual features and random initial states.
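The core idea of turning reinforcement learning into supervised learning from a computational teacher can be illustrated with a deliberately tiny sketch. This is not the authors' algorithm. It assumes a known scalar linear system x_{t+1} = a x_t + b u_t + noise (GPS instead fits local dynamics from samples), uses finite-horizon LQR as a stand-in for the teacher's trajectory optimization, replaces the neural network with a single linear gain, and omits the iterative coupling between teacher and policy. The functions `lqr_gains`, `teacher_rollouts`, and `fit_linear_policy` are hypothetical names chosen for illustration. As in the setting described above, each episode starts from a random initial state rather than a fixed reset.

```python
import random


def lqr_gains(a, b, q, r, horizon):
    """Scalar finite-horizon LQR (stand-in teacher).

    Returns time-varying gains K_0..K_{T-1} such that u_t = -K_t * x_t.
    """
    p = q  # terminal cost-to-go weight P_T
    gains = []
    for _ in range(horizon):
        # K_t = (r + b P_{t+1} b)^-1 b P_{t+1} a
        k = (b * p * a) / (r + b * p * b)
        # Riccati backup: P_t = q + a P a - a P b K_t
        p = q + a * p * a - a * p * b * k
        gains.append(k)
    gains.reverse()  # gains were computed backward in time
    return gains


def teacher_rollouts(a, b, gains, num_episodes, noise, rng):
    """Collect (state, teacher action) pairs from random initial states."""
    states, actions = [], []
    for _ in range(num_episodes):
        x = rng.uniform(-2.0, 2.0)  # random initial state: no deterministic reset
        for k in gains:
            u = -k * x  # teacher action that the policy will imitate
            states.append(x)
            actions.append(u)
            x = a * x + b * u + rng.gauss(0.0, noise)
    return states, actions


def fit_linear_policy(states, actions):
    """Supervised step: least-squares fit of u = -theta * x to teacher data."""
    sxx = sum(x * x for x in states)
    sxu = sum(x * u for x, u in zip(states, actions))
    return -sxu / sxx


if __name__ == "__main__":
    rng = random.Random(0)
    a, b = 1.2, 1.0  # hypothetical open-loop unstable system
    gains = lqr_gains(a, b, q=1.0, r=0.1, horizon=20)
    xs, us = teacher_rollouts(a, b, gains, num_episodes=50, noise=0.05, rng=rng)
    theta = fit_linear_policy(xs, us)
    print(f"fitted gain {theta:.3f}, closed-loop pole {a - b * theta:.3f}")
```

Because every teacher action is linear in the state, the least-squares gain is an x²-weighted average of the teacher gains, so in this toy setting the supervised policy inherits their stabilizing behavior. With nonlinear dynamics, image observations, and a neural network policy, no such guarantee carries over, which is why GPS alternates between improving the teacher and fitting the policy.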