Manipulators find it challenging to perform multistep tasks such as block stacking. The Universal Option Framework (UOF) achieves a high success rate on multistep tasks; within UOF, the low-level policy is trained by the Deep Deterministic Policy Gradient (DDPG) algorithm combined with Hindsight Experience Replay (HER). However, on complex tasks this approach suffers from a low success rate, long convergence time, and low training efficiency. To address these problems, this article proposes a UOF algorithm with an improved DDPG and a UOF algorithm with an improved HER. The improved DDPG introduces a softmax operator for soft double estimation into the DDPG algorithm; by reducing the estimation bias of DDPG, it raises the success rate on complex tasks and significantly shortens convergence time. The improved HER introduces curiosity and proximity measures so that the failed experiences of each learning stage are used effectively, further alleviating the sparse-reward problem and significantly improving training efficiency and the success rate on complex tasks. Experiments on block-stacking tasks show that the improved methods complete complex tasks effectively.
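The abstract does not give the exact form of the soft double estimation softmax operator. A minimal sketch of one common formulation, in which the critic target is a softmax-weighted average of sampled action-values combined with clipped double estimation, is shown below; the function names, the `beta` temperature parameter, and the use of `min` over two critics are assumptions for illustration, not the paper's definitive method.

```python
import numpy as np

def softmax_value(q_values, beta=1.0):
    """Softmax operator over sampled action-values:
    V = sum_i w_i * Q_i, with w_i = exp(beta * Q_i) / sum_j exp(beta * Q_j).
    beta=0 recovers the mean; large beta approaches the max, so the
    operator interpolates between under- and over-estimation."""
    q = np.asarray(q_values, dtype=float)
    z = beta * q
    z = z - z.max()          # subtract max for numerical stability
    w = np.exp(z)
    w /= w.sum()
    return float((w * q).sum())

def soft_double_target(r, gamma, q1_next, q2_next, beta=1.0):
    """Hypothetical TD target combining the softmax operator with
    clipped double estimation: softmax value under each of two
    critics, then the minimum of the two (a hedged sketch)."""
    v1 = softmax_value(q1_next, beta)
    v2 = softmax_value(q2_next, beta)
    return r + gamma * min(v1, v2)
```

Because the softmax value lies between the mean and the maximum of the sampled action-values, it softens the overestimation that the plain `max` (or a deterministic greedy action) induces in DDPG targets.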
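The curiosity and proximity measures used to select relabeling goals are specific to the paper and are not detailed in the abstract. As background, a minimal sketch of the standard HER relabeling step that those measures would extend is given below, using HER's "final" strategy and the sparse 0/-1 reward common in robotic goal-reaching tasks; the dictionary keys, `reward_fn` signature, and `tol` threshold are illustrative assumptions.

```python
import numpy as np

def sparse_reward(achieved, goal, tol=0.05):
    # Sparse goal-reaching reward: 0 on success, -1 otherwise.
    dist = np.linalg.norm(np.asarray(achieved) - np.asarray(goal))
    return 0.0 if dist < tol else -1.0

def her_relabel(episode, reward_fn):
    """HER 'final' strategy: replace every transition's desired goal
    with the goal actually achieved at the end of the episode and
    recompute the reward, so a failed episode still produces useful
    learning signal instead of uniform -1 rewards."""
    new_goal = episode[-1]["achieved_goal"]
    relabeled = []
    for t in episode:
        r = reward_fn(t["achieved_goal"], new_goal)
        relabeled.append({**t, "desired_goal": new_goal, "reward": r})
    return relabeled
```

Under this relabeling, at least the final transition of every episode receives a success reward, which is what lets HER exploit failed experiences against sparse rewards; the paper's curiosity and proximity criteria refine which achieved states are chosen as substitute goals.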