The multi-depot vehicle routing problem with time windows (MDVRPTW) is a practically important problem in urban logistics. However, heuristic methods may fail to produce high-quality solutions quickly for large-scale instances. This article therefore presents a novel reinforcement learning algorithm that integrates a multi-head attention mechanism and a local search strategy to solve the problem efficiently. Route optimization is treated as a vehicle tour generation process, and an encoder-decoder model iteratively generates routes for vehicles departing from different depots. In the encoder, multi-head attention is used to capture the complex spatiotemporal correlations among customers and their time windows. A multi-agent decoder then generates solutions by observing state transitions and optimizing the reward. In addition, a local search strategy is applied to improve the quality of the generated solutions. The experimental results demonstrate that the proposed method significantly outperforms traditional methods in both effectiveness and robustness.
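To make the local search step concrete, the following is a minimal sketch of one possible improvement move for a single route under time windows. It is an illustration only: the abstract does not state which local search operator is used, so a standard 2-opt move is swapped in here, and the function names (`route_cost_if_feasible`, `two_opt`), the Euclidean distances, the unit travel speed, and the hard time windows are all assumptions rather than details of the proposed method.

```python
from math import hypot


def route_cost_if_feasible(route, coords, windows, service=0.0):
    """Return the travel distance of `route`, or None if a time window is violated.

    Assumptions (not from the source): Euclidean distance, unit speed,
    hard time windows (earliest, latest), waiting allowed on early arrival.
    """
    time = 0.0
    dist = 0.0
    for prev, node in zip(route, route[1:]):
        leg = hypot(coords[node][0] - coords[prev][0],
                    coords[node][1] - coords[prev][1])
        dist += leg
        time += leg                      # unit speed: travel time equals distance
        earliest, latest = windows[node]
        if time > latest:                # arrived after the window closed
            return None
        time = max(time, earliest) + service  # wait if early, then serve
    return dist


def two_opt(route, coords, windows):
    """Improve a depot-to-depot route with 2-opt, keeping only feasible moves."""
    best = route[:]
    best_cost = route_cost_if_feasible(best, coords, windows)
    if best_cost is None:
        raise ValueError("initial route violates a time window")
    improved = True
    while improved:
        improved = False
        # The first and last positions (the depot) stay fixed.
        for i in range(1, len(best) - 2):
            for j in range(i + 1, len(best) - 1):
                # Reverse the segment between positions i and j.
                cand = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                cost = route_cost_if_feasible(cand, coords, windows)
                if cost is not None and cost < best_cost - 1e-9:
                    best, best_cost = cand, cost
                    improved = True
    return best, best_cost


# Hypothetical toy instance: depot 0 and three customers on a unit square.
coords = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (1.0, 1.0), 3: (0.0, 1.0)}
windows = {n: (0.0, 100.0) for n in coords}   # wide windows, never binding
improved_route, improved_cost = two_opt([0, 2, 1, 3, 0], coords, windows)
```

In a pipeline like the one described above, a routine of this kind would be applied to each route produced by the decoder, accepting a move only when it shortens the route and keeps every customer within its time window.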