In recent years, deep reinforcement learning has been increasingly applied in the field of cloud computing, making it possible to provide effective, optimized decision-making strategies for complex, high-dimensional multi-agent systems. However, in complex multi-agent scenarios, existing multi-agent deep reinforcement learning algorithms not only converge slowly but also offer no guarantee of stability. This paper proposes a multi-agent distributed deep deterministic policy gradient algorithm based on value distribution. The algorithm introduces the idea of value distribution into multi-agent settings, preserving the full distribution of returns rather than only their expectation, so that agents obtain more stable and effective learning signals; it introduces multi-step rewards to improve the stability of the algorithm; and it introduces a distributed data-generation framework that decouples experience generation from network updates, making full use of computing resources and accelerating convergence. Experiments show that the proposed algorithm achieves better stability and faster convergence in multiple continuous- and discrete-control multi-agent scenarios, and that the decision-making ability of the agents is also significantly improved.
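As one concrete illustration of the two ideas named above, the combination of a value-distribution critic with multi-step rewards is commonly realized (e.g., in C51/D4PG-style methods) by projecting an n-step bootstrapped return distribution onto a fixed categorical support. The sketch below is an assumption about how such a target could be computed; the parameter names (`v_min`, `v_max`, `num_atoms`) and the function itself are illustrative, not the paper's actual implementation.

```python
import numpy as np

def categorical_projection(rewards, done, next_probs, gamma,
                           v_min=-10.0, v_max=10.0, num_atoms=51):
    """Project an n-step bootstrapped return distribution onto a fixed
    categorical support (C51-style), a common form of distributional critic.

    rewards:    sequence of the n collected rewards r_t .. r_{t+n-1}
    done:       whether the episode terminated within those n steps
    next_probs: shape (num_atoms,), critic's distribution at state s_{t+n}
    """
    n_steps = len(rewards)
    support = np.linspace(v_min, v_max, num_atoms)
    delta_z = (v_max - v_min) / (num_atoms - 1)

    # Multi-step reward: R = sum_k gamma^k * r_{t+k}
    n_step_reward = sum((gamma ** k) * r for k, r in enumerate(rewards))

    # Shift each atom of the bootstrap distribution by the n-step reward;
    # terminal transitions contribute no bootstrap term.
    bootstrap = 0.0 if done else (gamma ** n_steps)
    tz = np.clip(n_step_reward + bootstrap * support, v_min, v_max)

    # Split each shifted atom's probability mass between its two
    # nearest neighbors on the fixed support.
    b = (tz - v_min) / delta_z
    lower = np.floor(b).astype(int)
    upper = np.ceil(b).astype(int)
    target = np.zeros(num_atoms)
    for j in range(num_atoms):
        if lower[j] == upper[j]:            # shifted atom lands exactly on the grid
            target[lower[j]] += next_probs[j]
        else:
            target[lower[j]] += next_probs[j] * (upper[j] - b[j])
            target[upper[j]] += next_probs[j] * (b[j] - lower[j])
    return target
```

The projected `target` is itself a valid probability distribution over the support, so it can serve as a full-distribution learning signal (e.g., via cross-entropy against the critic's prediction) instead of a scalar expected return.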