This paper studies a distributed multi-armed bandit problem over a network of $N$ agents, each of which can communicate only with its neighbors, where the neighbor relationships are described by a connected graph $\mathbb{G}$. Each agent makes a sequence of decisions, selecting an arm from $M$ candidates, yet it has access only to local samples of the reward of each action, which is a random variable. A distributed upper confidence bound (UCB) algorithm is proposed for the agents to cooperatively learn the best decision. It is shown that when all the agents share a homogeneous reward distribution for each arm, the algorithm achieves guaranteed logarithmic regret for all $N$ agents of order $O\big((1+2\rho_2)^2 \log T / N\big)$ when $T$ is large, where $\rho_2$ denotes the second largest of the absolute values of the eigenvalues of the Metropolis matrix of $\mathbb{G}$. A sufficient condition under which the proposed distributed algorithm learns faster than its centralized (single-agent) counterpart is provided. Simulations suggest that the algorithm also works when the agents have heterogeneous observations of each arm's reward.
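
The quantity $\rho_2$ in the regret bound can be made concrete with a small numerical sketch. The following is a minimal illustration (not the paper's algorithm) of constructing the standard Metropolis weight matrix of an undirected graph, $W_{ij} = 1/(1+\max(d_i,d_j))$ for each edge $\{i,j\}$ with the diagonal chosen so rows sum to one, and reading off $\rho_2$; the 4-cycle example graph is purely illustrative:

```python
import numpy as np

def metropolis_matrix(adj):
    """Metropolis weight matrix of an undirected graph.

    For each edge {i, j}: W[i, j] = 1 / (1 + max(d_i, d_j)),
    where d_i is the degree of node i; the diagonal entries
    are set so that every row sums to one (doubly stochastic).
    """
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and adj[i, j] > 0:
                W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()
    return W

# Illustrative example: a 4-cycle (every node has degree 2)
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)
W = metropolis_matrix(adj)

# rho_2: second largest absolute eigenvalue of W
abs_eigs = np.sort(np.abs(np.linalg.eigvalsh(W)))[::-1]
rho2 = abs_eigs[1]
print(rho2)  # 1/3 for the 4-cycle
```

For a connected graph $\rho_2 < 1$, and a smaller $\rho_2$ (better-connected graph) yields a smaller constant $(1+2\rho_2)^2$ in the regret bound.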