A Distributed Algorithm for Multi-Armed Bandit with Homogeneous Rewards over Directed Graphs
- Resource Type
- Conference
- Authors
- Zhu, Jingxuan; Liu, Ji
- Source
- 2021 American Control Conference (ACC) American Control Conference (ACC), 2021. :3038-3043 May, 2021
- Subject
- Aerospace
Bioengineering
Components, Circuits, Devices and Systems
Computing and Processing
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Knowledge engineering
Sufficient conditions
Decision making
Directed graphs
Random variables
Distributed algorithms
- Language
- ISSN
- 2378-5861
This paper studies a distributed multi-armed bandit problem over a network of $N$ agents whose neighbor relationships are described by a directed graph G. Each agent makes a sequence of decisions on selecting an arm from $M$ candidates, yet it only receives a sample of the reward for each decision, which is an unknown random variable. A distributed upper confidence bound (UCB) algorithm is proposed for the agents to cooperatively learn the best decision. It is shown that when all the agents share a homogeneous distribution of each arm reward and G is strongly connected, the algorithm achieves a guaranteed logarithmic regret for all $N$ agents. A sufficient condition under which the proposed distributed algorithm learns faster than the classical single-agent UCB1 is also provided.