This paper studies a distributed multi-armed bandit problem over a network of $N$ agents, each of which can communicate only with its neighbors, where the neighbor relationships are described by a connected graph $\mathbb{G}$. Each agent makes a sequence of decisions, selecting an arm from $M$ candidates, yet it has access only to local samples of the reward of each action, which is a random variable. A distributed upper confidence bound (UCB) algorithm is proposed for the agents to cooperatively learn the best decision. It is shown that when all the agents share a homogeneous reward distribution for each arm, the algorithm achieves guaranteed logarithmic regret for all $N$ agents of order $O\big((1+2\rho_2)^2 \log T / N\big)$ when $T$ is large, where $\rho_2$ denotes the second largest of the absolute values of the eigenvalues of the Metropolis matrix of $\mathbb{G}$. A sufficient condition under which the proposed distributed algorithm learns faster than its centralized (single-agent) counterpart is provided. Simulations suggest that the algorithm also works when the agents have heterogeneous observations of each arm's reward.
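
The quantity $\rho_2$ in the regret bound can be made concrete with a small numerical sketch. The following is a minimal illustration (not the paper's algorithm) of constructing the standard Metropolis weight matrix of an undirected graph, $W_{ij} = 1/(1+\max(d_i,d_j))$ for each edge $\{i,j\}$ with the diagonal chosen so rows sum to one, and reading off $\rho_2$; the 4-cycle example graph is purely illustrative:

```python
import numpy as np

def metropolis_matrix(adj):
    """Metropolis weight matrix of an undirected graph.

    For each edge {i, j}: W[i, j] = 1 / (1 + max(d_i, d_j)),
    where d_i is the degree of node i; the diagonal entries
    are set so that every row sums to one (doubly stochastic).
    """
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and adj[i, j] > 0:
                W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()
    return W

# Illustrative example: a 4-cycle (every node has degree 2)
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)
W = metropolis_matrix(adj)

# rho_2: second largest absolute eigenvalue of W
abs_eigs = np.sort(np.abs(np.linalg.eigvalsh(W)))[::-1]
rho2 = abs_eigs[1]
print(rho2)  # 1/3 for the 4-cycle
```

For a connected graph $\rho_2 < 1$, and a smaller $\rho_2$ (better-connected graph) yields a smaller constant $(1+2\rho_2)^2$ in the regret bound.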