Finite-Sample Bounds for Adaptive Inverse Reinforcement Learning Using Passive Langevin Dynamics
- Resource Type
- Conference
- Authors
- Snow, Luke; Krishnamurthy, Vikram
- Source
- 2023 62nd IEEE Conference on Decision and Control (CDC), pp. 3618-3625, Dec. 2023
- Subject
- Computing and Processing
- Power, Energy and Industry Applications
- Robotics and Control Systems
- Heuristic algorithms
- Reinforcement learning
- Markov processes
- Cost function
- Approximation algorithms
- Real-time systems
- Probability distribution
- Language
- ISSN
- 2576-2370
Stochastic gradient Langevin dynamics (SGLD) is a useful methodology for sampling from probability distributions. This paper provides a finite-sample analysis of a passive stochastic gradient Langevin dynamics (PSGLD) algorithm designed to achieve inverse reinforcement learning. By "passive," we mean that the noisy gradients available to the PSGLD algorithm (the inverse learning process) are evaluated at points chosen randomly by an external stochastic gradient algorithm (the forward learner). The PSGLD algorithm acts as a randomized sampler that recovers the cost function being optimized by this external process. Previous work analyzed the asymptotic performance of this passive algorithm using stochastic approximation techniques; this work analyzes its non-asymptotic performance. Specifically, we provide finite-time bounds on the 2-Wasserstein distance between the passive algorithm's iterates and its stationary measure, from which the reconstructed cost function is obtained.
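To make the passive setting concrete, the following is a minimal one-dimensional sketch of a kernel-weighted passive Langevin sampler, not the paper's algorithm or constants. It assumes a Gaussian kernel, a toy quadratic cost, and a uniformly distributed stream of query points standing in for the external forward learner; all names and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy cost C(x) = (x - 2)^2 with minimizer x* = 2. The passive learner never
# evaluates C or its gradient itself; it only receives noisy gradients
# computed at query points chosen by the external (forward) process.
def noisy_grad(x):
    return 2.0 * (x - 2.0) + 0.1 * rng.standard_normal()

beta, eps, delta = 10.0, 0.01, 0.5  # inverse temperature, step size, kernel bandwidth
theta = 0.0                         # passive sampler's iterate
samples = []

for _ in range(20000):
    x = rng.uniform(-2.0, 6.0)      # query point picked by the external process
    g = noisy_grad(x)               # the only information the passive learner receives
    # Gaussian kernel weight: the gradient influences theta only when the
    # query point lands near the sampler's current iterate.
    w = np.exp(-0.5 * ((x - theta) / delta) ** 2) / (delta * np.sqrt(2.0 * np.pi))
    # Weighted Langevin step: kernel-modulated drift plus injected Gaussian noise.
    theta += -eps * w * g + np.sqrt(2.0 * eps / beta) * rng.standard_normal()
    samples.append(theta)

burn_in = np.array(samples[5000:])
print(burn_in.mean())  # the empirical law concentrates near the minimizer x* = 2
```

The kernel weight is what makes the scheme "passive": rather than querying the gradient at its own iterate, the sampler reweights whatever gradient evaluations the forward learner happens to produce, and its long-run empirical distribution encodes the cost being optimized.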