Exploration Strategy based on Validity of Actions in Deep Reinforcement Learning
- Resource Type
- Conference
- Authors
- Yoon, Hyung-Suk; Lee, Sang-Hyun; Seo, Seung-Woo
- Source
- 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 6134-6139, Oct. 2020
- Subject
- Robotics and Control Systems
- Training
- Navigation
- Reinforcement learning
- Aerospace electronics
- Task analysis
- Autonomous vehicles
- Intelligent robots
- ISSN
- 2153-0866
- Abstract
- How to explore an environment is one of the most critical factors for the performance of an agent in reinforcement learning. Conventional exploration strategies, such as the ε-greedy algorithm and Gaussian exploration noise, depend on pure randomness. However, to explore complex environments efficiently, an agent must consider both its training progress and the long-term usefulness of its actions, which remains a major challenge in reinforcement learning. To address this challenge, we propose a novel exploration method that selects actions based on their validity. The key idea behind our method is to estimate the validity of actions by leveraging the zero-avoiding property of the Kullback-Leibler divergence, comprehensively evaluating actions in terms of both exploration and exploitation. We also introduce a framework that allows an agent to explore efficiently in environments where rewards are sparse or cannot be defined intuitively. The framework uses expert demonstrations to guide the agent toward task-relevant state space by combining our exploration strategy with imitation learning. We demonstrate our exploration strategy on several tasks, ranging from classical control tasks to high-dimensional urban autonomous driving scenarios at a roundabout. The results show that our exploration strategy encourages an agent to visit task-relevant state space and enhance the validity of its actions, outperforming several previous methods.
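The abstract's key mechanism rests on the zero-avoiding property of the forward KL divergence, D(p‖q): the divergence blows up wherever q assigns (near-)zero mass to outcomes that p supports, so minimizing it forces q to cover p's entire support. Below is a minimal, illustrative sketch of that property on discrete action distributions; it is not the authors' implementation, and the distributions `p`, `q_covering`, and `q_ignoring` are hypothetical examples.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """Forward KL divergence D_KL(p || q) for discrete distributions.

    Terms where p_i == 0 contribute nothing; terms where q_i -> 0 while
    p_i > 0 diverge, which is exactly the zero-avoiding behavior.
    """
    return sum(pi * math.log(pi / max(qi, eps))
               for pi, qi in zip(p, q) if pi > 0)

# Hypothetical target distribution over three actions, all supported.
p = [0.5, 0.4, 0.1]

# q_covering spreads mass over every action the target supports;
# q_ignoring assigns almost no mass to the third action.
q_covering = [0.4, 0.4, 0.2]
q_ignoring = [0.6, 0.4, 1e-9]

print(kl_divergence(p, q_covering))  # small: full support is covered
print(kl_divergence(p, q_ignoring))  # large: missing support is heavily penalized
```

Under this view, an action distribution that "covers" the task-relevant actions scores a low forward KL against the target, which is one way such a divergence can rank actions for exploration.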