Recent work in deep reinforcement learning emphasizes advanced exploration techniques that avoid purely greedy action selection. Reinforcement learning typically works by finding an optimal policy for a Markov Decision Process. In off-policy algorithms, such as deep Q-learning, the agent learns a value function for this optimal policy independently of the actions it chooses while exploring. Algorithms built on the maximum entropy framework, such as soft Q-learning, counteract the agent's greedy behavior and effectively combine exploration and exploitation by adding an entropy term to the Bellman equation. We apply this method to the Lunar Lander environment and compare it with classic deep Q-learning, using the same set of random seeds and averaging over multiple runs. The implicit exploration strategy proves to compensate for disturbances caused by intrinsic sources of non-determinism, such as random seeds. This paper highlights the sensitivity of deep reinforcement learning to intrinsic and extrinsic influences, with respect to exploration and repeatability.
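For reference, a minimal sketch of the entropy-augmented Bellman backup underlying soft Q-learning, in the standard maximum entropy formulation; the temperature $\alpha$ and the discrete action set $\mathcal{A}$ (matching the Lunar Lander setting) are assumed notation, not quantities defined above:
\begin{align}
Q_{\mathrm{soft}}(s_t, a_t) &= r_t + \gamma\, \mathbb{E}_{s_{t+1}}\!\left[ V_{\mathrm{soft}}(s_{t+1}) \right],\\
V_{\mathrm{soft}}(s_t) &= \alpha \log \sum_{a \in \mathcal{A}} \exp\!\left( Q_{\mathrm{soft}}(s_t, a) / \alpha \right).
\end{align}
As $\alpha \to 0$, the log-sum-exp value collapses to the maximum over actions and the standard (greedy) Bellman equation of Q-learning is recovered.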