Reinforcement learning (RL) is well suited to the design of unmanned surface vessel (USV) path-following controllers because it is model-free and does not require supervised training data. However, USV state transitions do not strictly satisfy the Markov property, and the choice of state space, action space, and reward function strongly affects the performance of an RL-based controller. Based on the dynamic and kinematic characteristics of the USV, we design a new state space that reduces the influence of large inertia and state hysteresis on RL agent training. By decomposing the path-following task, we propose a comprehensive reward function that prevents the RL-based controller from falling into local optima. A dynamic threshold in the reward function accelerates training while preserving tracking accuracy. Finally, the effectiveness of the proposed RL-based controller is evaluated both in simulation and in path following with an actual USV.

• Based on the USV dynamic and kinematic characteristics, we design a new state space that makes state transitions Markovian and reduces the influence of large inertia and state hysteresis on training.

• By decomposing the path-following task into a primary task and secondary tasks, we propose a comprehensive reward function that makes the RL agents converge quickly and avoids local optima.

• A dynamic threshold in the reward function avoids sparse rewards and accelerates training while ensuring tracking accuracy.
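The combination of a decomposed reward and a dynamic threshold described above can be sketched as follows. This is an illustrative sketch under assumed forms, not the authors' exact formulation: the decay schedule, weights, and the specific primary/secondary terms (cross-track error, heading alignment, steering smoothness) are hypothetical choices consistent with the abstract's description.

```python
import math

def dynamic_threshold(episode, d0=20.0, d_min=1.0, decay=0.995):
    """Assumed schedule: a loose cross-track threshold early in training
    (dense reward signal), shrinking toward d_min for final accuracy."""
    return max(d_min, d0 * decay ** episode)

def reward(cross_track_err, heading_err, rudder_rate, episode):
    """Composite reward: primary path-tracking term plus secondary terms.
    All weights (1.0, 0.3, 0.1) are illustrative assumptions."""
    d = dynamic_threshold(episode)
    # Primary task: stay within the (shrinking) corridor around the path;
    # a nonzero gradient exists whenever |e| < d, so rewards are not sparse.
    r_track = 1.0 - min(abs(cross_track_err) / d, 1.0)
    # Secondary tasks: align heading with the path tangent, and penalize
    # rapid rudder changes to discourage oscillatory control.
    r_heading = math.cos(heading_err)
    r_smooth = -0.1 * abs(rudder_rate)
    return r_track + 0.3 * r_heading + r_smooth
```

Early in training the wide threshold rewards coarse progress toward the path; as episodes accumulate, only increasingly accurate tracking earns the primary reward, which is one plausible way to reconcile fast convergence with final tracking precision.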