Demand for delivery services has reached an all-time high due to the COVID-19 pandemic, making optimal routing plans for courier services essential. The vehicle routing problem (VRP) is an NP-hard problem in logistics research that can be solved by exact algorithms, heuristics, and reinforcement learning. This study introduces a reoptimization alternative to naive Q-learning retraining on a newly introduced subproblem of the dynamic VRP and the VRP with pickup and delivery (VRPPD): the dynamic vehicle routing problem with pickup, delivery, and cancellation (DVRPPDC). The reoptimization technique is called Floyd-Warshall LookUp Reoptimization of Rewards Yearned (FLURRY). FLURRY is combined with a one-time Q-learning computation performed beforehand: the resulting Q matrix is updated at the cell level using a lookup table containing the shortest paths between all pairs of parcel lockers. The lookup table is generated with the Floyd-Warshall algorithm, a well-known all-pairs shortest-path algorithm. When tested on a new dataset for the DVRPPDC, one-time Q-learning combined with FLURRY is 6.1x to 10.6x faster to compute than Q-learning retraining. Furthermore, an additional study conducted after the main series of experiments reveals that the methodology requires no Q-learning training at the first time step at all: FLURRY can be applied to a Q matrix of zeroes and achieves the same path output and traversal time as Q-learning with FLURRY. The standalone FLURRY algorithm further speeds up computation relative to the naive Q-learning approach, from 6.1x-10.6x to 26.5x-117.5x.
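
To make the mechanics concrete, the minimal sketch below illustrates the two building blocks named above: a Floyd-Warshall pass that fills the all-pairs shortest-path lookup table over the parcel lockers, and a cell-level pass that writes lookup-derived values into the Q matrix. The function names, the NumPy representation, and the negative-cost reward mapping in `flurry_update` are illustrative assumptions; the abstract specifies only that the lookup table updates the Q matrix cell by cell.

```python
import numpy as np

def floyd_warshall(dist):
    """All-pairs shortest paths over the parcel-locker graph.

    dist[i, j] is the direct travel cost between lockers i and j
    (np.inf where no direct edge exists). Returns the completed
    shortest-path cost matrix, i.e. the lookup table FLURRY reads from.
    """
    d = dist.copy()
    n = d.shape[0]
    for k in range(n):              # allow locker k as an intermediate stop
        for i in range(n):
            for j in range(n):
                if d[i, k] + d[k, j] < d[i, j]:
                    d[i, j] = d[i, k] + d[k, j]
    return d

def flurry_update(Q, lookup, scale=1.0):
    """Hypothetical cell-level reoptimization of the Q matrix.

    Overwrites each Q[i, j] with a reward derived from the shortest-path
    cost lookup[i, j]. The exact reward mapping is an assumption: here,
    shorter paths simply earn larger (less negative) rewards.
    """
    n = Q.shape[0]
    for i in range(n):
        for j in range(n):
            Q[i, j] = -scale * lookup[i, j]
    return Q

# Toy instance: 4 parcel lockers with symmetric travel costs.
INF = np.inf
dist = np.array([[0.0, 3.0, INF, 9.0],
                 [3.0, 0.0, 2.0, INF],
                 [INF, 2.0, 0.0, 2.0],
                 [9.0, INF, 2.0, 0.0]])
lookup = floyd_warshall(dist)       # lookup[0, 3] == 7.0 via lockers 1 and 2,
                                    # beating the direct cost of 9.0
Q = np.zeros_like(lookup)           # the "Q matrix of zeroes" case
Q = flurry_update(Q, lookup)
```

The zero-initialized `Q` in the example mirrors the standalone-FLURRY finding reported above: the cell-level update alone determines the routing values, so no initial Q-learning pass is needed.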