Demand for delivery services has reached an all-time high due to the COVID-19 pandemic, making optimal routing plans for courier services essential. The vehicle routing problem (VRP) is an NP-hard problem in logistics research that can be solved by exact algorithms, heuristics, and reinforcement learning. This study introduces a reoptimization alternative to naive Q-learning retraining on a newly introduced subproblem of the dynamic VRP and the VRP with pickup and delivery (VRPPD): the dynamic vehicle routing problem with pickup, delivery, and cancellation (DVRPPDC). The reoptimization technique is called Floyd-Warshall LookUp Reoptimization of Rewards Yearned (FLURRY). FLURRY is combined with a one-time Q-learning computation performed beforehand: the resulting Q matrix is updated at the cell level using a lookup table containing the shortest paths between all pairs of parcel lockers. The lookup table is generated with the Floyd-Warshall algorithm, a well-known all-pairs shortest-path algorithm. When tested on a new dataset for the DVRPPDC, one-time Q-learning combined with FLURRY is 6.1x to 10.6x faster to compute than Q-learning retraining. Furthermore, an additional study conducted after the main series of experiments reveals that the methodology requires no Q-learning training at the first time step at all: FLURRY can be applied to a Q matrix of zeroes and achieves the same path output and traversal time as Q-learning with FLURRY. The standalone FLURRY algorithm further speeds up computation relative to the naive Q-learning approach, from 6.1x-10.6x to 26.5x-117.5x.
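
To make the mechanics concrete, the minimal sketch below illustrates the two building blocks named above: a Floyd-Warshall pass that fills the all-pairs shortest-path lookup table over the parcel lockers, and a cell-level pass that writes lookup-derived values into the Q matrix. The function names, the NumPy representation, and the negative-cost reward mapping in `flurry_update` are illustrative assumptions; the abstract specifies only that the lookup table updates the Q matrix cell by cell.

```python
import numpy as np

def floyd_warshall(dist):
    """All-pairs shortest paths over the parcel-locker graph.

    dist[i, j] is the direct travel cost between lockers i and j
    (np.inf where no direct edge exists). Returns the completed
    shortest-path cost matrix, i.e. the lookup table FLURRY reads from.
    """
    d = dist.copy()
    n = d.shape[0]
    for k in range(n):              # allow locker k as an intermediate stop
        for i in range(n):
            for j in range(n):
                if d[i, k] + d[k, j] < d[i, j]:
                    d[i, j] = d[i, k] + d[k, j]
    return d

def flurry_update(Q, lookup, scale=1.0):
    """Hypothetical cell-level reoptimization of the Q matrix.

    Overwrites each Q[i, j] with a reward derived from the shortest-path
    cost lookup[i, j]. The exact reward mapping is an assumption: here,
    shorter paths simply earn larger (less negative) rewards.
    """
    n = Q.shape[0]
    for i in range(n):
        for j in range(n):
            Q[i, j] = -scale * lookup[i, j]
    return Q

# Toy instance: 4 parcel lockers with symmetric travel costs.
INF = np.inf
dist = np.array([[0.0, 3.0, INF, 9.0],
                 [3.0, 0.0, 2.0, INF],
                 [INF, 2.0, 0.0, 2.0],
                 [9.0, INF, 2.0, 0.0]])
lookup = floyd_warshall(dist)       # lookup[0, 3] == 7.0 via lockers 1 and 2,
                                    # beating the direct cost of 9.0
Q = np.zeros_like(lookup)           # the "Q matrix of zeroes" case
Q = flurry_update(Q, lookup)
```

The zero-initialized `Q` in the example mirrors the standalone-FLURRY finding reported above: the cell-level update alone determines the routing values, so no initial Q-learning pass is needed.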