Algorithms, Vol. 19, Pages 149: A Hybrid Deep Reinforcement Learning Framework for Vehicle Path Optimization with Time Windows
Algorithms doi: 10.3390/a19020149
Authors:
Zhiguo Xiao
Changgen Li
Junli Liu
Xinyao Cao
The vehicle routing problem with time windows (VRPTW) is a core challenge in logistics optimization: transportation costs must be minimized subject to constraints such as customer time windows and vehicle capacity. Deep reinforcement learning (DRL) provides an effective approach for solving such complex combinatorial optimization problems. However, existing DRL methods still have shortcomings, including insufficient modeling of spatiotemporal correlations among customer nodes, inadequate capture of temporal dependencies along a route, and policy exploration that is prone to local optima. To address these issues, this paper proposes an end-to-end hybrid DRL framework. The encoder employs a graph attention network (GATv2) with adaptive gating to model the coupling between customer spatial proximity and time window constraints. The decoder integrates multi-head attention (MHA) with a dynamic context-aware long short-term memory (LSTM) network to improve both the overall quality and the constraint feasibility of route solutions. During training, an improved proximal policy optimization (PPO) algorithm and a constraint-aware composite reward function are used to enhance optimization stability. Experiments on random instances, Solomon benchmark datasets, and real-world logistics datasets show that, compared to mainstream DRL methods and classical heuristic algorithms, the proposed framework reduces transportation costs by 2–10%, achieves a demand fulfillment rate exceeding 99%, and exhibits a performance degradation of only 3.2% in cross-distribution testing. This study provides an integrated DRL solution paradigm for combinatorial optimization problems with complex constraints, promoting the application of DRL in the field of intelligent logistics.
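To make the constraints named in the abstract concrete, the following is a minimal sketch of how a single VRPTW route might be checked for capacity and time-window feasibility while its travel distance is accumulated. It is not the authors' implementation: the `Customer` fields, the `evaluate_route` function, Euclidean distances, unit travel speed, and hard time windows with waiting allowed are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Customer:
    """Hypothetical node record; field names are illustrative assumptions."""
    x: float
    y: float
    demand: float = 0.0
    ready: float = 0.0      # earliest time service may start
    due: float = 0.0        # latest time service may start
    service: float = 0.0    # service duration


def evaluate_route(depot, customers, route, capacity):
    """Return (feasible, distance) for the tour depot -> route -> depot.

    Assumes unit travel speed (travel time equals Euclidean distance),
    hard time windows, and waiting when a vehicle arrives early.
    """
    feasible = True
    load = 0.0
    clock = depot.ready
    distance = 0.0
    prev = depot

    for idx in route:
        node = customers[idx]
        leg = hypot(node.x - prev.x, node.y - prev.y)
        distance += leg
        load += node.demand
        arrival = clock + leg
        # Capacity violated, or the vehicle arrives after the window closes.
        if load > capacity or arrival > node.due:
            feasible = False
        # Wait if early, then serve the customer.
        clock = max(arrival, node.ready) + node.service
        prev = node

    # Return to the depot before its closing time.
    leg = hypot(depot.x - prev.x, depot.y - prev.y)
    distance += leg
    if clock + leg > depot.due:
        feasible = False

    return feasible, distance


depot = Customer(x=0.0, y=0.0, ready=0.0, due=100.0)
customers = [
    Customer(x=3.0, y=4.0, demand=1.0, ready=0.0, due=10.0, service=1.0),
    Customer(x=6.0, y=8.0, demand=1.0, ready=20.0, due=30.0, service=1.0),
]
ok_forward, dist_forward = evaluate_route(depot, customers, [0, 1], capacity=2.0)
ok_reverse, dist_reverse = evaluate_route(depot, customers, [1, 0], capacity=2.0)
```

In this toy instance both visiting orders cover the same distance, but only the first respects the time windows. A learned policy of the kind the abstract describes has to account for that distinction, for example by masking infeasible next nodes or by penalizing violations in the reward.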



