Aerospace, Vol. 12, Pages 729: Dynamic Resource Target Assignment Problem for Laser Systems’ Defense Against Malicious UAV Swarms Based on MADDPG-IA


Aerospace doi: 10.3390/aerospace12080729

Authors: Wei Liu, Lin Zhang, Wenfeng Wang, Haobai Fang, Jingyi Zhang and Bo Zhang

The widespread adoption of Unmanned Aerial Vehicles (UAVs) in civilian domains, such as airport security and critical infrastructure protection, has introduced significant safety risks that necessitate effective countermeasures. High-Energy Laser Systems (HELSs) offer a promising defensive solution; however, when confronting large-scale malicious UAV swarms, the Dynamic Resource Target Assignment (DRTA) problem becomes critical. To address this complex combinatorial optimization problem, a method combining precise physical models with multi-agent reinforcement learning (MARL) is proposed. First, an environment-dependent HELS damage model is developed. This model integrates atmospheric transmission effects and thermal effects to precisely quantify the irradiation time required to achieve the desired damage effect on a target. It forms the foundation of the HELS–UAV–DRTA model, which employs a two-stage dynamic assignment structure designed to maximize target priority and defense benefit. An innovative MADDPG-IA (I: intrinsic reward; A: attention mechanism) algorithm is proposed to address the MARL challenges of the HELS–UAV–DRTA problem: an attention mechanism compresses variable-length target states into fixed-size encodings, while a Random Network Distillation (RND)-based intrinsic reward module delivers dense rewards that alleviate the extreme reward sparsity. Large-scale scenario simulations (100 independent runs per scenario) involving 50 UAVs and 5 HELSs across diverse environments demonstrate the method's superiority, achieving mean damage rates of 99.65% ± 0.32% vs. 72.64% ± 3.21% (rural), 79.37% ± 2.15% vs. 51.29% ± 4.87% (desert), and 91.25% ± 1.78% vs. 67.38% ± 3.95% (coastal). The method autonomously evolved effective strategies such as delaying decisions to await the optimal engagement timing and coordinating across regions. Ablation and comparison experiments further confirm MADDPG-IA's superior convergence, stability, and exploration capabilities. This work bridges the gap between complex mathematical and physical mechanisms and real-time collaborative decision optimization, and it provides an innovative theoretical and methodological basis for public-security applications.
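To make the two algorithmic ingredients named in the abstract more concrete, the following is a minimal, hypothetical PyTorch sketch of (a) attention-based pooling that compresses a variable-length set of UAV target states into a fixed-size encoding and (b) an RND intrinsic reward computed as the prediction error against a frozen random network. It is not the authors' implementation; the class names, dimensions, and architecture choices are placeholders chosen only for illustration.

```python
# Illustrative sketch (not the paper's code). Assumes PyTorch; all sizes are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TargetAttentionEncoder(nn.Module):
    """Compress N target feature vectors (N varies per step) into one fixed-size vector."""

    def __init__(self, target_dim: int, embed_dim: int = 64):
        super().__init__()
        self.embed = nn.Linear(target_dim, embed_dim)
        self.query = nn.Parameter(torch.randn(1, 1, embed_dim))  # learned query token
        self.attn = nn.MultiheadAttention(embed_dim, num_heads=4, batch_first=True)

    def forward(self, targets: torch.Tensor, pad_mask: torch.Tensor) -> torch.Tensor:
        # targets: (batch, N_max, target_dim); pad_mask: (batch, N_max), True = padding slot
        keys = self.embed(targets)                      # (batch, N_max, embed_dim)
        q = self.query.expand(targets.size(0), -1, -1)  # (batch, 1, embed_dim)
        pooled, _ = self.attn(q, keys, keys, key_padding_mask=pad_mask)
        return pooled.squeeze(1)                        # (batch, embed_dim), size-invariant


class RNDBonus(nn.Module):
    """Intrinsic reward = error of a trained predictor against a frozen random target net."""

    def __init__(self, state_dim: int, feat_dim: int = 64):
        super().__init__()
        self.target = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))
        self.predictor = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))
        for p in self.target.parameters():
            p.requires_grad_(False)  # the random target network is never trained

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            t = self.target(state)
        pred = self.predictor(state)
        # Per-sample squared error: used both as the intrinsic bonus and as the predictor loss.
        return F.mse_loss(pred, t, reduction="none").mean(dim=-1)


# Usage sketch: encode a padded batch of target sets, then add a scaled, detached intrinsic
# bonus to the (sparse) extrinsic reward before the critic update.
if __name__ == "__main__":
    enc, rnd = TargetAttentionEncoder(target_dim=6), RNDBonus(state_dim=64)
    targets = torch.randn(2, 50, 6)                  # up to 50 UAV targets per agent
    pad_mask = torch.zeros(2, 50, dtype=torch.bool)  # no padding in this toy batch
    obs_encoding = enc(targets, pad_mask)            # fixed-size regardless of swarm size
    r_extrinsic = torch.zeros(2)                     # sparse reward, often zero mid-episode
    r_total = r_extrinsic + 0.1 * rnd(obs_encoding).detach()
```

The design intent this sketch tries to convey is that the learned query attends over however many targets are currently observed, so the policy input stays the same size as the swarm grows or shrinks, while the RND error is largest in rarely visited states and therefore supplies a dense exploration signal between sparse damage rewards.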


