Collaborative Control of UAV Swarms for Target Capture Based on Intelligent Control Theory


1. Introduction

In recent years, swarms of unmanned aerial vehicles (UAVs) have been increasingly deployed in battlefields and disaster relief scenarios to provide distributed and collaborative services such as search, tracking, and capture operations. However, as the complexity and diversity of missions increase, along with the growing number of UAVs in swarms and the challenges of complex spatial environments, significant obstacles arise in collaborative missions [1,2,3]. In light of these challenges, this paper addresses the coordination of dynamic target capture in multi-UAV systems using intelligent control theory, aiming to enhance system performance and availability across various scenarios [4,5,6].
Extensive research has been conducted in the areas of algorithms, control strategies, and experimental verification for multi-UAV collaborative target capture. These efforts have strongly supported the application and promotion of this technology [7,8]. To address the high maneuverability of UAVs and the complexity of the environment, UAV dynamics are considered, and a model predictive control method that integrates a multi-UAV planner with cooperative target capturing is proposed [9]. Additionally, research on efficient path planning and dynamic target prediction models has provided valuable insights into improving UAV swarm coordination [10]. Recent studies also emphasize the importance of real-time planning and control in swarm missions to enhance task completion efficiency.
A distributed self-organizing approach for UAV swarm search-attack mission planning is presented in [11]; it enables flexible motion planning and coordinated group tasks. Li et al. improved the deep deterministic policy gradient algorithm, enabling UAVs to learn intelligent decision-making [12]. To ensure the capture of an evader, a decentralized real-time algorithm was proposed in which multiple pursuers collaboratively track a single evader in a bounded and simply connected planar domain [13,14]. Furthermore, advances in multi-objective optimization algorithms have enhanced the ability of UAVs to adapt to complex and unpredictable scenarios [15].
In addition to optimization strategies, innovative methods such as Gaussian distribution-based speed regulation and position feedback formation control have been applied to improve target capture precision and efficiency [16,17]. Moreover, cooperative roundup control strategies aim to minimize disparities in travel time and distance among agents, enhancing system fairness and effectiveness. Diffusion Kalman gain and limit loop phase updates further refine real-time decision-making, ensuring the system’s ability to adjust dynamically to changes in target behavior [18]. Recent studies have also used differential game theory to optimize dynamic decisions in pursuit–evasion tasks, further extending the capabilities of UAV systems [19].
Recent research has also demonstrated the effectiveness of high-performance control algorithms combined with dynamic planning methods in improving the efficiency and reliability of UAV swarm missions. For example, a study explored predictive control methods for dynamic environment path planning, which significantly reduced target capture time while improving system robustness [20,21]. The core innovation lies in integrating real-time feedback control with environmental adaptability, thereby optimizing the mission execution capabilities of UAV swarms. This method also incorporates cooperative path planning and strategy adjustments, enabling UAVs to dynamically adapt to complex environments and avoid obstacles and high-risk areas. Such approaches show promising advancements in addressing dynamic task allocation and real-time responsiveness, providing new insights for future UAV mission planning.
Differential games originated in the 1950s as a reliable and efficient method for solving air combat strategies. Due to the emergence of guided interceptor munitions and the need to solve maneuver pursuit problems in aerospace, the theoretical pursuit problem, where both adversaries are free to decide their actions, was investigated by American mathematician Dr. Rufus Isaacs and others, funded by the Air Force through the RAND Corporation in the United States. In Isaacs’s seminal work [22], principles of game theory, calculus of variations, and control theory were applied to solve problems involving dynamic conflicts between two or more intelligent agents. The differential dynamic programming approach used there allowed differential games to break free from the discrete time constraints of traditional games, enabling the solution of real-time dynamic optimal equilibrium strategies. Although differential games are applicable to the decision-making of UAV attack and defense air combat, the explosive growth in computational complexity has limited the number of players in current research to single digits. This limitation makes it far from applicable to swarm-sized problems and unsuitable for solving maneuver strategies in UAV clustering attack and defense scenarios. Compared to distributed control methods, differential game theory offers significant advantages in UAV swarm coordination. It can analyze adversarial strategies in real time, ensuring globally optimal decisions, and provides strong adaptability and predictive capabilities, especially for pursuit–evasion problems in complex dynamic environments. Furthermore, differential games offer greater robustness, effectively handling individual UAV failures or environmental changes.
The pursuit–evasion game is a significant form of offensive and defensive confrontation. The study of pursuit–evasion games can enable UAVs to leverage their performance in air combat and enhance the autonomy of decision-making and control. In pursuit–evasion differential games, the pursuer’s task is to capture (or destroy) the evader, while the evader must evade the attack through maneuvers. In recent years, extensive research has been conducted by scholars worldwide on pursuit–evasion differential games, including studies on game scale expansion. In 2017, Professor Tomlin’s team studied a model involving N attackers and N defenders around a fixed target area. The maximum matching method from graph theory was employed to decompose the many-to-many game into one-to-one games, allowing attackers to reach as many target locations as possible while defenders aimed to capture as many attackers as possible. In 2018, Zongying Shi and colleagues from Tsinghua University studied a pursuit game involving two pursuers and one evader in a rectangular region [23] and provided the optimal evasion strategy when the evader is in a dominant region. In the following year, the team further investigated the bounded-region pursuit game between a team of pursuers and a team of evaders [24], where the pursuers intercepted the evaders using a task allocation method. By integrating advanced algorithms, dynamic models, and practical applications, UAV swarms are increasingly capable of addressing the challenges posed by real-time coordination and pursuit–evasion tasks. However, further research is still needed to improve efficiency and timeliness, especially in complex and dynamic environments.

The main contributions of this paper are as follows:

(1) Incorporating differential game theory, the target positions of both pursuers and evaders are dynamically updated in real time. By enabling evaders to employ strategic support, the optimal path selection for both parties during pursuit and evasion is maintained. Differential game theory allows the prediction of the future trajectories of target UAVs and dynamically adjusts the paths by solving differential game equations. The objective function of the differential game emphasizes the adversarial strategies between the pursuers (capture UAVs) and evaders (target UAVs), explicitly considering the dynamic behaviors and escape strategies of the targets. This approach enhances the precision and effectiveness of path planning.

(2) Employing a dynamic closed circular pipeline control algorithm, the target points calculated by the differential game are used as the center points for the next pipeline generation, thereby improving the success rate of the capture. The proposed dynamic closed circular pipeline algorithm not only optimizes the capturing performance of the UAVs but also prevents them from converging into a single line. Instead, the UAVs are distributed within a spatial region, which enhances the system’s robustness and efficiency in capturing. The circular closed pipeline ensures better coordination among UAVs, significantly improving the overall capture performance.

(3) The effectiveness and feasibility of the algorithm are verified through simulation and real flight experiments based on the RflySim platform.

3. Differential Game for Solving the Optimal Pursuit Strategy

3.1. Construction of the Pursuit Model in a Differential Game

A complete differential game model should include control vectors, state vectors, and the payoff function. Based on the fundamental model of multi-player non-cooperative differential games, modeling and analysis are conducted for the specific case of the multi-UAV pursuit problem. The state vector describes the flight status of the UAV, including the $x$, $y$, $z$ positions and the flight velocity $v$. In the context of quantitative differential games, each UAV at time $t$ is assumed to have complete knowledge of all control strategies adopted by the opponent up to that point. The basic differential game model for the multi-UAV pursuit problem is established by formulating the state equations for each UAV currently involved in the pursuit mission. The game system equations, which describe the state evolution of both parties, are analyzed in conjunction with the payoff function to solve for the equilibrium point and its corresponding optimal strategy. The mathematical model for the quantitative differential game is formulated as follows:

$$\dot{x} = f(t, x, u_p, u_e), \qquad x(t_0) = x_0$$

where $t$ is the time when the pursuit mission is carried out, $x(t)$ represents the current state of the pursuing UAV, and $u_p(t)$ and $u_e(t)$ are the control vectors of the pursuit UAV and evader UAV at time $t$.

The following assumptions are established regarding the differential game model:

(1) The game is classified as a complete information game, meaning that both parties are aware of the necessary information regarding each other’s relative states, with no limitations on observation, communication, or other conditions.

(2) The system state is assumed to be accurate, disregarding the effects of sensor errors and delays on information accuracy during operation.

(3) The maximum acceleration (in absolute value) of the UAV is constrained. The acceleration range is softly constrained using weights in the evaluation function, and the change in acceleration varies gently near the boundary.

According to the special characteristics of the pursuit–evasion system, both time optimality and energy consumption must be considered. As such, a performance index of the linear-quadratic type is selected, consisting of a static terminal performance index and a dynamic performance index in the integral term. Based on the above assumptions, the payoff function is constructed for an arbitrary UAV:

$$J = \frac{1}{2}\left[ d_i^{T} S_1 d_i + 2\, d_i^{T} S_2 u_{pi} + u_{pi}^{T} S_3 u_{pi} \right]_{t = t_f} + \frac{1}{2}\int_{t_0}^{t_f} \left( a_{pi}^{T} R_p a_{pi} - a_{ei}^{T} R_e a_{ei} \right) dt$$

where $(S_1, S_2, S_3)$ are the weight matrices of the terminal state and $(R_p, R_e)$ are the acceleration weight matrices. In the terminal part, $d_i^{T} S_1 d_i$ is the weighted square of the relative distance between the pursuit UAV and the evader UAV, $d_i^{T} S_2 u_{pi}$ is the weighted projection of the relative velocity onto the relative distance, and $u_{pi}^{T} S_3 u_{pi}$ is the weighted square of the relative velocity. These terms are optimized to minimize the relative distance and relative velocity, so that the pursuer approaches the evader more effectively and the tracking accuracy of the UAV improves. In the integral part, $a_{pi}^{T} R_p a_{pi}$ is the weighted square of the pursuit UAV’s acceleration and $a_{ei}^{T} R_e a_{ei}$ is the weighted square of the evader UAV’s acceleration. Because the square of the acceleration is closely related to energy consumption, this part constrains acceleration to reduce energy consumption. $t_f$ is the end time of the game. The design of the payoff function balances static terminal performance and dynamic performance, and the specific values of the weight matrices $(S_1, S_2, S_3)$ and $(R_p, R_e)$ determine the optimization priorities. For instance, a higher weight on $S_1$ prioritizes tracking accuracy, while higher weights on $(R_p, R_e)$ emphasize energy efficiency.
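As an illustration of how the payoff could be evaluated on sampled data, the following minimal sketch assumes scalar weights (each weight matrix is taken as a scalar multiple of the identity), uniformly sampled accelerations, and trapezoidal integration. The function name `payoff` and its parameters are hypothetical and are not taken from the paper.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def payoff(d_f, u_f, a_p, a_e, dt, s1=1.0, s2=0.0, s3=0.0, r_p=1.0, r_e=1.0):
    """Evaluate J for one pursuer/evader pair.

    Assumptions (illustrative only): S1, S2, S3, R_p, R_e are scalar
    multiples of the identity; d_f and u_f are the relative distance and
    relative velocity vectors at t_f; a_p and a_e are lists of acceleration
    vectors sampled every dt seconds from t_0 to t_f.
    """
    # Static terminal performance index, evaluated at t = t_f.
    terminal = 0.5 * (s1 * dot(d_f, d_f)
                      + 2.0 * s2 * dot(d_f, u_f)
                      + s3 * dot(u_f, u_f))
    # Integrand of the dynamic performance index at each sample.
    integrand = [r_p * dot(ap, ap) - r_e * dot(ae, ae)
                 for ap, ae in zip(a_p, a_e)]
    # Trapezoidal rule over the sampled horizon.
    running = sum(0.5 * (integrand[k] + integrand[k + 1]) * dt
                  for k in range(len(integrand) - 1))
    return terminal + 0.5 * running
```

Raising `s1` relative to `r_p` and `r_e` in this sketch makes the terminal distance dominate the value of `J`, which mirrors the trade-off between tracking accuracy and energy use described above.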

The terminal constraints are set to satisfy the solution of the pursuit–evasion system, based on the system’s final state of motion.
In the differential game problem, as in game theory generally, the optimal strategies of both sides must be determined so that the differential game system transitions from the known initial state at $t_0$ to the terminal constraints, while one side minimizes and the other maximizes the payoff functional. Ultimately, both parties in the pursuit must find a Nash equilibrium solution that satisfies the conditions of the game:

$$J(u_p^{*}, u_e) \le J(u_p^{*}, u_e^{*}) \le J(u_p, u_e^{*})$$

where $u_p^{*}$ and $u_e^{*}$ represent the Nash equilibrium solutions of the pursuit UAV and evader UAV, respectively. When the pursuit–evasion system adopts this combination of equilibrium solutions as the pursuit strategy, the system achieves a dynamic equilibrium state. In the differential game-based pursuit–evasion model, it is assumed that both the pursuit UAV and the evader UAV have independent decision-making systems and are unaware of each other’s current strategy choices. When the evader UAV uses $u_e^{*}$, the pursuit UAV cannot obtain a lower value of $J$ by deviating from $u_p^{*}$ to any other control input $u_p$. Similarly, when the pursuit UAV uses $u_p^{*}$, the evader UAV cannot obtain a higher value of $J$ by deviating from $u_e^{*}$ to any other control input $u_e$. At this point, the system is considered to have reached the saddle point $(u_p^{*}, u_e^{*})$ of the differential game, and $J(u_p^{*}, u_e^{*})$ is the value of the differential game. If the pursuit UAV fails to capture the evader UAV within the specified time frame, the pursuit mission is deemed a failure. The real-time dynamic target point distribution diagram of the pursuit UAVs and evader UAV based on the differential game algorithm is shown in Figure 4.
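The saddle-point condition can be checked numerically on a toy example. The sketch below uses a hypothetical static payoff $J(u_p, u_e) = u_p^2 + u_p u_e - u_e^2$, which is not the payoff of this paper and was chosen only because its saddle point at $(0, 0)$ is easy to verify by hand. The helper `is_saddle` is likewise an illustrative name.

```python
def toy_payoff(u_p, u_e):
    """Hypothetical scalar payoff with a saddle point at (0, 0)."""
    return u_p ** 2 + u_p * u_e - u_e ** 2


def is_saddle(J, u_p_star, u_e_star, candidates, tol=1e-12):
    """Check J(u_p*, u_e) <= J(u_p*, u_e*) <= J(u_p, u_e*) on a grid."""
    j_star = J(u_p_star, u_e_star)
    # Left inequality: the evader cannot raise J by deviating from u_e*.
    evader_ok = all(J(u_p_star, u_e) <= j_star + tol for u_e in candidates)
    # Right inequality: the pursuer cannot lower J by deviating from u_p*.
    pursuer_ok = all(J(u_p, u_e_star) >= j_star - tol for u_p in candidates)
    return evader_ok and pursuer_ok


grid = [k / 10.0 for k in range(-20, 21)]
print(is_saddle(toy_payoff, 0.0, 0.0, grid))
print(is_saddle(toy_payoff, 1.0, 0.0, grid))
```

A grid check of this kind only tests the inequalities at the sampled candidates, so it can refute a candidate saddle point but cannot prove one.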

3.2. Differential Equation Solving

The established differential game is solved by constructing the Hamiltonian function, which is as follows:

$$H[x(t), u_p(t), u_e(t), \lambda(t), t] = L[x(t), u_p(t), u_e(t), t] + \lambda^{T}(t)\, f[x(t), u_p(t), u_e(t), t]$$

Based on the maximum principle of differential games, the payoff function $J$ and the Hamiltonian function $H$ reach their extreme values simultaneously. The extreme point of $J$ is therefore identified through the extreme value of $H$, which determines the optimal flight control for the pursuit UAV. The Hamiltonian function $H$ reaches a maximum value for the pursuit UAV and a minimum value for the target UAV.

In general, obtaining the analytical solution of the differential game model is challenging, if not entirely impossible. Numerical solution is a well-established alternative, and in this paper, the fourth-order Runge–Kutta algorithm is employed to numerically solve the model.

Step 1: The initial control vectors $u_p^{k}(t)$, $u_e^{k}(t)$, along with the initial state information $x(t_0)$, are given in the domain of definition.

Step 2: The fourth-order Runge–Kutta algorithm is employed to perform forward integration of the state from $t_0$ to $t_f$, starting from the initial state information and yielding $x^{k}(t)$.

Step 3: $\lambda^{k}(t)$ is obtained by performing backward integration from $t_f$ to $t_0$ using the fourth-order Runge–Kutta algorithm, combined with the terminal conditions.

Step 4: The gradient values with respect to the real-time control vectors $u_p$, $u_e$ are calculated.

$$\nabla J_p(u_p, u_e) = \frac{\partial H(t, x, u_p, u_e, \lambda)}{\partial u_p}$$

$$\nabla J_e(u_p, u_e) = \frac{\partial H(t, x, u_p, u_e, \lambda)}{\partial u_e}$$

Step 5: The payoff value changes most rapidly along the gradient direction; thus, starting from $(u_p^{k}, u_e^{k})$, the new control vectors $u_p$, $u_e$ are solved along the gradient direction.

$$u_p^{k+1}(t) = u_p^{k} + \alpha_k \nabla J_p(u_p^{k}, u_e^{k}), \qquad u_e^{k+1}(t) = u_e^{k} + \beta_k \nabla J_e(u_p^{k}, u_e^{k})$$

where $\alpha_k$, $\beta_k$ represent the iteration step sizes.

Step 6: The tolerance $\varepsilon_0$ is set to a small positive threshold. Then, the control vectors $u_p^{k+1}(t)$, $u_e^{k+1}(t)$ are checked to determine whether they satisfy the following conditions. If they do, these vectors are output; otherwise, the process returns to Step 1, and the control vectors $u_p^{k+1}(t)$, $u_e^{k+1}(t)$ are used as the initial control inputs for the next iteration.

$$\left\| \nabla J_p\left(u_p^{k}(t), u_e^{k}(t)\right) \right\| < \varepsilon_0$$

$$\left\| \nabla J_e\left(u_p^{k}(t), u_e^{k}(t)\right) \right\| < \varepsilon_0$$
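The six steps can be sketched on a deliberately simple problem. The example below is not the paper's UAV model: it assumes a scalar state with dynamics $\dot{x} = u_p + u_e$, payoff $J = \tfrac{1}{2} s\, x(t_f)^2 + \tfrac{1}{2}\int (r_p u_p^2 - r_e u_e^2)\,dt$, a pursuer that minimizes $J$ and an evader that maximizes it (so the pursuer step is taken against the gradient, which corresponds to a negative $\alpha_k$ in the notation above). All names and parameter values are hypothetical.

```python
def rk4_step(f, t, y, h):
    """One classical fourth-order Runge-Kutta step for dy/dt = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2.0, y + h / 2.0 * k1)
    k3 = f(t + h / 2.0, y + h / 2.0 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def solve_toy_game(x0=1.0, T=1.0, n=20, s=1.0, r_p=1.0, r_e=2.0,
                   alpha=0.2, beta=0.2, eps=1e-8, max_iter=1000):
    h = T / n
    # Step 1: initial guesses for the (piecewise-constant) controls.
    u_p = [0.0] * n
    u_e = [0.0] * n
    for _ in range(max_iter):
        # Step 2: forward RK4 integration of the state from t_0 to t_f.
        x = [x0]
        for k in range(n):
            rate = u_p[k] + u_e[k]
            x.append(rk4_step(lambda t, y, r=rate: r, k * h, x[-1], h))
        # Step 3: backward RK4 integration of the costate.
        # Here H = 0.5*(r_p*u_p^2 - r_e*u_e^2) + lam*(u_p + u_e), so
        # d(lam)/dt = -dH/dx = 0 with terminal condition lam(t_f) = s*x(t_f).
        lam = [0.0] * (n + 1)
        lam[n] = s * x[n]
        for k in range(n, 0, -1):
            lam[k - 1] = rk4_step(lambda t, y: 0.0, k * h, lam[k], -h)
        # Step 4: gradients dH/du_p and dH/du_e on the time grid.
        g_p = [r_p * u_p[k] + lam[k] for k in range(n)]
        g_e = [-r_e * u_e[k] + lam[k] for k in range(n)]
        # Step 6: stop once both gradients fall below the tolerance.
        if max(abs(g) for g in g_p + g_e) < eps:
            break
        # Step 5: pursuer descends, evader ascends along the gradient.
        u_p = [u - alpha * g for u, g in zip(u_p, g_p)]
        u_e = [u + beta * g for u, g in zip(u_e, g_e)]
    return u_p, u_e, x
```

For the default parameters of this toy problem, the stationarity conditions can be solved by hand and give $u_p = -2/3$, $u_e = 1/3$, and $x(t_f) = 2/3$, which provides a reference against which the iteration can be compared.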

4. Real-Time Dynamic Capture of UAVs Based on Closed Circular Pipeline

In Section 3, a long-distance pursuit–evasion model based on differential game theory is proposed to derive optimal pursuit–evasion strategies for multi-UAV systems in dynamic and uncertain environments. For close-range capture, a closed circular pipeline model is introduced to improve the success rate of UAV capture.

4.1. Capture UAVs Enter the Closed Circular Pipeline

A closed circular pipeline is established to fulfill the requirements of UAV capture missions. Traditional methods typically design a single curve in 3D space to achieve UAV capture. However, the capture mission may require a pipeline rather than a single curve. This configuration allows more UAVs to be accommodated within a single pipeline, thereby better addressing various environmental constraints. Firstly, the predicted position of the target UAV, calculated using the differential game algorithm from the previous section, is employed as the center point for closed circular pipeline construction. Next, the capture UAVs are converged toward the centerline of the closed circular pipeline. During this stage, a shortest-distance-based allocation algorithm is applied to assign capture points to capture UAVs, directing them to approach the target UAV and position themselves around the target UAV in a coordinated manner.

A total of $k$ points are uniformly distributed along the centerline of the closed circular pipeline as capture points; the location of the capture points is shown in Figure 5. A greedy algorithm is employed to select the closest capture point for pre-assignment, followed by the re-assignment of any duplicated capture points based on the matching rule. This method is computationally simple and highly efficient. On the one hand, it ensures that the UAVs are evenly distributed around the target UAV, and on the other hand, it minimizes the time difference between the capture UAVs’ arrivals at their capture points. The flow of the greedy algorithm is shown in Algorithm 1:

Algorithm 1 Greedy Algorithm
Input: the number of capture UAVs $N$, the number of capture points $k$, and the position $Q_j$ of each capture point; $n_j$ is the number of capture UAVs pre-assigned to $Q_j$ (initially 0).
for i = 1 to N do
 for j = 1 to k do
  Calculate the distance $D_{ij}$ from the i-th capture UAV to the capture point $Q_j$
 end for
 Pre-assign the capture point $Q_j$ with the minimum distance $D_{ij}$ to the i-th capture UAV, $n_j = n_j + 1$
end for
for j = 1 to k do
 if $n_j = 0$ then
  The capture point is not assigned
 else if $n_j = 1$ then
  The allocation is successful, and the paired capture UAV and capture point are removed from the loop
 else ($n_j > 1$)
  The capture point is assigned to the capture UAV furthest away, and the paired UAV and capture point are removed from the loop
 end if
end for
Repeat for the remaining capture UAVs and capture points until every capture UAV is assigned
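One possible reading of Algorithm 1 is sketched below for a planar case. It assumes that conflicts are resolved in rounds: every unassigned UAV claims its nearest free capture point, each contested point goes to the furthest claimant, and the remaining UAVs repeat the process on the remaining points. The function names `capture_points` and `greedy_assign` are hypothetical.

```python
import math


def capture_points(center, radius, k):
    """Place k capture points uniformly on the pipeline centerline."""
    cx, cy = center
    return [(cx + radius * math.cos(2.0 * math.pi * j / k),
             cy + radius * math.sin(2.0 * math.pi * j / k))
            for j in range(k)]


def greedy_assign(uavs, points):
    """Return {uav_index: point_index} using nearest-point pre-assignment."""
    if len(uavs) > len(points):
        raise ValueError("need at least as many capture points as UAVs")
    assignment = {}
    free_uavs = set(range(len(uavs)))
    free_points = set(range(len(points)))
    while free_uavs:
        # Pre-assignment: each free UAV claims its closest free point.
        claims = {}
        for i in sorted(free_uavs):
            j = min(sorted(free_points),
                    key=lambda p: math.dist(uavs[i], points[p]))
            claims.setdefault(j, []).append(i)
        # Re-assignment: a contested point goes to the furthest claimant;
        # the losers stay in free_uavs and claim again in the next round.
        for j, claimants in claims.items():
            winner = max(claimants,
                         key=lambda i: math.dist(uavs[i], points[j]))
            assignment[winner] = j
            free_uavs.discard(winner)
            free_points.discard(j)
    return assignment


points = capture_points((0.0, 0.0), 1.0, 4)
uavs = [(2.0, 0.0), (3.0, 0.0), (0.0, 2.0)]
print(greedy_assign(uavs, points))
```

In this example, UAVs 0 and 1 both claim the point near (1, 0); the furthest of the two (UAV 1) keeps it, and UAV 0 is redirected to its nearest remaining point. Each round assigns at least one UAV, so the loop terminates after at most N rounds.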

4.2. Controller Design of the Capture UAVs After Entering the Closed Circular Pipeline

When the capture UAVs enter the closed circular pipeline based on the greedy algorithm, the velocity command for the i-th capture UAV is formulated as follows:

$$v_i = -\mathrm{sat}\left( k_1 u_{1,i} + k_2 u_{2,i} + k_3 u_{3,i} + k_4 u_{4,i},\; v_{\max,i} \right)$$

where $k_1, k_2, k_3, k_4 > 0$ are gains, $v_{\max,i} > 0$ is the maximum allowable speed of the i-th UAV, and the saturation function is formulated to ensure that the magnitude of $v_i$ does not exceed $v_{\max,i}$.

$$\mathrm{sat}\left(v_i, v_{\max,i}\right) \triangleq \begin{cases} v_i, & \|v_i\| \le v_{\max,i} \\ v_{\max,i}\, \dfrac{v_i}{\|v_i\|}, & \|v_i\| > v_{\max,i} \end{cases}$$
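A minimal sketch of the saturation function follows, assuming velocities are represented as tuples of floats; the name `sat` simply mirrors the notation above.

```python
import math


def sat(v, v_max):
    """Scale v back to length v_max when its norm exceeds v_max."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm <= v_max:
        return tuple(v)
    # Keep the direction of v and clip only its magnitude.
    return tuple(v_max * c / norm for c in v)
```

Because only the magnitude is clipped, the direction of the combined command is preserved when the speed limit is active.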

$u_{1,i}$, $u_{2,i}$, $u_{3,i}$, and $u_{4,i}$ are subcommands that serve different purposes. They are defined as follows:

The UAV is directed by $u_{1,i}$ along the positive direction of the closed circular pipeline, ensuring that the capture UAV continuously moves around the target UAV to form a dynamic enclosure.

$$u_{1,i} = \mathrm{sat}\left( N(x_i)\, v_e,\; v_e \right)$$

where $v_e$ is the desired capture speed of the UAV.

$u_{2,i}$ is used to enable inter-UAV collision avoidance within the closed circular pipeline, preventing potential collision conflicts among UAVs.

$$u_{2,i} = \sum_{j \in N_{m,i}} -\sigma_m\left( \|x_i - x_j\| \right) \frac{x_i - x_j}{\|x_i - x_j\|}$$

where $N_{m,i}$ is the neighborhood of the i-th UAV, that is, the set of other UAVs that could potentially collide with it. $r_s$ is defined as the UAV safety radius, with $r_a > r_s$, and collision prevention begins when the distance between UAVs falls below $r_s + r_a$. The function value reaches its maximum once the distance between the UAVs decreases to $2 r_s$. The objective of the speed command $u_{2,i}$ is to minimize its speed value, ideally approaching zero, which implies that the i-th UAV will not collide with the j-th UAV.

The smooth function $\sigma_m(x) \triangleq \sigma(x, 2 r_s, r_s + r_a)$ is defined as follows:

$$\sigma(x, d_1, d_2) = \begin{cases} 1, & x \le d_1 \\ A x^3 + B x^2 + C x + D, & d_1 \le x \le d_2 \\ 0, & d_2 \le x \end{cases}$$

where $A = -2/(d_1 - d_2)^3$, $B = 3(d_1 + d_2)/(d_1 - d_2)^3$, $C = -6 d_1 d_2/(d_1 - d_2)^3$, and $D = d_2^2 (3 d_1 - d_2)/(d_1 - d_2)^3$. This design avoids drastic changes in speed or acceleration near the critical distance, ensuring smoother and more stable flight trajectories for the UAV.
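The smooth function and its use in the collision-avoidance subcommand $u_{2,i}$ can be sketched as follows for planar positions. The function names `sigma` and `u2` are illustrative, and skipping coincident positions is an added safeguard that is not part of the definitions above.

```python
import math


def sigma(x, d1, d2):
    """Cubic switch: 1 for x <= d1, 0 for x >= d2, smooth in between."""
    if x <= d1:
        return 1.0
    if x >= d2:
        return 0.0
    den = (d1 - d2) ** 3
    a = -2.0 / den
    b = 3.0 * (d1 + d2) / den
    c = -6.0 * d1 * d2 / den
    d = d2 ** 2 * (3.0 * d1 - d2) / den
    return a * x ** 3 + b * x ** 2 + c * x + d


def u2(x_i, neighbors, r_s, r_a):
    """Sum of -sigma_m(|x_i - x_j|) * (x_i - x_j)/|x_i - x_j| over neighbors."""
    ux, uy = 0.0, 0.0
    for x_j in neighbors:
        dx, dy = x_i[0] - x_j[0], x_i[1] - x_j[1]
        dist = math.hypot(dx, dy)
        if dist == 0.0:
            continue  # direction undefined for coincident positions
        w = sigma(dist, 2.0 * r_s, r_s + r_a)
        ux -= w * dx / dist
        uy -= w * dy / dist
    return (ux, uy)
```

Substituting $x = d_1$ and $x = d_2$ into the cubic gives 1 and 0, and its derivative $-6(x - d_1)(x - d_2)/(d_1 - d_2)^3$ vanishes at both ends, which is what makes the transition smooth. Since the velocity command carries a leading minus sign, a neighbor located in the $+x$ direction produces a positive $x$ component in `u2` and hence a velocity contribution away from that neighbor.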

$u_{3,i}$ is designed to keep the UAV within the closed circular pipeline, ensuring it does not exceed the pipeline’s boundary.

$$u_{3,i} = -\left( I_2 - N(x_i)^{T} N(x_i) \right) \left( \frac{(x_i - x_e)^{T}}{\|x_i - x_e\|} \right)^{T} \sigma_t\left( d_{t,i} \right)$$

where $\sigma_t(x) \triangleq \sigma(x, r_s, r_a)$, and $x_e$ is the center point of the closed circular pipeline. $d_{t,i}$ represents the distance between the UAV and the closed circular pipe wall. No force is applied when the distance exceeds $r_a$; the force begins to act when the distance is less than $r_a$, reaching its maximum when the distance is exactly $r_s$. As the pipe width remains constant, the velocity command is designed to minimize the obstacle avoidance function, ideally reducing it to zero. This ensures that the i-th UAV and its safety zone stay within the closed circular pipe.

$u_{4,i}$ controls the uniform capture of the target UAV by the capture UAVs, ensuring an appropriate distance is maintained between each capture UAV.

$$u_{4,i} = \begin{cases} \sum\limits_{j \in N_{e,i}} \sigma_e\left( l_{ij} \right) N(x_i), & N(x_i) \cdot (x_i - x_j) > 0 \\ \sum\limits_{j \in N_{e,i}} -\sigma_e\left( l_{ij} \right) N(x_i), & N(x_i) \cdot (x_i - x_j) \le 0 \end{cases}$$

where $\sigma_e(x) \triangleq \sigma(x, 2 l_e, 2 \alpha l_e)$, $\alpha > 1$, and $l_e = \frac{L}{2N}$. To achieve an approximately uniform capture, the circumference of the centerline of the closed circular pipeline is calculated as $L = 2 \pi r_l$. The relative position of two UAVs is represented by the length of the corresponding curve inside the pipeline, $l_{ij} = \frac{L \theta}{2 \pi}$, where $\theta < \pi$ is the angle between the lines connecting two adjacent capture UAVs and the target UAV. When the distance $l_{ij}$ is greater than $2 \alpha l_e$, no force is applied; when the distance is less than $2 \alpha l_e$, force starts to be applied; and when the distance equals $2 l_e$, the force is maximized. The velocity command $u_{4,i}$ is designed to approach zero or be as small as possible to ensure $l_{ij} = 2 l_e$. Uniform target capture is achieved when all UAVs satisfy these conditions.
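The arc-length quantities used by $u_{4,i}$ can be computed as in the sketch below, which assumes planar positions and measures $\theta$ as the angle subtended at the pipeline center $x_e$. The function names are hypothetical.

```python
import math


def arc_length_between(x_i, x_j, x_e, r_l):
    """Arc length l_ij = L * theta / (2 * pi) along the pipeline centerline."""
    a_i = math.atan2(x_i[1] - x_e[1], x_i[0] - x_e[0])
    a_j = math.atan2(x_j[1] - x_e[1], x_j[0] - x_e[0])
    theta = abs(a_i - a_j) % (2.0 * math.pi)
    if theta > math.pi:
        theta = 2.0 * math.pi - theta  # use the shorter of the two arcs
    circumference = 2.0 * math.pi * r_l
    return circumference * theta / (2.0 * math.pi)


def desired_spacing(r_l, n_uavs):
    """Target spacing 2 * l_e with l_e = L / (2 * N)."""
    l_e = 2.0 * math.pi * r_l / (2.0 * n_uavs)
    return 2.0 * l_e
```

For example, with four capture UAVs on a centerline of radius 2, two UAVs a quarter turn apart are separated by an arc of length $\pi$, which equals the desired spacing $2 l_e = L/N$.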

The designed controller integrates multiple subcommands to achieve several key objectives: it forms a dynamic and stable distribution around the target UAV, prevents inter-UAV collisions through robust obstacle avoidance mechanisms, and maintains confinement within the pipeline boundary to ensure structural stability. Additionally, it ensures a uniform distribution of UAVs for optimal capture efficiency and guarantees smooth and stable UAV trajectories, thereby enhancing both safety and reliability.

4.3. Stability Analysis

To ensure the stability of the controller, we adopt a mathematical approach based on Lyapunov’s stability theory. A Lyapunov function is defined as follows:

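$$V = V_1 + V_2 + V_3 + V_4$$

where $V_1 = \frac{1}{2} \sum_{i=1}^{N} \|v_i - v_e\|^2$, $V_2 = \frac{1}{2} \sum_{i,\, j \in N_i} \sigma_m\left( \|x_i - x_j\| \right)$, $V_3 = \frac{1}{2} \sum_{i=1}^{N} \sigma_t\left( d_{t,i} \right)$, and $V_4 = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1, j \ne i}^{N} \sigma_e\left( l_{ij} \right)$. By using the velocity command (16) for all UAVs, $\dot{V}$ satisfies $\dot{V} \le 0$, and the system is stable.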

6. Conclusions

In this paper, a real-time dynamic multi-UAV capture strategy based on the differential game model is proposed to address the challenge of capturing target UAVs. The optimal strategy for attaining the Nash equilibrium through differential games enables the capture UAVs to effectively track the target UAV. Additionally, this paper introduces a collaborative multi-UAV target capture approach using closed circular pipeline control. This method ensures that the capture UAVs are evenly distributed around the target UAV, which facilitates the smooth capture process. By constraining the capture UAVs within the closed circular pipeline, clustering and overcrowding are prevented, thereby enhancing the efficiency and effectiveness of the capture operation. Research and experimentation demonstrate the effectiveness of the proposed strategies and methods in successfully capturing the target UAV. The combination of the differential game model and closed circular pipeline control facilitates the coordination and distribution of capturing UAVs, thereby enhancing the success rate of UAV formation capture.

In future research, efforts will focus on further extending and optimizing the existing multi-UAV real-time dynamic capture and collaborative methods. By accounting for adversarial occupancy uncertainty and resource availability, the applicability and robustness of the method can be enhanced, ensuring its effectiveness in dynamic environments.

UAV maneuver decision-making algorithms will be implemented to enhance UAV efficiency during operation, particularly for path planning in unknown and dynamic environments. These future efforts are expected to further enhance the practical applicability of multi-UAV cooperative strategies, providing robust theoretical and technical support for UAV technology in complex missions.




Yuan Chi www.mdpi.com