To assess how the proposed improvements affect the performance of the target detection algorithm on UAV aerial images, ablation experiments were conducted by incorporating three modifications into the original YOLOv8n model: deformable convolution, the GAM module, and the WIOU loss function. The effect of each module on the algorithm was evaluated, and the results are presented in Table 1.
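As a rough illustration of how such an ablation is laid out, the three modifications can be treated as on/off switches and enumerated. The sketch below is not the authors' code: the flag names `dcn`, `gam`, and `wiou` are hypothetical, the labels follow those used in the discussion of Table 1, and it assumes that the final model enables all three modifications.

```python
from itertools import product

# Hypothetical flag names for the three modifications named in the text.
MODULES = ("dcn", "gam", "wiou")

# Labels as used in the discussion of Table 1. Mapping "Ours" to all three
# modifications is an assumption; the GAM + WIOU pairing is not reported.
LABELS = {
    frozenset(): "YOLOv8n",
    frozenset({"wiou"}): "A",
    frozenset({"gam"}): "B",
    frozenset({"dcn"}): "C",
    frozenset({"dcn", "wiou"}): "D",
    frozenset({"dcn", "gam"}): "E",
    frozenset({"dcn", "gam", "wiou"}): "Ours",
}


def ablation_grid():
    """Return (label, enabled_modules) for every on/off combination."""
    grid = []
    for flags in product((False, True), repeat=len(MODULES)):
        enabled = frozenset(m for m, on in zip(MODULES, flags) if on)
        grid.append((LABELS.get(enabled, "not reported"), enabled))
    return grid


if __name__ == "__main__":
    for label, enabled in ablation_grid():
        print(f"{label:>12}: {', '.join(sorted(enabled)) or 'baseline'}")
```

Enumerating all eight combinations makes it easy to see which ones an ablation table covers and which it leaves out.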
In Table 1, YOLOv8n denotes the original model. Configuration A replaces the CIOU loss function with the WIOU loss function, which improves P by 0.9% and mAP@0.5 by 2.2% compared to the original model. However, R and FPS decrease by 2% and 35.3%, respectively, while the number of parameters, the FLOPs, and the model size remain unchanged. Configuration B adds the GAM attention mechanism to YOLOv8n, improving P by 1.5%, R by 0.5%, and mAP@0.5 by 0.4% without affecting the model size; the number of parameters and the FLOPs increase by 0.01 M and 0.2 G, respectively, and FPS decreases by 18.4%. Configuration C introduces deformable convolution, which raises P, R, mAP@0.5, and FPS by 3.4%, 4%, 3.8%, and 1.6%, respectively, while keeping the model size the same as the original; the FLOPs decrease by 0.1 G, and the number of parameters increases by 0.03 M. Configuration D combines deformable convolution with the replacement of the CIOU loss function, leaving P unchanged but improving R by 0.7% and mAP@0.5 by 0.3%; the number of parameters, the FLOPs, and the model size are the same as those of Configuration C, while FPS decreases by 0.5%. Configuration E combines deformable convolution with the GAM attention mechanism, improving P, R, and mAP@0.5 by 1.4%, 2%, and 1.6%, respectively; the number of parameters, the FLOPs, and the model size increase by 0.04 M, 0.1 G, and 0.1 MB, respectively, and FPS decreases by 7.3%. The model presented in this paper improves P, R, mAP@0.5, and FPS by 3.5%, 4.8%, 5.1%, and 12.1%, respectively, compared to YOLOv8n. The number of parameters and the model size increase by only 0.04 M and 0.1 MB, respectively, while the FLOPs are reduced by 0.1 G.
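The comparison above quotes changes as percentages without stating whether each is an absolute difference (percentage points, as is common for P, R, and mAP) or a change relative to the baseline (as is common for FPS). The following minimal sketch shows both calculations. The function names are hypothetical, and the numbers are placeholders chosen for illustration, not values from Table 1.

```python
def point_change(baseline, value):
    """Absolute difference, e.g. percentage points for P, R, or mAP."""
    return value - baseline


def relative_change(baseline, value):
    """Change relative to the baseline, expressed in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (value - baseline) / baseline


if __name__ == "__main__":
    # Placeholder values for illustration only (not taken from Table 1).
    baseline_p, variant_p = 40.0, 41.0        # precision in percent
    baseline_fps, variant_fps = 200.0, 150.0  # frames per second

    print(f"P:   {point_change(baseline_p, variant_p):+.1f} points, "
          f"{relative_change(baseline_p, variant_p):+.1f}% relative")
    print(f"FPS: {relative_change(baseline_fps, variant_fps):+.1f}% relative")
```

With these placeholder values, a one-point gain in precision corresponds to a 2.5% relative gain, so the two conventions can differ noticeably and are worth stating explicitly alongside a results table.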
Our model combines the advantages of these improvements: it improves detection accuracy and real-time performance without significant changes in parameter count, FLOPs, or model size, which demonstrates its effectiveness for detection.
Lei Zhang www.mdpi.com