YOLOv8
YOLOv8 is built upon the success of previous YOLO versions and has a wide user base and strong community support [22]. This provides us with abundant resources and tools for model training, testing, and deployment. It also facilitates the implementation of our improved algorithms and the comparison of results with other state-of-the-art methods. YOLOv8 has demonstrated superior performance in terms of both speed and accuracy, which is crucial for detection tasks. It has achieved a better balance between these two aspects compared to many other architectures, making it the ideal choice for our study. Consequently, considering the requirements, complexity of detection tasks, and availability of resources, this paper chooses the YOLOv8 network model.
YOLOv8 adopts a high-efficiency feature extraction network, introducing a lightweight design and depth-wise separable convolution, which reduce model parameters and computational complexity [23]. At the same time, it retains strong representational capability and fast inference speed under high-accuracy detection, meeting the requirements of the application scenario. YOLOv8 uses multi-scale prediction and Feature Pyramid Networks (FPNs) to accurately capture and recognize objects regardless of their size [24]. Additionally, it implements an advanced Non-Maximum Suppression (NMS) algorithm to optimize detection results, further reducing the false and missed detection rates [25]. During training, YOLOv8 also incorporates several techniques that improve stability and convergence speed, such as automatic learning rate adjustment [26] and weight decay optimization [27]. The combination of these strategies allows YOLOv8 to rapidly obtain high-quality model parameters from limited datasets, which significantly benefits subsequent practical applications. The basic principle of YOLOv8 is illustrated in Figure 4.
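To make the NMS post-processing step concrete, the following is a minimal pure-Python sketch of classic greedy NMS over axis-aligned boxes. This is an illustration only, not YOLOv8's actual implementation, which is vectorized and more elaborate; the box format (x1, y1, x2, y2) and the threshold value are assumptions for the example.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: repeatedly keep the highest-scoring box and
    discard remaining boxes that overlap it above the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep
```

For example, two heavily overlapping detections of the same lesion are collapsed to the higher-scoring one, while a distant detection survives.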
In this paper, we employed Precision, Recall, F1 score, and Average Precision (AP) [28] as the evaluation metrics to examine the proposed network.
Precision indicates the fraction of true positive samples among all samples predicted as positive by the model. It reflects the accuracy of the model’s predictions, namely, the reliability of its positive predictions. Higher Precision corresponds to more correct identifications and a lower false detection rate. The calculation formula is shown in (1):

Precision = TP / (TP + FP) (1)
where TP is true positive, referring to samples correctly identified as Gray Leaf Spot on apple; FP is false positive, denoting samples falsely identified as Gray Leaf Spot; TN is true negative, referring to samples correctly identified as background; and FN is false negative, representing Gray Leaf Spot samples incorrectly identified as background. In this study, Precision represents the proportion of Gray Leaf Spot cases detected by the model that are actually Gray Leaf Spot. A high Precision value indicates that the model’s detection results are highly reliable.
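As an illustration of these definitions (not the authors' code), the following sketch counts TP/FP/FN/TN for a binary labeling (1 = Gray Leaf Spot, 0 = background) and computes Precision from the counts:

```python
def confusion_counts(predictions, labels):
    """Count TP, FP, FN, TN for binary predictions (1 = positive, 0 = negative)."""
    tp = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(predictions, labels) if p == 0 and y == 1)
    tn = sum(1 for p, y in zip(predictions, labels) if p == 0 and y == 0)
    return tp, fp, fn, tn

def precision(tp, fp):
    """Precision = TP / (TP + FP); fraction of predicted positives that are correct."""
    return tp / (tp + fp) if tp + fp else 0.0
```

For instance, with predictions [1, 1, 0, 1, 0] against labels [1, 0, 0, 1, 1], two of the three positive predictions are correct, giving a Precision of 2/3.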
Recall represents the proportion of actual positive samples that are correctly predicted as positive. It reflects the model’s ability to identify positive instances [29]. A higher Recall value indicates a lower probability of missing actual positive samples, meaning the model can detect the majority of Gray Leaf Spot cases. The calculation formula is shown in (2):

Recall = TP / (TP + FN) (2)
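In the same illustrative style, Recall follows directly from the TP and FN counts:

```python
def recall(tp, fn):
    """Recall = TP / (TP + FN); fraction of actual positives that are found."""
    return tp / (tp + fn) if tp + fn else 0.0
```

Continuing the earlier example (2 true positives, 1 missed lesion), the Recall is 2/3: one actual Gray Leaf Spot case was misclassified as background.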
The F1 score is the harmonic mean of Precision and Recall, providing a comprehensive measure of the model’s performance. It balances the importance of Precision and Recall, offering a single metric to evaluate the model. A higher F1 score indicates better overall performance and robustness. The F1 score helps us find an optimal balance between Precision and Recall, ensuring the model’s effectiveness in practical applications. The calculation formula is shown in (3):

F1 = (2 × Precision × Recall) / (Precision + Recall) (3)
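The harmonic-mean formula can be sketched as follows (again an illustration, not the authors' implementation):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because it is a harmonic mean, F1 is pulled toward the smaller of the two values: a model with perfect Precision but Recall 0.5 scores only 2/3, so neither metric can be neglected.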
AP represents the average Precision across different Recall levels, which equals the area under the Precision–Recall curve (PR curve). It is a comprehensive metric for measuring model performance across different categories. It takes into account the changes in Precision at different Recall rates, thus providing a more holistic reflection of the model’s detection capabilities and ensuring its robustness across different confidence levels. Here, r denotes the recall, which serves as the integration variable: AP is the integral of the Precision P(r) over r from 0 to 1. The calculation formula is shown in (4):

AP = ∫₀¹ P(r) dr (4)
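A minimal sketch of this integral, assuming detections are already sorted by confidence and matched to ground truth, accumulates the area under the raw PR step curve (note that benchmarks such as COCO instead use an interpolated, sampled version of this integral):

```python
def average_precision(correct_flags, num_gt):
    """AP as the area under the precision-recall step curve.

    correct_flags: one bool per detection, sorted by confidence (high to low),
                   True if the detection matches a ground-truth object.
    num_gt: total number of ground-truth objects.
    """
    tp = fp = 0
    ap = 0.0
    prev_recall = 0.0
    for is_correct in correct_flags:
        if is_correct:
            tp += 1
        else:
            fp += 1
        p = tp / (tp + fp)          # precision after this detection
        r = tp / num_gt             # recall after this detection
        ap += p * (r - prev_recall) # rectangle under the step curve
        prev_recall = r
    return ap
```

For example, three detections of which the first and third are correct, against two ground-truth lesions, yield AP = 1.0 × 0.5 + (2/3) × 0.5 = 5/6.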
mAP@0.5:0.95 is the average value of AP calculated at multiple Intersection over Union (IoU) thresholds (ranging from 0.5 to 0.95, with a step size of 0.05). It thoroughly evaluates the model’s performance under varying detection challenges and provides a more comprehensive reflection of the model’s actual behavior compared to AP at a single threshold. Above all, it requires the model to maintain high accuracy under different IoU thresholds, which is particularly important for detection tasks in practical applications. The calculation formula is shown in (5):

mAP = (1/10) × Σ AP(t), t ∈ {0.50, 0.55, …, 0.95} (5)
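Assuming the per-threshold AP values have already been computed (for instance with a routine like the one above, rerun with each matching threshold), the averaging step itself is straightforward; this sketch uses a hypothetical dict from IoU threshold to AP:

```python
def mean_ap_over_ious(ap_at_iou):
    """mAP@0.5:0.95: mean of AP evaluated at IoU thresholds 0.50, 0.55, ..., 0.95.

    ap_at_iou: dict mapping a rounded IoU threshold to a precomputed AP value.
    """
    thresholds = [round(0.5 + 0.05 * k, 2) for k in range(10)]  # 0.50 .. 0.95
    return sum(ap_at_iou[t] for t in thresholds) / len(thresholds)
```

Because the stricter thresholds (e.g., 0.95) demand near-perfect box localization, mAP@0.5:0.95 is typically much lower than AP at IoU 0.5 alone.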
Siyi Zhou www.mdpi.com