4.2. Experimental Results
To comprehensively assess model performance, several aggregation strategies were compared systematically: the classic FedAvg algorithm (simple averaging to construct the global model), the FedAvg-Data algorithm (which weights each client's update by the size of its dataset during aggregation), and the FedAvg-ContribData algorithm proposed in this paper.
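The three aggregation rules can be sketched as a single weighted average with different weight choices. How FedAvg-ContribData derives its contribution scores is not reproduced here, so the size-times-score product below is an illustrative assumption, not the paper's exact rule.

```python
def aggregate(client_params, data_sizes=None, contribs=None):
    """Weighted average of client parameter vectors.

    - FedAvg:             equal weights (no sizes, no contributions)
    - FedAvg-Data:        weights proportional to dataset size
    - FedAvg-ContribData: weights blending size with a contribution
      score (the size * score product is an illustrative assumption;
      the paper's exact combination rule is not reproduced here).
    """
    n = len(client_params)
    if data_sizes is None:
        weights = [1.0 / n] * n
    elif contribs is None:
        total = float(sum(data_sizes))
        weights = [s / total for s in data_sizes]
    else:
        raw = [s * c for s, c in zip(data_sizes, contribs)]
        total = float(sum(raw))
        weights = [r / total for r in raw]
    dim = len(client_params[0])
    # Element-wise weighted sum over clients.
    return [sum(w * p[i] for w, p in zip(weights, client_params))
            for i in range(dim)]
```

With two clients whose parameters are `[1.0, 1.0]` and `[3.0, 3.0]`, plain averaging yields `[2.0, 2.0]`, while size or contribution weights shift the global model toward the more heavily weighted client.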
The evaluation focused on four core performance indicators: accuracy, precision, recall, and F1 score. Accuracy measures the model's overall predictive capability; precision, its correctness when predicting the positive class; recall, its ability to identify positive-class samples. The F1 score, the harmonic mean of precision and recall, is particularly informative under class imbalance and provides a balanced overall assessment of model performance.
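For reference, the four indicators can be computed for a binary task as follows (a standard formulation, not specific to this paper):

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 (harmonic mean) for
    binary labels, with 1 denoting the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, precision, recall, f1
```

Because F1 is a harmonic mean, it is dragged down sharply when either precision or recall is low, which is why it is the most sensitive of the four indicators to imbalanced performance.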
- (1) Accuracy
From the table, the accuracies of FedAvg, FedAvg-Data, and FedAvg-ContribData are 49.7%, 81.5%, and 89.4%, respectively; FedAvg-ContribData is markedly more accurate than the other two algorithms.
When data distributions are imbalanced, FedAvg's simple averaging struggles to maintain accuracy; FedAvg-Data's size-based weighting overlooks each client's data quality and diversity, limiting its results; FedAvg-ContribData, which combines dataset-size weights with contribution analysis, improves accuracy by 39.7 and 7.9 percentage points over FedAvg and FedAvg-Data, respectively.
- (2) Precision
The precisions of FedAvg, FedAvg-Data, and FedAvg-ContribData are 50.7%, 82.0%, and 91.1%, respectively. The proposed algorithm predicts positive-class samples significantly more precisely than the other two.
FedAvg fails to account for the characteristics of each client's data, confusing positive and negative samples and yielding low precision. FedAvg-Data's size-based weighting partially compensates for data imbalance and improves positive-class precision, but it ignores the quality and importance of the data. FedAvg-ContribData combines multiple weighting mechanisms, capturing positive-class features more precisely and separating positive from negative samples more accurately, with improvements of 40.4 and 9.1 percentage points over FedAvg and FedAvg-Data, respectively.
- (3) Recall
The recall rates of FedAvg, FedAvg-Data, and FedAvg-ContribData are 42.5%, 75.7%, and 83.5%, respectively. The proposed algorithm clearly identifies positive-class samples far better than the other two.
FedAvg's plain averaging discards much positive-class information, yielding poor recall. FedAvg-Data weights only by dataset size, without fully considering data importance, so some crucial data are neglected and overall performance suffers. FedAvg-ContribData dynamically adjusts weights by combining dataset size with the value of each client's data, mining positive-class features more thoroughly; it improves recall by 41.0 and 7.8 percentage points over FedAvg and FedAvg-Data, respectively.
- (4) F1 Score
The F1 scores of FedAvg, FedAvg-Data, and FedAvg-ContribData are 37.4%, 74.7%, and 84.0%, respectively. The proposed algorithm clearly achieves a better balance between precision and recall than the other two.
FedAvg performs poorly on both precision and recall, so its F1 score and overall performance are low. FedAvg-Data's size-based weighting accounts for data imbalance to some extent but still fails to capture the complex characteristics of client data, so its gains in precision and recall are limited. FedAvg-ContribData combines contribution degree with dataset weight, achieving a better precision–recall balance, with improvements of 46.6 and 9.3 percentage points over FedAvg and FedAvg-Data, respectively.
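The percentage-point gaps quoted in items (1) through (4) can be recomputed directly from the reported values:

```python
# Metric values (%) as reported in the text.
reported = {
    "accuracy":  {"FedAvg": 49.7, "FedAvg-Data": 81.5, "FedAvg-ContribData": 89.4},
    "precision": {"FedAvg": 50.7, "FedAvg-Data": 82.0, "FedAvg-ContribData": 91.1},
    "recall":    {"FedAvg": 42.5, "FedAvg-Data": 75.7, "FedAvg-ContribData": 83.5},
    "f1":        {"FedAvg": 37.4, "FedAvg-Data": 74.7, "FedAvg-ContribData": 84.0},
}

# Gap of FedAvg-ContribData over each baseline, in percentage points.
gaps = {
    metric: (round(v["FedAvg-ContribData"] - v["FedAvg"], 1),
             round(v["FedAvg-ContribData"] - v["FedAvg-Data"], 1))
    for metric, v in reported.items()
}
# gaps["accuracy"] == (39.7, 7.9), gaps["f1"] == (46.6, 9.3), etc.
```

Each quoted improvement is therefore an absolute difference between the reported percentages, not a relative ratio.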
In summary, FedAvg-ContribData demonstrates superior performance in handling complex data-distribution scenarios, making it more suitable for application scenarios with high requirements for comprehensive model performance.
Analysis of the four image types (the original image and the feature maps generated by the three algorithms) shows clear differences in feature extraction among the algorithms, with the one proposed in this paper having significant advantages in the richness and effectiveness of its feature representations.
FedAvg-ContribData performs best across all datasets, confirming its effectiveness on diverse data.
The algorithm in this paper outperforms FedAvg-Data in precision and the other metrics, showing a clear edge on heterogeneous data. Because its design compensates for data imbalance, it maintains strong classification performance on rarely occurring classes.
The FedAvg algorithm does not fully account for differences among clients' data and performs poorly when the data are highly dispersed. When data distributions differ significantly across clients, FedAvg cannot aggregate updates effectively, and model performance degrades.
The FedAvg-Data algorithm weights aggregation by dataset size and achieves better results on heterogeneous data: by assigning different weights to different clients, it better balances each client's contribution to the global model and thus improves performance. However, determining weights solely from the size or nominal importance of the data may ignore other important characteristics, so the performance gains are limited, and indicators such as accuracy and recall may fall short of the ideal.
The algorithm in this paper combines dataset weight with each client's contribution to the overall model. It achieves good results on the training and test sets and shows significant advantages on the validation set. Because it carefully weighs both dataset size and client contribution, the model is more robust to unknown data distributions. It not only fits the data well but also exhibits stronger versatility and adaptability: it transfers knowledge learned from the training data to new, independent datasets and maintains high predictive performance on previously unseen data.
The FedAvg algorithm's aggregation is relatively simple: it uses a plain average and ignores differences in data size. The FedAvg-ContribData algorithm analyzes the data statistically and adjusts each client's value weight accordingly. Although FedAvg-Data pays some attention to data size, its measure of data value is neither precise nor fine-grained, and it cannot mine the value of the data as deeply as FedAvg-ContribData; as a result, the model cannot fully exploit high-contribution data during learning.
The FedAvg-Data algorithm aggregates mainly by data size, without mining the internal structure of the data or the differing contributions of its parts to model learning. In an image classification task, for example, when the class counts in a dataset are severely unbalanced (images of some rare categories are extremely scarce yet play a key role in the completeness of the overall classification system), FedAvg-Data may fail to evaluate those categories' contributions reasonably. Consequently, it assigns inappropriate weights, which ultimately leads to low recognition accuracy for rare-category images. The FedAvg-ContribData algorithm computes contribution degrees and allocates weights accordingly; under very high data heterogeneity, it ensures that the model fully learns the characteristics of every category, especially key minority classes. This effectively reduces the classification bias caused by data imbalance, so it outperforms FedAvg-Data in overall classification accuracy.
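As a concrete illustration of contribution-aware weighting under class imbalance, a contribution score could upweight clients that hold globally rare classes. The inverse-frequency score below is a hypothetical stand-in; the paper's actual contribution measure is not reproduced here.

```python
from collections import Counter

def rare_class_contribution(client_labels, all_labels):
    """Hypothetical contribution score: clients holding globally rare
    classes score higher (average inverse frequency of their labels).
    This is an illustrative stand-in, not the paper's actual measure."""
    counts = Counter(all_labels)
    total = len(all_labels)
    # Average inverse frequency of the labels this client holds.
    return sum(total / counts[y] for y in client_labels) / len(client_labels)
```

A client holding only a rare class then receives a much larger score than one holding only the common class, so its update is not drowned out during size-dominated aggregation.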
During training, the FedAvg algorithm cannot distinguish data by size or contribution, so it wastes computing resources on redundant data that are low quality, noisy, or weakly relevant to the model's learning objective. With a reasonable contribution-evaluation and weight-allocation mechanism, FedAvg-ContribData can quickly identify, in each iteration, the data that most improve the model, reducing interference from ineffective data and helping the model converge to the global optimum faster. Although FedAvg-Data also filters data, its weight allocation is not precise enough and may still be adversely affected by low-contribution data during optimization, slowing convergence. FedAvg-ContribData can rapidly focus learning on high-value data in large-scale distributed data-processing scenarios, greatly reducing training time and significantly improving the model's final performance, so it better adapts to rapidly changing application needs and complex data environments.
To sum up, the algorithm in this paper has clear advantages in handling data heterogeneity: it makes effective use of each client's data during training, and it also predicts well on new data.
Source: Jiale Han, www.mdpi.com