In the experiments for the previous research questions, the performance of the ML algorithms was evaluated with 10-fold cross-validation, one of the fairest approaches for comparing algorithms. However, because training and testing are repeated for each fold, the evaluation is time-consuming, and the training of deep learning structures can itself be quite long. For this reason, a train–test split was used instead of 10-fold cross-validation when modeling the DNN, and the same split was applied to the ML algorithms so that the performance of ML and DNN could be compared fairly. Two splits were considered: first, 70% of the dataset was allocated for training and the remaining 30% for testing; second, 80% was allocated for training and the remaining 20% for testing. In both the 70%-30% and 80%-20% cases, the random state parameter was set to 42 so that the same training and test samples were generated each time, ensuring that all algorithms produced results on identical training and test sets. Since XGBoost and RF were found to be the most successful algorithms in the ML performance experiments, the performances of the DNN, XGBoost, and RF algorithms are evaluated in this section, as seen in Figure 10; graphs labeled (a) correspond to a training ratio of 70%, while graphs labeled (b) correspond to a training ratio of 80%. With 70% of the data used for training and 30% for testing, the test performance of the RF algorithm was 0.8272 according to the R2 metric and 34.1936 according to the RMSE metric (Figure 10a). RF made good predictions when the real boride thickness values were below 100 μm.
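The reproducible hold-out splits described above can be sketched as follows. This is a minimal illustration using NumPy only; the study presumably used scikit-learn's `train_test_split`, and the feature matrix `X` and thickness vector `y` here are synthetic placeholders, not the paper's data.

```python
# Sketch: reproducible 70/30 and 80/20 hold-out splits with a fixed seed,
# mirroring the random_state=42 setting mentioned in the text.
import numpy as np

def split(X, y, test_frac, seed=42):
    """Shuffle indices with a fixed seed, then carve off a test fraction."""
    idx = np.random.default_rng(seed).permutation(len(X))
    n_test = int(round(len(X) * test_frac))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Placeholder dataset of 375 samples (the size reported in the text).
X = np.arange(375 * 4, dtype=float).reshape(375, 4)   # dummy process parameters
y = np.linspace(10.0, 400.0, 375)                     # dummy thickness values (μm)

for frac in (0.30, 0.20):
    X_tr, X_te, y_tr, y_te = split(X, y, frac)
    print(f"test_frac={frac}: {len(X_tr)} train / {len(X_te)} test")
```

Because the seed is fixed, every algorithm evaluated on these splits sees exactly the same training and test samples, which is the point of setting `random_state=42`.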
However, the model could not make good predictions for boride thickness values greater than 100 μm and deviated greatly from the actual values, especially above 300 μm (Figure 10a). When 80% of the data were used for training and 20% for testing, the test performance of the RF algorithm was 0.9181 according to the R2 metric and 17.1515 according to the RMSE metric (Figure 10b), so the actual boride thickness values were predicted more accurately than under the 70%-30% split. The main reason for the drop in prediction performance at high thicknesses is the lack of data in this range: only 5 of the 375 samples in this study had a thickness of 300 μm or more, which severely limits the model's ability to generalize there. ML models in particular require large and balanced datasets, and a small number of high-thickness samples makes the model prone to random deviations and bias at these values. As Figure 10 shows for both split ratios (70% training/30% testing and 80% training/20% testing), prediction accuracy increases as the training ratio increases, but this accuracy cannot be sustained at high thicknesses because such values are scarce in the current dataset. The higher thickness ranges are therefore not adequately represented, and the model needs more data to improve its accuracy in this range. This deficiency of the dataset could be overcome with extended experimental data or data augmentation techniques, which would increase the generalization capacity of the model and improve its prediction performance.
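The imbalance argument above amounts to a simple coverage check on the target variable. A minimal sketch, using a synthetic thickness list as a stand-in for the real 375-sample dataset:

```python
# Sketch: quantifying how sparsely the high-thickness range is covered.
# The text reports that only 5 of 375 samples reach 300 μm or more;
# the list below is a synthetic placeholder with that same imbalance.
thickness = [50.0] * 370 + [310.0, 320.0, 330.0, 350.0, 380.0]

n_high = sum(t >= 300.0 for t in thickness)
share = 100.0 * n_high / len(thickness)
print(f"{n_high} of {len(thickness)} samples are >= 300 μm ({share:.1f}%)")
```

With only about 1% of samples in the high range, a regression model has little signal to fit there, which is consistent with the large deviations seen above 300 μm in Figure 10.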
When the data were split into 70% training and 30% testing, the test performance of the XGBoost algorithm was 0.8810 according to the R2 metric and 27.3317 according to the RMSE metric. For the 80% training/20% testing split, the test performance of the XGBoost algorithm was 0.9299 for the R2 metric and 15.8628 for the RMSE metric; thus, the XGBoost algorithm made better predictions than the RF algorithm. Furthermore, the test performance of the DNN was 0.8846 according to the R2 metric and 26.9236 according to the RMSE metric for the 70% training/30% testing split, and 0.9377 according to the R2 metric and 14.3248 according to the RMSE metric for the 80% training/20% testing split (Figure 10). It can therefore be said that the DNN made better predictions than both the RF and XGBoost algorithms.
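The two test metrics reported throughout this section can be written out explicitly. A minimal from-scratch sketch (the study itself presumably used library implementations such as those in scikit-learn); `y_true` and `y_pred` are illustrative placeholders:

```python
# Sketch: the R2 (coefficient of determination) and RMSE metrics used above.
import math

def rmse(y_true, y_pred):
    """Root mean squared error: sqrt of the mean squared residual."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r2(y_true, y_pred):
    """R2 = 1 - SS_res / SS_tot; 1.0 means a perfect fit."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Illustrative call with placeholder values, not the paper's data:
print(r2([100.0, 200.0, 300.0], [110.0, 190.0, 310.0]))
```

Note the two metrics move in opposite directions: a higher R2 and a lower RMSE both indicate better predictions, which is why the DNN's 0.9377/14.3248 pair dominates the other models' scores at the 80%/20% split.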
Selim Demirci www.mdpi.com