2.3. Statistics and Machine Learning Toolbox Application
At the same time, MATLAB is recognized as a high-performance programming language and interactive environment widely employed for numerical computation, data analysis, algorithm development, and data visualization. The MATLAB R2024a release offers users an extensive range of machine learning models applicable to data analysis and predictive modeling. Simulating pan evaporation patterns in MATLAB with machine learning (ML) and deep learning (DL) approaches has several benefits, especially in environmental research. With a wealth of built-in functions and visualization capabilities, MATLAB provides an interactive, high-level environment that eases the development of intricate algorithms and the analysis of large datasets. Combining ML and DL makes it possible to model non-linear relationships between evaporation rates and meteorological variables, improving forecast accuracy over conventional techniques. These methods are flexible, scale to growing data volumes over time, and can identify the most influential variables. Together, MATLAB and these AI techniques strengthen real-time monitoring and decision making in water resource management, enabling more sustainable responses to climatic variability.
The models used in the study are as follows:
1. Linear Regression (LR): LR is a fundamental statistical technique employed to model the relationship between a dependent variable and one or more independent variables using a linear equation. This model forecasts outcomes based on a linear fit (all four linear models are sketched in code after this list).
2. Interaction LR: This variant of linear regression incorporates interaction terms among variables, capturing not only the direct effects of the predictors but also how the influence of one predictor varies with the level of another.
3. Robust LR: This approach modifies traditional linear regression to minimize the influence of outliers within the dataset. Techniques such as Huber loss can be applied to enhance the model’s resistance to extreme values.
4. Stepwise LR: Stepwise linear regression represents an automated technique for selecting a subset of predictors by methodically adding or removing variables based on established criteria, such as the Akaike information criterion (AIC) or the Bayesian information criterion (BIC). The term “stepwise” denotes the iterative process of refining predictors to determine the most suitable model complexity.
5. Fine Tree: A fine tree is a type of decision tree characterized by its deep branching structure, which may lead to overfitting. While it excels at capturing intricate details within the dataset, its ability to generalize to new data is often limited (the three tree presets are sketched after this list).
6. Medium Tree: The medium tree represents a decision tree that effectively balances depth and breadth. It demonstrates improved resistance to overfitting in comparison to a fine tree while still being capable of identifying significant patterns within the data.
7. Coarse Tree: A coarse tree is a simplified decision tree model featuring fewer splits and branches. This design enhances its generalizability and reduces the likelihood of overfitting when compared to both fine and medium trees.
8. Linear SVM: The linear support vector machine identifies a linear hyperplane that best separates the classes in the feature space (in regression, its analogue fits a linear function within an ε-insensitive margin). This approach is particularly suitable for data with approximately linear structure (the six SVM presets are sketched, in their regression form, after this list).
9. Quadratic SVM: This variant of support vector machine employs a quadratic kernel function to establish a decision boundary capable of addressing situations where the classes are not linearly separable in the original feature space.
10. Cubic SVM: The cubic support vector machine, akin to the quadratic variant, utilizes a cubic kernel function, thereby facilitating the formation of more intricate decision boundaries.
11. Fine Gaussian SVM: This model leverages a Gaussian radial basis function (RBF) kernel to handle non-linear structure. The term “fine” denotes a small kernel scale, which lets the decision surface follow complex, localized relationships within the data.
12. Medium Gaussian SVM: The medium Gaussian support vector machine achieves a balance between detail and generalization, effectively mitigating the risks of overfitting and underfitting.
13. Coarse Gaussian SVM: This model features a Gaussian kernel with a large kernel scale, producing a smoother decision surface that reduces the likelihood of overfitting; however, it may also fail to detect subtle patterns present within the data.
14. EL Least Squares: Here EL stands for efficient linear. This model is an optimized implementation of linear least squares regression, designed to handle larger datasets effectively through fast numerical solvers (both efficient linear models are sketched after this list).
15. EL SVM: This denotes an efficiently trained linear implementation of the support vector machine that emphasizes rapid convergence and computational efficiency, making it well suited to large datasets.
16. Ensemble: Boosted Trees: An ensemble method that combines weak learners (typically shallow trees) sequentially, with each tree correcting the errors of its predecessors. This often yields a powerful predictive model (both ensembles are sketched after this list).
17. Ensemble: Bagged Trees: Utilizes bootstrap aggregating (bagging) to build multiple decision trees from random samples of the data. The final prediction is made by averaging the trees’ outputs (or, in classification, by majority voting), enhancing robustness and reducing overfitting.
18. Squared Exponential Gaussian Process Regression: This approach to Gaussian process regression (GPR) uses a squared exponential kernel to model smooth functions and is particularly adept at generating continuous, smooth predictions (the four GPR kernels are sketched after this list).
19. Matern 5/2 Gaussian Process Regression: This variant of Gaussian process regression employs the Matern kernel with a smoothness parameter of 5/2, providing enhanced flexibility in modeling functions that display varying levels of smoothness.
20. Exponential Gaussian Process Regression: This methodology incorporates an exponential kernel within Gaussian process regression, under which correlations decay exponentially with distance, making it suitable for modeling rougher functions with limited smoothness.
21. Rational Quadratic Gaussian Process Regression: This Gaussian process utilizes the rational quadratic kernel, which acts as a scale mixture of squared exponential kernels with varying length scales. This allows a high degree of flexibility in capturing patterns with diverse smoothness characteristics.
22. Narrow NN: This neural network (NN) configuration comprises fewer neurons within each layer. Although this limits its ability to capture intricate relationships, it trains faster and is less prone to overfitting (the five NN presets are sketched after this list).
23. Medium NN: This architecture incorporates a moderate number of neurons, aiming to establish a balance between model complexity and the dynamics of training.
24. Wide NN: A wide neural network features a larger number of neurons in one or more layers, enabling it to identify complex patterns. However, this complexity may lead to challenges, particularly concerning overfitting.
25. Bilayered NN: This architecture stacks two fully connected hidden layers before the output layer, adding capacity over a single-layer network while remaining relatively easy to interpret.
26. Trilayered NN: This deeper structure stacks three fully connected hidden layers. The additional depth enhances the network’s capacity to learn richer representations of the input data.
27. SVM Kernel: The SVM kernel model refers to support vector machines whose kernel function transforms the input data into a higher-dimensional space where patterns become easier to separate, improving predictive performance. Common kernels include linear, polynomial, and radial basis function (RBF) kernels (both kernel models are sketched after this list).
28. Least Squares Regression Kernel: The least squares regression kernel is used for least squares fitting within kernelized feature spaces, allowing regression tasks in high-dimensional settings to be handled with high computational efficiency and accuracy.
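To make the presets above concrete, the following minimal MATLAB sketches show how each family of models can be fitted with the Statistics and Machine Learning Toolbox. All sketches assume a table T whose columns hold illustrative meteorological predictors (Tmean, RH, Wind, Rad) and the pan evaporation response Epan; these variable names are placeholders, not the study’s actual dataset. The first sketch covers the four linear models:

    % Plain linear regression on all four predictors
    mdlLinear = fitlm(T, 'Epan ~ Tmean + RH + Wind + Rad');
    % Interaction LR: ^2 in Wilkinson notation adds all pairwise interaction terms
    mdlInteract = fitlm(T, 'Epan ~ (Tmean + RH + Wind + Rad)^2');
    % Robust LR: Huber weighting down-weights outliers during fitting
    mdlRobust = fitlm(T, 'Epan ~ Tmean + RH + Wind + Rad', 'RobustOpts', 'huber');
    % Stepwise LR: starts from a constant model, adds/removes terms scored by AIC
    mdlStepwise = stepwiselm(T, 'constant', 'ResponseVar', 'Epan', ...
        'Upper', 'interactions', 'Criterion', 'aic');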
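The three tree presets differ mainly in how deeply the tree may split. In the sketch below, the minimum leaf sizes (4, 12, 36) follow MATLAB’s Regression Learner defaults and are assumptions rather than settings reported by the study:

    % Fine tree: small leaves allow deep, detailed splits (overfitting risk)
    treeFine = fitrtree(T, 'Epan', 'MinLeafSize', 4);
    % Medium tree: moderate leaf size balances detail and generalization
    treeMedium = fitrtree(T, 'Epan', 'MinLeafSize', 12);
    % Coarse tree: large leaves keep the tree shallow and more general
    treeCoarse = fitrtree(T, 'Epan', 'MinLeafSize', 36);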
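For a regression target such as pan evaporation, the six SVM presets correspond to fitrsvm with different kernels. The Gaussian kernel scales below follow the Regression Learner heuristic of sqrt(P)/4, sqrt(P), and 4*sqrt(P) for P predictors; this heuristic is an assumption, not a tuned choice:

    P = width(T) - 1;  % number of predictor columns in the illustrative table T
    svmLinear = fitrsvm(T, 'Epan', 'KernelFunction', 'linear', 'Standardize', true);
    svmQuad = fitrsvm(T, 'Epan', 'KernelFunction', 'polynomial', ...
        'PolynomialOrder', 2, 'Standardize', true);
    svmCubic = fitrsvm(T, 'Epan', 'KernelFunction', 'polynomial', ...
        'PolynomialOrder', 3, 'Standardize', true);
    svmFineG = fitrsvm(T, 'Epan', 'KernelFunction', 'gaussian', ...
        'KernelScale', sqrt(P)/4, 'Standardize', true);
    svmMedG = fitrsvm(T, 'Epan', 'KernelFunction', 'gaussian', ...
        'KernelScale', sqrt(P), 'Standardize', true);
    svmCoarseG = fitrsvm(T, 'Epan', 'KernelFunction', 'gaussian', ...
        'KernelScale', 4*sqrt(P), 'Standardize', true);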
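The two efficient linear (EL) models map to fitrlinear, which uses fast solvers intended for large datasets. The extraction of a numeric predictor matrix below assumes every column of T other than Epan is a numeric predictor:

    X = T{:, setdiff(T.Properties.VariableNames, {'Epan'})};  % predictor matrix
    y = T.Epan;                                               % response vector
    elLS = fitrlinear(X, y, 'Learner', 'leastsquares');  % EL least squares
    elSVM = fitrlinear(X, y, 'Learner', 'svm');          % EL SVM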
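Both tree ensembles are available through fitrensemble: 'LSBoost' is MATLAB’s least squares boosting for regression, and 'Bag' performs bootstrap aggregation; learner settings are left at their defaults for brevity:

    ensBoost = fitrensemble(T, 'Epan', 'Method', 'LSBoost');  % boosted trees
    ensBag = fitrensemble(T, 'Epan', 'Method', 'Bag');        % bagged trees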
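The four GPR variants differ only in the covariance (kernel) function passed to fitrgp:

    gprSE = fitrgp(T, 'Epan', 'KernelFunction', 'squaredexponential');
    gprM52 = fitrgp(T, 'Epan', 'KernelFunction', 'matern52');
    gprExp = fitrgp(T, 'Epan', 'KernelFunction', 'exponential');
    gprRQ = fitrgp(T, 'Epan', 'KernelFunction', 'rationalquadratic');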
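The five NN presets can be reproduced with fitrnet by varying only the LayerSizes argument; the widths below (10, 25, and 100 neurons, plus stacks of two or three 10-neuron layers) follow the Regression Learner defaults and are assumptions rather than study-specific settings:

    nnNarrow = fitrnet(T, 'Epan', 'LayerSizes', 10);
    nnMedium = fitrnet(T, 'Epan', 'LayerSizes', 25);
    nnWide = fitrnet(T, 'Epan', 'LayerSizes', 100);
    nnBilayer = fitrnet(T, 'Epan', 'LayerSizes', [10 10]);      % two hidden layers
    nnTrilayer = fitrnet(T, 'Epan', 'LayerSizes', [10 10 10]);  % three hidden layers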
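Finally, the two kernel models correspond to fitrkernel, which approximates a Gaussian kernel through random feature expansion so that large problems remain tractable:

    X = T{:, setdiff(T.Properties.VariableNames, {'Epan'})};  % predictor matrix
    y = T.Epan;                                               % response vector
    krnSVM = fitrkernel(X, y, 'Learner', 'svm');          % SVM kernel regression
    krnLS = fitrkernel(X, y, 'Learner', 'leastsquares');  % least squares kernel regression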