A Pixel-Based Machine Learning Atmospheric Correction for PeruSAT-1 Imagery


3.2.2. Feedforward Neural Network Regression Approach

Hyperparameter tuning using Keras Tuner identified optimal ranges for the model architecture, as summarized in Table 1. Both linear (identity) and nonlinear activation functions (Leaky ReLU, Tanh, Swish, and Softmax) were tested, with the nonlinear functions generally showing superior results. The tuning process identified 2–3 hidden layers as optimal; although networks of this depth go beyond simple neural networks, they do not fully meet the criteria for “deep learning” as commonly defined [50,51,52]. To maintain clarity, this study refers to the explored architectures as Feedforward Neural Networks (FFNNs).
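As an illustration, a search of this kind can be expressed with Keras Tuner roughly as follows. This is a minimal sketch under our own assumptions (four input bands, illustrative search ranges, and the batch size tuned jointly with the architecture), not the authors' released code; the string activation "leaky_relu" requires Keras 3, and older versions need a LeakyReLU layer instead.

```python
import keras_tuner as kt
from tensorflow import keras

class FFNNHyperModel(kt.HyperModel):
    def build(self, hp):
        model = keras.Sequential()
        model.add(keras.Input(shape=(4,)))  # four TOA bands as inputs (assumption)
        # "leaky_relu" as a string activation requires Keras 3.
        act = hp.Choice("activation", ["linear", "tanh", "swish", "softmax", "leaky_relu"])
        for _ in range(hp.Int("hidden_layers", 1, 4)):
            model.add(keras.layers.Dense(hp.Int("units", 16, 128, step=16), activation=act))
        model.add(keras.layers.Dense(4, activation="linear"))  # surface reflectance per band
        model.compile(optimizer="adam", loss="mse", metrics=["mae"])
        return model

    def fit(self, hp, model, *args, **kwargs):
        # Batch size is tuned jointly with the architecture.
        return model.fit(*args, batch_size=hp.Choice("batch_size", [64, 128, 256]), **kwargs)

tuner = kt.RandomSearch(FFNNHyperModel(), objective="val_loss", max_trials=30)
# tuner.search(X_train, y_train, validation_data=(X_val, y_val), epochs=50)
```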
To select appropriate nonlinear activation functions, preliminary tests were conducted on 30% of the dataset to compare monotonic functions (Leaky ReLU and Tanh) with non-monotonic functions (Softmax and Swish) under a ceteris paribus condition. These tests showed that Softmax performed worst, with significantly higher loss and MAE values, confirming its unsuitability for regression tasks (Figure 8): because Softmax normalizes each layer's outputs to sum to one, it discards the magnitude information a regression target requires. Consequently, Softmax was excluded from subsequent evaluations. The remaining functions—Swish, Leaky ReLU, and Tanh—were then assessed on 50% of the dataset to identify their optimal configurations.
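The short sketch below (ours, not the paper's) makes this constraint concrete: whatever the inputs, Softmax outputs lie in (0, 1) and always sum to one.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

a = softmax(np.array([2.0, -1.0, 0.5, 1.0]))
print(a, a.sum())  # components in (0, 1); the sum is always 1.0
```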
Figure 9 presents the top 10 configurations obtained for the three selected activation functions. Swish achieved slightly better MAE and loss values, with its top configuration consisting of 64 neurons, 2 hidden layers, and a batch size of 128 or 256. However, its loss varied considerably across configurations, and only its best setup showed stable MAE; this instability limited its overall reliability.

Leaky ReLU ranked second in MAE performance but outperformed Swish in stability. Its top three configurations delivered consistently strong and stable results for both MAE and loss, demonstrating reliability across metrics and surpassing Swish in overall robustness. Tanh ranked third, achieving competitive MAE values but showing higher loss variability compared to Leaky ReLU, which limited its applicability in scenarios requiring consistent performance.

Based on this ranking, the best configurations for each activation function (highlighted in bold and marked with asterisks in Figure 9) were selected to train the FFNN models on the complete dataset. Each configuration label includes the activation function name, along with the numeric values for initial neurons, hidden layers, and batch size (e.g., Swish-64-2-128).
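To make the labeling concrete, the following hypothetical helper (not from the paper) instantiates a Keras model from such a label; we assume all hidden layers keep the initial neuron count, which the label alone does not specify.

```python
from tensorflow import keras

def model_from_label(label, n_inputs=4, n_outputs=4):
    # Label format: activation - initial neurons - hidden layers - batch size.
    act, units, n_hidden, batch_size = label.split("-")
    model = keras.Sequential()
    model.add(keras.Input(shape=(n_inputs,)))
    for _ in range(int(n_hidden)):
        # Constant width is our assumption; the label fixes only the initial count.
        model.add(keras.layers.Dense(int(units), activation=act.lower()))
    model.add(keras.layers.Dense(n_outputs, activation="linear"))
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model, int(batch_size)

model, batch = model_from_label("Swish-64-2-128")
# model.fit(X_train, y_train, batch_size=batch, epochs=..., validation_split=...)
```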
Table A2 presents the validation results for each activation function, highlighting their distinct performance characteristics. Swish achieved $R^2$ values exceeding 0.90 in most samples, with slightly weaker performance in the red and blue bands, particularly in sample 20200718150152_B, suggesting heightened sensitivity to specific bands in complex scenes. Leaky ReLU exhibited balanced and reliable performance across all bands and samples, ensuring stable predictions under diverse conditions. Tanh generally attained the highest $R^2$ values but showed moderate variability, notably in the blue band, which may limit its applicability in contexts demanding consistency across all bands.
Visually, as shown in Figure 10, all models yielded satisfactory predictions, albeit with discernible variations in color representation. As described in Section 3.2.1, the images were normalized to the 99th percentile of their respective TOA reflectance. The observed variations mainly concern color saturation: Tanh imparts a reddish hue, particularly evident in sample 20210617153211_B and in some sandy regions of sample 20230511154917_H. Although less pronounced, this effect also appears with Swish. Leaky ReLU shows reduced green color accuracy in the Amazon region (sample 20200718150152_B), likely attributable to the shadow-induced roughness in this sample, as discussed previously. Notably, it avoids the saturation issues observed with the other activation functions, providing more consistent color representation despite its limitations in specific scenarios.

3.2.3. Global Evaluation of Machine Learning Models Performance

The models’ performance was assessed using four key criteria: precision, accuracy, generalization capacity, and domain-specific knowledge. To visualize these criteria, Figure 11 (scatter plots) and Figure 12 (density plots) illustrate the global performance evaluations, grouped by model type (rows) and spectral band (columns). To reduce bias, the sample 20200718150152_B was excluded, and benchmark pixels classified as water, clouds, and cloud shadows were removed from the test dataset.

The regression lines between the predicted and benchmark values, represented by the equation $Y = b_1 X + b_0$ (where $b_1$ is the slope and $b_0$ is the intercept), are shown as orange lines, with a dotted line indicating perfect predictions. Each subplot includes the Coefficient of Determination ($R^2$) and Root Mean Squared Error (RMSE), which help to assess the model's performance. The scatter distributions, regression fits, and density patterns reveal how each model responds at different reflectance values.

Precision is defined by how tightly the predictions cluster around the trend line. In this study, $R^2$ was used to assess the global dispersion of predictions. Higher $R^2$ values signify greater precision and tighter clustering of predictions around a linear trend, indicating more reliable model behavior.

Accuracy, defined as the numerical proximity between predicted and actual values, is assessed in two ways. First, a $b_1$ value closer to 1 indicates higher accuracy. Second, prediction error is evaluated using both global and local approaches. Globally, RMSE measures overall model error, with lower RMSE values indicating higher global accuracy. Locally, the relative error (RE) in percentage terms (Equation (8)) quantifies the proportional error at the pixel level, offering a more refined measure of model performance:

$$\text{Relative Error} = \frac{\lvert y_i - \hat{y}_i \rvert}{y_i} \times 100\% \qquad (8)$$

where $y_i$ is the actual value and $\hat{y}_i$ is the predicted value for pixel $i$.
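A compact sketch of these statistics follows, assuming y_true and y_pred are 1-D NumPy arrays of benchmark and predicted reflectance for a single band (variable names and implementation are ours, not the paper's).

```python
import numpy as np

def evaluate_band(y_true, y_pred):
    """Global and local metrics for one spectral band (assumes y_true > 0,
    reasonable here since water, cloud, and shadow pixels were removed)."""
    b1, b0 = np.polyfit(y_true, y_pred, deg=1)           # regression line Y = b1*X + b0
    r2 = np.corrcoef(y_true, y_pred)[0, 1] ** 2          # precision: dispersion around the trend
    rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))      # global accuracy
    rel_err = np.abs(y_true - y_pred) / y_true * 100.0   # Equation (8): local, per-pixel accuracy
    return {"b1": b1, "b0": b0, "R2": r2, "RMSE": rmse, "median_RE_%": np.median(rel_err)}
```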

Numerical metrics alone are inadequate for a complete evaluation, as strong metrics do not always guarantee good model behavior across diverse scenarios. Therefore, two additional criteria are employed: generalization capacity, the model's ability to predict across the full range of benchmark values, and domain-specific knowledge, which ensures that predictions are realistic and physically plausible while assessing the model's adaptability to the specific problem at hand.

Density plots provide additional insights into the distribution of predicted versus actual values. These plots are particularly useful for identifying areas of high concentration that might be obscured by the point cloud dispersion in scatter plots.
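A minimal example of producing such a predicted-versus-benchmark density plot with matplotlib (the paper does not state its plotting tools; y_true and y_pred are as in the previous sketch):

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(4, 4))
ax.hist2d(y_true, y_pred, bins=200, cmin=1)           # 2-D histogram reveals dense regions
lim = float(max(y_true.max(), y_pred.max()))
ax.plot([0, lim], [0, lim], "k--", label="1:1 line")  # perfect-prediction reference
ax.set_xlabel("Benchmark surface reflectance")
ax.set_ylabel("Predicted surface reflectance")
ax.legend()
plt.show()
```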

Swish exhibited strong numerical performance in terms of precision and accuracy, with coefficients $b_1 = 0.959 \pm 0.026$, $b_0 = 0.0085 \pm 0.009$, $R^2 = 0.919 \pm 0.022$, and RMSE ranging from 0.009 to 0.025. However, scatter and density plots revealed irregularities in the red band, including non-uniform distributions at mid-range predictions and deviations at high reflectance. The green and blue bands performed well, with $R^2$ values of 0.938 and 0.924, respectively. Swish's reliability was reduced by its inconsistent generalization in the red band; despite this, it remains a promising model with potential for further optimization.

The Tanh function's generalization capacity was notably limited, particularly at high reflectance levels. Despite respectable coefficients $b_1 = 0.945 \pm 0.023$, $b_0 = 0.0084 \pm 0.0102$, $R^2 = 0.910 \pm 0.016$, and RMSE ranging from 0.011 to 0.025, its predictions were compressed to roughly half of the maximum expected values, as indicated by the scatter and density plots. Consequently, under the proposed approach, Tanh is deemed unsuitable.

Leaky ReLU demonstrated balanced performance across all spectral bands, with coefficients $b_1 = 0.970 \pm 0.023$, $b_0 = 0.0055 \pm 0.0063$, $R^2 = 0.916 \pm 0.021$, and RMSE values ranging from 0.010 to 0.024. Scatter and density plots revealed consistent clustering around the regression line, ensuring reliability at both low and high reflectance. Except for the NIR band, Leaky ReLU outperformed the other models in global accuracy, with minimal RMSE and $b_1$ values closest to 1 for the visible bands. In terms of generalization, the model demonstrated robust predictive capability across the entire benchmark range. This matters for the problem at hand, since the visible bands (red, green, and blue) are relatively more important than NIR for most users of the PER1 system.

MLR demonstrated stable performance, with coefficients $b_1 = 1.032 \pm 0.036$, $b_0 = 0.0014 \pm 0.0035$, $R^2 = 0.921 \pm 0.007$, and RMSE ranging from 0.011 to 0.024. However, except for the NIR band, its $b_1$ coefficients were the furthest from 1 among the models, resulting in overestimated reflectance, particularly in the blue band, where strong scattering introduces nonlinearities that MLR could not fully capture, reducing its accuracy in certain spectral bands. The model's intercepts ($b_0$) were negative across the visible spectrum, which can lead to physically implausible (negative) reflectance predictions. Although MLR achieved high precision, clustering tightly along its regression line and being the most consistent in the scatter and density plots, its inability to handle nonlinearities kept it further from the identity line than Leaky ReLU. Despite its limitations in the visible spectrum, the MLR model for the NIR band is effective and can be used alongside Leaky ReLU in the atmospheric correction pipeline.

The previous analysis assessed model performance against the four key criteria at the global level. However, RMSE alone is insufficient: its differences across bands are minimal, with an average of 0.016 ± 0.006, and it can be dominated by large data values, which obscures differences in local performance. Figure 13 therefore provides a comparative distribution of percentage relative error across methods and spectral bands. This metric quantifies the local prediction error without being affected by pixel magnitude. Models with narrower ranges and medians closer to zero are more reliable; conversely, a wider range indicates a higher probability of large errors in pixel-level predictions. In this figure, Swish shows good performance but was discarded due to its weak generalization capability. Leaky ReLU has the narrowest error ranges across bands, supporting its RMSE results, while MLR shows the poorest performance in the visible spectrum.
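A comparison of this kind can be reproduced along the following lines, assuming benchmark is an (n_pixels, 4) array and predictions maps each model name to an aligned prediction array (both names, and the band order, are our assumptions):

```python
import numpy as np
import matplotlib.pyplot as plt

bands = ["Blue", "Green", "Red", "NIR"]                  # band order is an assumption
fig, axes = plt.subplots(1, len(bands), figsize=(16, 4), sharey=True)
for j, (ax, band) in enumerate(zip(axes, bands)):
    data = [np.abs(benchmark[:, j] - pred[:, j]) / benchmark[:, j] * 100.0
            for pred in predictions.values()]            # Equation (8) per model
    ax.boxplot(data, labels=list(predictions.keys()), showfliers=False)
    ax.set_title(band)
axes[0].set_ylabel("Relative error (%)")
plt.tight_layout()
plt.show()
```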

At the band level, the NIR band showed the smallest error range and least variability across methods, attributed to the reduced atmospheric scattering at longer wavelengths. In contrast, the blue band exhibited the largest relative error range, particularly for MLR and Tanh, with relative errors reaching up to 30%, highlighting the challenges of correcting shorter wavelengths prone to atmospheric interference. The red and green bands showed similar error distributions for all models, though MLR had slightly higher variability.

Overall, Leaky ReLU was identified as the most reliable method, outperforming others in both accuracy and reliability. Figure 11 and Figure 12 confirm its strong generalization across a wide reflectance range, while Figure 13 highlights its lower local error variability compared to MLR. Swish showed good performance but struggled in mid- to high-reflectance scenarios. Tanh suffered from compressed predictions and larger errors, compromising both precision and accuracy. MLR struggled with nonlinear atmospheric effects at shorter wavelengths but is the best option for the NIR band. Additionally, MLR is the only model with negative intercepts in the visible bands, leading to physically implausible predictions for low-reflectance scenarios. Consequently, Leaky ReLU is the most suitable model for the visible spectrum, while MLR performs better for the NIR band.
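The resulting hybrid pipeline can be sketched as follows, with the band index, model objects, and the final clipping step as our own illustrative assumptions rather than the paper's implementation:

```python
import numpy as np

def correct_pixels(toa, ffnn_leaky_relu, mlr_nir, nir_index=3):
    """toa: (n_pixels, 4) TOA reflectance array. Returns surface reflectance
    with visible bands from the Leaky ReLU FFNN and NIR from the MLR model."""
    sr = ffnn_leaky_relu.predict(toa)                 # FFNN predicts all four bands
    sr[:, nir_index] = mlr_nir.predict(toa).ravel()   # overwrite NIR with the MLR estimate
    return np.clip(sr, 0.0, 1.0)                      # clamp to a plausible range (our addition)
```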


