Sustainability, Vol. 17, Pages 2843: Predicting Energy-Based CO2 Emissions in the United States Using Machine Learning: A Path Toward Mitigating Climate Change



Sustainability doi: 10.3390/su17072843

Authors:
Longfei Tian
Zhen Zhang
Zhiru He
Chen Yuan
Yinghui Xie
Kun Zhang
Ran Jing

Climate change is one of the most pressing global challenges and could threaten ecosystems, human populations, and weather patterns over time. Its impacts, including rising sea levels and soil salinization, are driven primarily by human activities such as fossil fuel combustion for energy production. The resulting greenhouse gas (GHG) emissions, particularly carbon dioxide (CO2), amplify the greenhouse effect and accelerate global warming, underscoring the urgent need for effective mitigation strategies. This study evaluates six machine learning regression models for predicting CO2 emissions: decision tree, random forest, multiple linear regression, k-nearest neighbors (KNN), gradient boosting, and support vector regression (SVR). The models were compared using R2, mean absolute error, mean squared error, root-mean-squared error, and cross-validation scores. The largest source of CO2 emissions was coal (46.11%), followed by electricity (26.70%) and natural gas (25.49%). Random forest and gradient boosting both performed well, but multiple linear regression had the highest prediction accuracy among the models (R2 = 0.98 training, 0.99 testing). SVR and KNN demonstrated lower accuracies, whereas the decision tree displayed overfitting. Sensitivity analysis showed that the decision tree, random forest, multiple linear regression, and gradient boosting models were highly sensitive to coal, natural gas, and petroleum (transportation sector). Random forest and gradient boosting were the most sensitive to coal usage, whereas KNN and SVR maintained excellent R2 scores (0.94–0.98) but were less sensitive to changes in the input variables. This analysis provides insights into the agreement and discrepancies between predicted and actual CO2 emissions, highlighting the models' effectiveness and potential limitations.
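
The four error metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `regression_metrics` and the sample values are hypothetical, chosen only to show how R2, mean absolute error (MAE), mean squared error (MSE), and root-mean-squared error (RMSE) relate to one another for a set of actual and predicted values.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, MAE, MSE, and RMSE for paired actual/predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n   # average absolute error
    mse = sum(e * e for e in errors) / n    # average squared error
    rmse = math.sqrt(mse)                   # same units as the target

    # R2 compares the residual error with the variance around the mean.
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"r2": r2, "mae": mae, "mse": mse, "rmse": rmse}


# Made-up values for illustration only; these are not data from the study.
actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 16.0]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

Computing the same metrics on both the training and the testing split is one common way to spot the kind of overfitting reported for the decision tree: a large gap between the two sets of scores suggests the model has memorized the training data.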



