Diagnostics, Vol. 16, Pages 583: Integrating Host Genetics and Clinical Setting in Machine Learning Models: Predicting COVID-19 Prognosis for Healthcare Decision-Making (The FeMiNa Study)
Diagnostics doi: 10.3390/diagnostics16040583
Authors:
Elisabetta D’Aversa
Bianca Antonica
Miriana Grisafi
Rosanna Asselta
Elvezia Maria Paraboschi
Angelina Passaro
Stefano Volpato
Francesca Remelli
Massimiliano Castellazzi
Alberto Maria Marra
Antonio Cittadini
Roberta D’Assante
Francesca Salvatori
Ajay Vikram Singh
Salvatore Pernagallo
Veronica Tisato
Donato Gemmati
Background/Objectives: COVID-19 has had a tremendous impact, causing a massive number of deaths worldwide. Inadequate health facilities led to shortages of resources and to the exhaustion of frontline workers, who had to manage many patients in a short time with no tools to prioritize those at high risk. This study aimed to uncover the architecture of this complex disease and to improve the management of hospitalized patients, preventing severe outcomes.

Methods: We performed a retrospective multicenter study aimed at refining the best predictive model for COVID-19 mortality by integrating 19 genetic and 13 clinical features. We trained three machine learning (ML) models (GBM, XGB, and RF) on a dataset of 532 Italian patients hospitalized with COVID-19, out of the 605 recruited during the first wave of the pandemic, when vaccines were not yet available.

Results: All models achieved high accuracy, AUROC, F1, F2, and PR-AUC values. Optimizing XGB for F1 gave the best performance, yielding fewer false positives (Nf1 = 26 versus Nf2 = 27 and NPR-AUC = 29) and, above all, fewer false negatives (Nf1 = 63 versus Nf2 = 69 and NPR-AUC = 69), whose reduction was the main objective. We then examined feature importance to understand which features drive the model's decisions: age was the main driver of mortality prediction, followed by ventilation. The remaining importance was evenly distributed between genetic features (HLA-DRA rs3135363, PPARGC1A rs192678, CRP rs2808635, ABO rs657152) and other clinical features, indicating that the genetic data did not confound but rather strengthened the predictive power of the model.

Conclusions: Our results suggest that integrating genetic and clinical data into ML models is crucial for identifying high-risk cases within the broad heterogeneity of the disease, enabling a P4-medicine approach that can improve patient outcomes and support the healthcare system.
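The Results compare optimizing XGB for F1, F2, and PR-AUC by counting the false positives and false negatives each choice produces. The abstract does not say what was tuned against each metric (hyperparameters, the decision threshold, or both), so the following is only a minimal sketch under one assumption: a model has already produced mortality risk scores, and a decision threshold is then chosen by maximizing the F-beta score, F_beta = (1 + beta^2)TP / ((1 + beta^2)TP + beta^2 FN + FP), where beta = 1 gives F1 and beta = 2 gives F2. The function names and the toy labels and scores are hypothetical and are not study data.

```python
from typing import Optional, Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), calling a case positive (death) when score >= threshold."""
    tp = fp = fn = tn = 0
    for y, s in zip(y_true, scores):
        predicted_death = s >= threshold
        if predicted_death and y == 1:
            tp += 1
        elif predicted_death and y == 0:
            fp += 1
        elif not predicted_death and y == 1:
            fn += 1  # a death the model missed
        else:
            tn += 1
    return tp, fp, fn, tn


def f_beta(tp: int, fp: int, fn: int, beta: float) -> float:
    """F-beta score; beta > 1 weights recall (fewer false negatives) above precision."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def best_threshold(
    y_true: Sequence[int], scores: Sequence[float], beta: float
) -> Tuple[Optional[float], float]:
    """Try every observed score as a threshold and keep the one with the highest F-beta."""
    best_t: Optional[float] = None
    best_f = -1.0
    for t in sorted(set(scores)):
        tp, fp, fn, _ = confusion_counts(y_true, scores, t)
        f = f_beta(tp, fp, fn, beta)
        if f > best_f:  # strict '>' keeps the lowest threshold on ties
            best_t, best_f = t, f
    return best_t, best_f


if __name__ == "__main__":
    # Hypothetical toy data: 1 = died, 0 = survived; scores are model risk estimates.
    y_true = [0, 0, 1, 0, 0, 1, 1, 1]
    scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
    for beta in (1.0, 2.0):
        t, f = best_threshold(y_true, scores, beta)
        tp, fp, fn, tn = confusion_counts(y_true, scores, t)
        print(f"beta={beta}: threshold={t}, F={f:.3f}, FP={fp}, FN={fn}")
```

With these toy numbers, F1 selects a threshold of 0.6 (no false positives, one false negative) while F2 selects 0.3 (two false positives, no false negatives), illustrating how a larger beta trades extra false alarms for fewer missed deaths. In practice the threshold would be chosen on validation data rather than on the set used to report final counts.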
Source: www.mdpi.com
