Information, Vol. 16, Pages 888: Prediction of Postoperative ICU Requirements: Closing the Translational Gap with a Real-World Clinical Benchmark for Artificial Intelligence Approaches


Information doi: 10.3390/info16100888

Authors:
Alexander Althammer
Felix Berger
Oliver Spring
Philipp Simon
Felix Girrbach
Maximilian Dieing
Jens O. Brunner
Sergey Shmygalev
Christina C. Bartenschlager
Axel R. Heller

Background: Accurate prediction of postoperative care requirements is critical for patient safety and resource allocation. Although numerous approaches involving artificial intelligence (AI) and machine learning (ML) have been proposed to support such predictions, their implementation in practice has so far seen limited success. One reason is that the performance of such algorithms is difficult to assess in practical use, because the accuracy of the clinical decisions they are meant to support has not yet been systematically quantified. As a result, models are often assessed from a purely technical perspective, neglecting the socio-technical context.

Methods: We conducted a retrospective, single-center observational study at the University Hospital Augsburg, including 35,488 elective surgical cases documented between August 2023 and January 2025. For each case, preoperative care-level predictions by the surgical and anesthesiology teams were compared with the postoperative care actually provided. Predictive performance was evaluated using accuracy and sensitivity. Because the dataset is highly imbalanced, specificity, balanced accuracy, and the Fβ-score were also calculated. The results were contrasted with published ML-based approaches.

Results: Overall prediction accuracy was high (surgery: 91.2%; anesthesiology: 87.1%). However, sensitivity for identifying patients requiring postoperative intensive care was markedly lower than reported for ML models in the literature, with the largest discrepancies observed in patients ultimately admitted to the ICU (surgery: 38.05%; anesthesiology: 56.84%; ML: 70%). Nevertheless, clinical judgment demonstrated a superior F1-score, indicating a more balanced trade-off between sensitivity and precision (surgery: 0.527; anesthesiology: 0.551; ML: 0.28).

Conclusions: This study provides the first real-world benchmark of clinical expertise in postoperative care prediction and demonstrates how modern ML approaches must be evaluated within their specific socio-technical context. By quantifying the predictive performance of surgeons and anesthesiologists, it enables an evaluation of existing ML approaches. Thus, the strength of our work is the provision of a real-world benchmark against which all ML methods for preoperative prediction of ICU demand can be systematically evaluated. This enables, for the first time, a comparison of different approaches on a common, practice-oriented basis and thereby significantly facilitates translation into clinical practice, closing the translational gap. Furthermore, it offers a data-driven framework to support the integration of ML into preoperative decision-making.
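As a rough illustration of the evaluation metrics referenced in the abstract (sensitivity, specificity, balanced accuracy, Fβ-score, F1-score) for a binary ICU-vs-no-ICU prediction task, the sketch below shows how they could be computed. The synthetic labels, variable names, and the choice of β = 2 are assumptions made for demonstration only; they do not reflect the study's data or implementation.

```python
# Minimal sketch (not the authors' code): metrics for an imbalanced binary
# classification task, where 1 = postoperative ICU admission, 0 = regular ward.
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    balanced_accuracy_score,
    confusion_matrix,
    f1_score,
    fbeta_score,
    recall_score,
)

rng = np.random.default_rng(0)
# Hypothetical placeholder data: ~5% ICU rate mimics a highly imbalanced setting.
y_true = rng.binomial(1, 0.05, size=10_000)  # postoperative care actually provided
y_pred = rng.binomial(1, 0.04, size=10_000)  # preoperative care-level prediction

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    "sensitivity (recall)": recall_score(y_true, y_pred),         # TP / (TP + FN)
    "specificity": tn / (tn + fp),                                 # TN / (TN + FP)
    "balanced accuracy": balanced_accuracy_score(y_true, y_pred),  # mean of sens./spec.
    "F1-score": f1_score(y_true, y_pred),
    "F2-score (beta=2, recall-weighted)": fbeta_score(y_true, y_pred, beta=2),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

On such skewed data, plain accuracy is dominated by the majority (no-ICU) class, which is why the balanced accuracy and the recall-weighted Fβ-score give a more informative picture of how well ICU cases are actually identified.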



Source: www.mdpi.com