RESEARCH ARTICLE

Early Detection of Neonatal Infection in NICU Using Machine Learning Models: A Retrospective Observational Cohort Study

The Open Bioinformatics Journal, 03 Mar 2026. DOI: 10.2174/0118750362416494251124105928

Abstract

Introduction

Neonatal infections remain a major threat in intensive care units (ICUs), often progressing rapidly and asymptomatically within the first hours of admission. Early detection is critical to improve outcomes, yet timely and reliable risk prediction remains a challenge.

Method

We conducted a retrospective observational cohort study to develop an explainable machine learning approach for early prediction of neonatal infection, using high-resolution data from the MIMIC-III database. Two time windows, 30 and 120 minutes after ICU admission, were analyzed. Physiological and hematological variables were aggregated, and missing data were imputed using Iterative Imputation. Several classification models were evaluated with stratified five-fold cross-validation, and model interpretability was examined using feature importance and SHAP analysis.
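As an illustration of the windowing and validation steps described in the Method, the following is a minimal sketch in plain Python, not the study's actual pipeline. The record layout (minutes since admission, variable name, value), the choice of mean/min/max as summary statistics, and the function names aggregate_window and stratified_folds are all assumptions made for this example. The values are invented and do not come from MIMIC-III.

```python
from collections import defaultdict
from statistics import mean
import random


def aggregate_window(records, window_minutes):
    """Summarize each variable over [0, window_minutes] as mean/min/max.

    `records` is a list of (minutes_since_admission, variable, value)
    tuples; this layout is a hypothetical one chosen for illustration.
    """
    by_var = defaultdict(list)
    for minutes, name, value in records:
        if 0 <= minutes <= window_minutes:
            by_var[name].append(value)
    features = {}
    for name, values in by_var.items():
        features[f"{name}_mean"] = mean(values)
        features[f"{name}_min"] = min(values)
        features[f"{name}_max"] = max(values)
    return features


def stratified_folds(labels, n_splits=5, seed=0):
    """Assign sample indices to folds so class ratios stay similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)
    return folds


# Toy measurements with invented values (not taken from MIMIC-III).
records = [
    (5, "heart_rate", 150.0),
    (20, "heart_rate", 160.0),
    (90, "heart_rate", 170.0),
    (25, "temperature", 37.0),
]
features_30 = aggregate_window(records, 30)
features_120 = aggregate_window(records, 120)

# Toy labels: 10 infected (1) and 40 non-infected (0) admissions.
labels = [1] * 10 + [0] * 40
folds = stratified_folds(labels, n_splits=5)
```

In this sketch, a variable with no measurement inside the window is simply absent from the feature dictionary. In the study, such gaps are filled by Iterative Imputation, which is not reproduced here.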

Result

CatBoost achieved the best performance in the 30-minute window (F1-score = 0.76), and Gradient Boosting achieved the best performance in the 120-minute window (F1-score ≈ 0.80). Heart rate, white blood cell count, and temperature were important predictors, reflecting both physiological stability and immune response. Model performance was higher in the 120-minute window, indicating the importance of data availability for accurate prediction.
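For readers less familiar with the reported metric, the F1-score is the harmonic mean of precision and recall. The short sketch below computes it from confusion-matrix counts. The function name f1_from_counts and the counts themselves are invented for illustration and are not the study's results.

```python
def f1_from_counts(tp, fp, fn):
    """Return the F1-score from true positives, false positives, false negatives.

    Equivalent to 2 * precision * recall / (precision + recall),
    written in its count form to avoid intermediate divisions.
    """
    denominator = 2 * tp + fp + fn
    if denominator == 0:
        # No positive cases and no positive predictions: define F1 as 0.
        return 0.0
    return 2 * tp / denominator


# Invented counts: 6 infections detected, 2 false alarms, 2 missed infections.
example_f1 = f1_from_counts(tp=6, fp=2, fn=2)
```

Because F1 ignores true negatives, it is often preferred over accuracy when the positive class (here, infection) is much rarer than the negative class.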

Discussion

CatBoost at the 30-minute window and Gradient Boosting at the 120-minute window offer a balance between speed and accuracy. Model-based imputation was able to handle high missing-value rates, but external validation is needed because the data came from a single center and the measurement devices varied.

Conclusion

This study proposes a two-stage decision-support system that adapts to data collected during early and later ICU admission periods. By combining accurate prediction with model interpretability, the framework may enable timely diagnosis and targeted interventions, ultimately reducing neonatal morbidity and mortality.

Keywords: Neonatal infection, Machine learning, Early prediction, MIMIC-III, CatBoost, Gradient Boosting, NICU.