Enhancing cause of death prediction: development and validation of machine learning models using multimodal data across multiple health-care sites

Mohammed Al-Garadi; Rishi J Desai; Kerry Ngan; Michele LeNoue-Newton; Ruth M Reeves; Daniel Park; Jose J Hernández-Muñoz; Shirley V Wang; Judith C Maro; Candace C Fuller; Joshua Lin Kueiyu; Aida Kuzucan; Kevin Coughlin; Haritha Pillai; Melissa McPheeters; Jill Whitaker; Jessica A Buckner; Michael F McLemore; Dax M Westerman; Michael E Matheny

Al-Garadi, M., Desai, R. J., Ngan, K., LeNoue-Newton, M., Reeves, R. M., Park, D., Hernández-Muñoz, J. J., Wang, S. V., Maro, J. C., Fuller, C. C., Kueiyu, J. L., Kuzucan, A., Coughlin, K., Pillai, H., McPheeters, M., Whitaker, J., Buckner, J. A., McLemore, M. F., Westerman, D. M., & Matheny, M. E. (2026). Enhancing cause of death prediction: development and validation of machine learning models using multimodal data across multiple health-care sites. JAMIA Open, 9(1), ooaf175. https://doi.org/10.1093/jamiaopen/ooaf175

Abstract

OBJECTIVES: To develop and validate machine learning (ML) models that predict probable cause of death (CoD) using structured electronic health record (EHR) data, unstructured clinical notes, and publicly available sources.

MATERIALS AND METHODS: This multi-institutional retrospective study was conducted across Vanderbilt University Medical Center (VUMC) and Massachusetts General Brigham (MGB) and included deceased patients with encounters between October 1, 2015, and January 1, 2021, and confirmed death records. The cohort included 13 708 patients from VUMC and 34 839 from MGB. The primary outcome was the underlying CoD, categorized into the top 15 National Center for Health Statistics rankable causes, with all remaining causes grouped as "Other." Performance was assessed using the weighted area under the receiver operating characteristic curve (AUC) and the F-measure.
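The outcome is a 16-class label (15 rankable causes plus "Other"), and the abstract reports weighted AUC and F-measure without spelling out the averaging. A minimal sketch, assuming one-vs-rest AUC per class and averaging weighted by class support (the convention behind scikit-learn's `average="weighted"`), written in plain Python so each step is visible. The helper names and the two-cause `rankable` set are hypothetical illustrations, not the study's code or label list.

```python
from collections import Counter

def group_cause(cause, rankable):
    """Keep a cause if it is one of the rankable causes; otherwise label it 'Other'."""
    return cause if cause in rankable else "Other"

def binary_auc(scores, labels):
    """AUC via the Mann-Whitney U statistic; tied scores receive average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank for the tie group
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined without both positives and negatives")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def weighted_ovr_auc(y_true, probs, classes):
    """One-vs-rest AUC per class, averaged with weights equal to class support."""
    support = Counter(y_true)
    total, acc = 0, 0.0
    for c_idx, c in enumerate(classes):
        if support[c] == 0 or support[c] == len(y_true):
            continue  # AUC is undefined for this class in this sample
        labels = [1 if y == c else 0 for y in y_true]
        scores = [row[c_idx] for row in probs]
        acc += support[c] * binary_auc(scores, labels)
        total += support[c]
    return acc / total

def predict_labels(probs, classes):
    """Assign each patient the class with the highest predicted probability."""
    return [classes[max(range(len(row)), key=row.__getitem__)] for row in probs]

def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to class support."""
    support = Counter(y_true)
    score = 0.0
    for c, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        n_pred = sum(1 for p in y_pred if p == c)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        score += n * f1
    return score / len(y_true)

# Toy example with three illustrative classes (columns of probs follow `classes`).
classes = ["Malignant neoplasms", "Diseases of heart", "Other"]
y_true = ["Diseases of heart", "Malignant neoplasms", "Other", "Malignant neoplasms"]
probs = [[0.2, 0.7, 0.1], [0.6, 0.3, 0.1], [0.1, 0.2, 0.7], [0.15, 0.1, 0.75]]
auc_w = weighted_ovr_auc(y_true, probs, classes)           # (2*0.75 + 1.0 + 2/3) / 4 = 19/24
f1_w = weighted_f1(y_true, predict_labels(probs, classes))  # 0.75
```

Weighting by support means common causes dominate the summary score, so a model can post a high weighted AUC while still ranking rare causes poorly; per-class scores are worth inspecting alongside it.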

RESULTS: The XGBoost model using structured EHR data alone achieved weighted AUCs of 0.86 (95% CI, 0.84-0.88) at VUMC and 0.80 (95% CI, 0.79-0.80) at MGB. Adding unstructured notes improved performance, with weighted AUCs of 0.90 (95% CI, 0.88-0.93) at VUMC and 0.92 (95% CI, 0.91-0.92) at MGB. Adding publicly available data did not further improve performance. Cross-institutional validation revealed significant performance degradation.
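The abstract reports 95% CIs but does not state how they were computed. One common approach is the percentile bootstrap over patients; the sketch assumes that method and, to stay self-contained, uses a plain binary AUC as the metric. `bootstrap_ci` and `binary_auc` are hypothetical helpers; in the multiclass setting a weighted one-vs-rest AUC would be passed as `metric` instead.

```python
import random

def binary_auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(metric, y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample patients with replacement and recompute the metric."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            stats.append(metric([scores[i] for i in idx], [y_true[i] for i in idx]))
        except ValueError:
            continue  # skip resamples where the metric is undefined (e.g., one class only)
    stats.sort()
    lo = stats[int((alpha / 2) * (len(stats) - 1))]
    hi = stats[int((1 - alpha / 2) * (len(stats) - 1))]
    return metric(scores, y_true), lo, hi

labels = [0, 0, 1, 1, 0, 1, 0, 1, 1, 0]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.3, 0.7, 0.6, 0.5]
# Point AUC is 0.92: 23 of the 25 positive/negative pairs are ranked correctly.
point, lo, hi = bootstrap_ci(binary_auc, labels, scores, n_boot=200, seed=1)
```

Fixing the seed makes the interval reproducible; with a cohort the size of MGB's, the resampled intervals are naturally narrow, which is consistent with the tighter CIs reported there.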

DISCUSSION: Models integrating structured and unstructured EHR data show strong within-institution performance but limited generalizability across health-care systems, highlighting challenges related to institutional data heterogeneity.

CONCLUSIONS: Machine learning models combining structured and unstructured EHR data accurately predict CoD within institutions but perform poorly across sites. Health-care institutions may benefit from adopting robust processes for locally tailored models, and future research should focus on enhancing model generalizability while addressing unique institutional data environments.
