Multi-modal fusion methods aim to integrate information from diverse data types. While modalities in domains such as audio-visual processing are often naturally paired and captured simultaneously, healthcare data is inherently asynchronous and heterogeneous. Clinical data—such as electronic health records (EHR), chest X-ray (CXR) images, radiology reports (RR), and discharge notes (DN)—is collected at different times and under different conditions, making full-modality availability unrealistic for clinical modeling.
To address these challenges, we propose MedPatch, a multi-stage fusion network designed to operate effectively under uni-modal and multi-modal conditions. MedPatch introduces a token-level confidence mechanism that partitions representations into high- and low-confidence groups, guiding fusion through a missingness-aware module. This leads to improved robustness and performance for clinical prediction tasks such as in-hospital mortality and phenotype classification, even when some data modalities are missing.
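The token-level confidence partitioning can be sketched as follows. The entropy-based confidence score, the threshold, and all function names here are illustrative assumptions, not the paper's exact formulation:

```python
import math

def token_confidence(probs):
    """Confidence of one token as 1 minus the normalized entropy of
    its class distribution (an illustrative choice, not the paper's)."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log(len(probs))

def partition_tokens(tokens, probs_per_token, threshold=0.5):
    """Split tokens into high- and low-confidence groups for fusion."""
    high, low = [], []
    for tok, probs in zip(tokens, probs_per_token):
        (high if token_confidence(probs) >= threshold else low).append(tok)
    return high, low

# A peaked distribution is high confidence; a near-uniform one is low.
tokens = ["t0", "t1"]
probs = [[0.9, 0.1], [0.5, 0.5]]
high, low = partition_tokens(tokens, probs)
```

High-confidence tokens can then be fused directly, while low-confidence tokens are routed through the missingness-aware module.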
We build on the MIMIC-IV and MIMIC-CXR datasets for our experiments. MedPatch is trained in three stages: (1) uni-modal pretraining of modality-specific encoders, (2) confidence training (with optional post-hoc calibration), and (3) confidence-guided multi-modal fusion. MedPatch processes input from EHR time-series, CXR images, and clinical notes (RR and DN), dynamically adapting to the available data.
Figure: (a) Summary of dataset splits. (b) Unimodal pretraining pipeline for each modality. (c) Overview of the joint module used in our architecture. (d) Overview of the MedPatch architecture highlighting all the components, including the missingness module, the joint module and its two predictions, and the final prediction returned by the late module.
```sh
git clone https://github.com/your-org/MedPatch.git
cd MedPatch
conda env create -f environment.yml
conda activate medpatch
```
We use the following datasets for our experiments:
- MIMIC-IV EHR
- MIMIC-CXR
- MIMIC-IV Notes — for Radiology Reports (RR) and Discharge Notes (DN)
Please follow the README in `mimic4extract/` for extracting and preparing the time-series EHR data. Before running any scripts, set `ehr_data_dir`, `cxr_data_dir`, and `notes_data_dir` in your config files.
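For example, the three directory settings might look like this (the paths below are placeholders for your local layout, not defaults shipped with the repo):

```python
# Illustrative values only -- point each entry at your local copy
# of the corresponding extracted dataset.
ehr_data_dir = "/data/mimic4extract"
cxr_data_dir = "/data/mimic-cxr/resized"
notes_data_dir = "/data/mimic-iv-notes"
```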
To resize chest X-ray images:

```sh
python resize.py
```
To ensure consistent splits between the CXR and EHR datasets:

```sh
python create_split.py
```
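A common way to keep splits consistent across modalities is to assign each *patient*, rather than each record, to a split. The sketch below illustrates that general idea with a deterministic hash; it is not the actual logic of `create_split.py`:

```python
import hashlib

def split_for_subject(subject_id, train=0.8, val=0.1):
    """Deterministically map a subject to train/val/test so that EHR
    and CXR records of the same patient always share one split.
    (Generic illustration; the split fractions are assumptions.)"""
    digest = hashlib.md5(str(subject_id).encode()).hexdigest()
    frac = int(digest, 16) / 16**32  # uniform in [0, 1)
    if frac < train:
        return "train"
    if frac < train + val:
        return "val"
    return "test"

# The same subject lands in the same split regardless of modality.
split = split_for_subject(10045023)
```

Because the assignment depends only on the subject ID, no patient's data can leak across the train/test boundary through a different modality.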
Refer to `arguments.py` for full configuration options.
We follow a three-stage training process: uni-modal encoder pretraining, confidence training (with optional post-hoc calibration), and confidence-guided fusion.
```sh
# Train each unimodal encoder for the in-hospital mortality task
sh ./scripts/mortality/unimodal/EHR.sh
sh ./scripts/mortality/unimodal/CXR.sh
sh ./scripts/mortality/unimodal/RR.sh

# Train each unimodal encoder for the phenotyping task
sh ./scripts/phenotyping/unimodal/EHR.sh
sh ./scripts/phenotyping/unimodal/CXR.sh
sh ./scripts/phenotyping/unimodal/RR.sh
sh ./scripts/phenotyping/unimodal/DN.sh
```
Update `load_ehr`, `load_cxr`, `load_rr`, and `load_dn` with the best checkpoints from the uni-modal encoder pretraining stage.
```sh
# Confidence training for in-hospital mortality
# (scripts for the other modalities are also available)
sh ./scripts/mortality/Confidence/Confidence-EHR.sh

# Confidence training for phenotyping
# (scripts for the other modalities are also available)
sh ./scripts/phenotyping/Confidence/Confidence-EHR.sh

# Optional post-hoc calibration (also available for the other modalities)
sh ./scripts/mortality/Calibrate/Calibrate-EHR.sh
sh ./scripts/phenotyping/Calibrate/Calibrate-EHR.sh
```
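A standard form of post-hoc calibration is temperature scaling: divide the logits by a scalar temperature fitted on a validation set. The sketch below shows the idea with a simple grid search; it is an assumption that the `Calibrate` scripts use this particular method, so check the scripts for the exact procedure:

```python
import math

def softmax(logits, temperature=1.0):
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def nll(logit_rows, labels, temperature):
    """Average negative log-likelihood at a given temperature."""
    loss = 0.0
    for logits, y in zip(logit_rows, labels):
        loss -= math.log(softmax(logits, temperature)[y])
    return loss / len(labels)

def fit_temperature(logit_rows, labels):
    """Grid-search the temperature that minimizes validation NLL
    (a real implementation would typically use LBFGS instead)."""
    grid = [0.5 + 0.1 * i for i in range(46)]  # 0.5 .. 5.0
    return min(grid, key=lambda t: nll(logit_rows, labels, t))

# Overconfident logits whose labels often disagree: calibration
# selects a temperature above 1, softening the probabilities.
val_logits = [[4.0, 0.0], [3.5, 0.0], [4.2, 0.0]]
val_labels = [1, 1, 0]
temperature = fit_temperature(val_logits, val_labels)
```

Note that temperature scaling changes predicted probabilities (and hence confidence scores) without changing the argmax prediction.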
Update `load_ehr`, `load_cxr`, `load_rr`, and `load_dn` with the best checkpoints from the confidence training stage.
```sh
# Train MedPatch for in-hospital mortality
sh ./scripts/mortality/MedPatch/Confidence-Patching.sh

# Train MedPatch for phenotype classification
sh ./scripts/phenotyping/MedPatch/Confidence-Patching.sh
```
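During fusion, MedPatch must handle samples for which some modalities are absent. A generic way to do this, shown purely as an illustration and not as the repo's missingness module, is to fuse only the embeddings that are actually present:

```python
def fuse_available(embeddings):
    """Mean-pool over whichever modality embeddings are present.

    `embeddings` maps modality name -> vector, or None when that
    modality is missing for the sample (illustrative scheme only)."""
    present = [v for v in embeddings.values() if v is not None]
    if not present:
        raise ValueError("at least one modality is required")
    dim = len(present[0])
    return [sum(v[i] for v in present) / len(present) for i in range(dim)]

# A sample with CXR and DN missing still yields a fused representation.
sample = {"ehr": [1.0, 0.0], "cxr": None, "rr": [0.0, 1.0], "dn": None}
fused = fuse_available(sample)
```

Mean-pooling is used here only because it is invariant to the number of available modalities; the actual missingness-aware module is learned.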
Training scripts for all baseline models are provided under `scripts/`. Learning rates were selected from 10 random samples in the range [1e-5, 1e-3].
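The learning-rate search can be reproduced along these lines; sampling log-uniformly is an assumption (a common choice for learning rates), as the scripts may sample uniformly instead:

```python
import math
import random

def sample_learning_rates(n=10, low=1e-5, high=1e-3, seed=0):
    """Draw n candidate learning rates log-uniformly in [low, high].
    The seed and log-uniform distribution are illustrative assumptions."""
    rng = random.Random(seed)
    return [10 ** rng.uniform(math.log10(low), math.log10(high))
            for _ in range(n)]

lrs = sample_learning_rates()
```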
Update `load_state` with the trained model path, and change any other parameters as desired.
```sh
# Evaluate MedPatch for in-hospital mortality
sh ./scripts/mortality/evaluate.sh

# Evaluate MedPatch for phenotype classification
sh ./scripts/phenotyping/evaluate.sh
```
If you use this code or data for your research, please consider citing:
```bibtex
@misc{your2025medpatch,
  author = {Baraa Al Jorf and Farah E. Shamout},
  title  = {MedPatch: Confidence-Guided Multi-Stage Fusion for Multimodal Clinical Data},
  year   = {2025},
  url    = {TBD},
  note   = {Accepted at MLHC 2025}
}
```