|--- baselines/ # Model implementations, training, and sampling scripts
|--- data/ # Raw and processed datasets
|--- evaluation/ # Evaluation metrics
|--- output/ # Experiment outputs (logs, checkpoints, synthetic data, evaluation results, etc.)
|--- scripts/ # Scripts for executing experiments
conda create -n hisgt python==3.13.0 -y
conda activate hisgt
pip install --upgrade pip
pip install -r requirements.txt

Follow the README files in data/mimiciii/ and data/mimiciv/ for dataset preparation. This includes downloading the raw MIMIC-III v1.4 and MIMIC-IV v2.2 datasets, extracting patient sequences, processing them for model training, and constructing additional hierarchical and semantic embeddings.
Run the following scripts to train HiSGT and baselines. If Slurm is not available, you can adapt them into standard Bash commands.
🔹 For MIMIC-III:
sbatch scripts/mimiciii_1.4_convert_icd10/slurm_gpu_hisgt.sh # Train HiSGT
sbatch scripts/mimiciii_1.4_convert_icd10/slurm_gpu_baselines.sh # Train baselines
🔹 For MIMIC-IV:
sbatch scripts/mimiciv_2.2_icd9_subset_convert_icd10/slurm_gpu_hisgt.sh # Train HiSGT
sbatch scripts/mimiciv_2.2_icd9_subset_convert_icd10/slurm_gpu_baselines.sh # Train baselines

These scripts will handle model training, synthetic data generation, and evaluation metrics computation.
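For machines without Slurm, one possible starting point is to run a batch script directly with Bash, since Bash treats `#SBATCH` directives as ordinary comments. The sketch below is not part of the repository: the helper name `run_without_slurm` and the log file location are hypothetical, and it assumes the scripts do not depend on Slurm-only features such as `srun`, `module load`, or `SLURM_*` environment variables. If they do, those lines would need to be edited by hand.

```shell
# Hypothetical helper: run a Slurm batch script as a plain Bash script.
# Bash ignores "#SBATCH ..." lines because they are ordinary comments.
run_without_slurm() {
    script="$1"
    log="${2:-${script%.sh}.log}"   # assumed log location, not defined by the repository
    if [ ! -f "$script" ]; then
        echo "script not found: $script" >&2
        return 1
    fi
    # Capture stdout and stderr in one file, similar to a Slurm output file.
    bash "$script" > "$log" 2>&1
}

# Example usage (script path taken from the commands above):
# run_without_slurm scripts/mimiciii_1.4_convert_icd10/slurm_gpu_hisgt.sh hisgt_mimiciii.log
```

The helper returns the exit status of the underlying script, so it can be chained with `&&` to run HiSGT and the baselines one after the other.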
We acknowledge HALO and ETHOS, on which some of our baseline implementations are built and from which the ICD-9 to ICD-10 mapping file is borrowed. We also thank Joel Jacob for his contributions to reproducing the EVA and SynTEG methods.
If you find our work useful in your research, please consider citing:
@INCOLLECTION{Zhou2025-mt,
title = "Generating Clinically Realistic {EHR} data via a Hierarchy- and
Semantics-Guided Transformer",
booktitle = "Frontiers in Artificial Intelligence and Applications",
author = "Zhou, Guanglin and Barbieri, Sebastiano",
publisher = "IOS Press",
series = "Frontiers in Artificial Intelligence and Applications",
month = oct,
year = 2025
}