Generating Clinically Realistic EHR Data via a Hierarchy- and Semantics-Guided Transformer (HiSGT)

📂 Project Structure

|--- baselines/          # Model implementations, training, and sampling scripts
|--- data/               # Raw and processed datasets
|--- evaluation/         # Evaluation metrics
|--- output/             # Experiment outputs (logs, checkpoints, synthetic data, evaluation results, etc.)
|--- scripts/            # Scripts for executing experiments

🚀 Quick Start

1️⃣ Set Up the Python Environment

conda create -n hisgt python==3.13.0 -y
conda activate hisgt
pip install --upgrade pip
pip install -r requirements.txt

2️⃣ Prepare the Dataset

Follow the README files in data/mimiciii/ and data/mimiciv/ to prepare the datasets. This includes downloading the raw MIMIC-III v1.4 and MIMIC-IV v2.2 datasets, extracting patient sequences, processing them for model training, and constructing the additional hierarchical and semantic embeddings.

3️⃣ Train Models, Generate Synthetic Data, and Evaluate

Run the following scripts to train HiSGT and the baselines. If Slurm is not available, you can adapt them to run as standard Bash scripts.

🔹 For MIMIC-III:

sbatch scripts/mimiciii_1.4_convert_icd10/slurm_gpu_hisgt.sh   # Train HiSGT
sbatch scripts/mimiciii_1.4_convert_icd10/slurm_gpu_baselines.sh  # Train baselines

🔹 For MIMIC-IV:

sbatch scripts/mimiciv_2.2_icd9_subset_convert_icd10/slurm_gpu_hisgt.sh  # Train HiSGT
sbatch scripts/mimiciv_2.2_icd9_subset_convert_icd10/slurm_gpu_baselines.sh  # Train baselines

Each script handles model training, synthetic data generation, and computation of the evaluation metrics.
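Without Slurm, one option is to run the job scripts directly with Bash. Below is a minimal sketch built on the assumption that the scripts are ordinary Bash files whose `#SBATCH` lines serve only as scheduler directives. The hypothetical helper `run_job` submits a script with `sbatch` when `sbatch` is on the `PATH` and otherwise runs it in the foreground with `bash`. Bash reads `#SBATCH` lines as comments, so the GPU, memory, and time requests are not applied outside Slurm. Any Slurm-provided environment variables the scripts read would also need to be set by hand.

```shell
#!/usr/bin/env bash
# Sketch only: run_job and DRY_RUN are hypothetical names, not part of this repo.
# Submits with sbatch when available; otherwise runs the script with bash.
# Outside Slurm, "#SBATCH ..." lines are plain comments, so resource requests
# (GPUs, memory, time limits) are ignored. Assumption: the job scripts do not
# depend on Slurm-set variables such as SLURM_JOB_ID; check each script and
# export any such variable yourself if they do.
set -euo pipefail

# Set DRY_RUN=1 to print the command instead of executing it.
run_job() {
  local script="$1"
  local cmd
  if command -v sbatch >/dev/null 2>&1; then
    cmd=(sbatch "$script")
  else
    cmd=(bash "$script")
  fi
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `run_job scripts/mimiciii_1.4_convert_icd10/slurm_gpu_hisgt.sh` trains HiSGT on MIMIC-III. Prefixing the call with `DRY_RUN=1` only prints the command that would be run.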

📝 Acknowledgments

We acknowledge the HALO and ETHOS projects, on which some of our baseline implementations are built and from which the ICD-9 to ICD-10 mapping file is borrowed. We also thank Joel Jacob for his contributions to reproducing the EVA and SynTEG methods.

✅ Citation

If you find our work useful in your research, please consider citing:

@INCOLLECTION{Zhou2025-mt,
  title     = "Generating Clinically Realistic {EHR} data via a Hierarchy- and
               Semantics-Guided Transformer",
  booktitle = "Frontiers in Artificial Intelligence and Applications",
  author    = "Zhou, Guanglin and Barbieri, Sebastiano",
  publisher = "IOS Press",
  series    = "Frontiers in Artificial Intelligence and Applications",
  month     =  oct,
  year      =  2025
}
