jameszhou-gl/HiSGT

Description

📂 Project Structure

|--- baselines/          # Model implementations, training, and sampling scripts
|--- data/               # Raw and processed datasets
|--- evaluation/         # Evaluation metrics
|--- output/             # Experiment outputs (logs, checkpoints, synthetic data, evaluation results, etc.)
|--- scripts/            # Scripts for executing experiments

🚀 Quick Start

1️⃣ Setup Python Environment

conda create -n hisgt python==3.13.0 -y
conda activate hisgt
pip install --upgrade pip
pip install -r requirements.txt

2️⃣ Prepare the Dataset

Follow the README files in data/mimiciii/ and data/mimiciv/ for dataset preparation. This includes downloading the raw MIMIC-III v1.4 and MIMIC-IV v2.2 datasets, extracting patient sequences, processing them for model training, and constructing the additional hierarchical and semantic embeddings.

3️⃣ Train Models, Generate Synthetic Data, and Evaluate

Run the following scripts to train HiSGT and baselines. If Slurm is not available, you can adapt them into standard Bash commands.

🔹 For MIMIC-III:

sbatch scripts/mimiciii_1.4_convert_icd10/slurm_gpu_hisgt.sh   # Train HiSGT
sbatch scripts/mimiciii_1.4_convert_icd10/slurm_gpu_baselines.sh  # Train baselines

🔹 For MIMIC-IV:

sbatch scripts/mimiciv_2.2_icd9_subset_convert_icd10/slurm_gpu_hisgt.sh  # Train HiSGT
sbatch scripts/mimiciv_2.2_icd9_subset_convert_icd10/slurm_gpu_baselines.sh  # Train baselines

These scripts will handle model training, synthetic data generation, and evaluation metrics computation.
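
For machines without Slurm, the following is a minimal sketch of one way to run the same scripts with plain Bash. It relies on the fact that `#SBATCH` directives are ordinary comments to Bash and are therefore ignored. The helper name `run_without_slurm` is hypothetical and not part of this repository, and the sketch assumes the scripts do not depend on Slurm-only environment variables (such as `SLURM_JOB_ID`) or cluster-specific `module load` commands; if they do, those lines need to be adapted by hand.

```shell
# Hypothetical helper (not part of the repository): run a Slurm batch
# script directly with Bash on a machine where sbatch is unavailable.
run_without_slurm() {
    script="$1"

    # Fail early with a clear message if the script path is wrong.
    if [ ! -f "$script" ]; then
        echo "missing script: $script" >&2
        return 1
    fi

    # Lines beginning with "#SBATCH" are plain comments to Bash, so the
    # resource requests are skipped and the commands run on the local machine.
    bash "$script"
}
```

For example, `run_without_slurm scripts/mimiciii_1.4_convert_icd10/slurm_gpu_hisgt.sh` would run the HiSGT script for MIMIC-III locally, using whatever GPU is visible to the current shell.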

📝 Acknowledgments

We acknowledge HALO and ETHOS, on which some of our baseline implementations are built and from which the ICD-9 to ICD-10 mapping file is borrowed. We also thank Joel Jacob for his contributions to reproducing the EVA and SynTEG methods.

✅ Citation

If you find our work useful in your research, please consider citing:

@INCOLLECTION{Zhou2025-mt,
  title     = "Generating Clinically Realistic {EHR} data via a Hierarchy- and
               Semantics-Guided Transformer",
  booktitle = "Frontiers in Artificial Intelligence and Applications",
  author    = "Zhou, Guanglin and Barbieri, Sebastiano",
  publisher = "IOS Press",
  series    = "Frontiers in Artificial Intelligence and Applications",
  month     =  oct,
  year      =  2025
}

About

Code for the ECAI'25 paper "Generating Clinically Realistic EHR Data via a Hierarchy- and Semantics-Guided Transformer"
