X-Ray Synthetic Data Study: Diffusion Transformers

📋 Overview

xray-synthetic-data-study is a research initiative evaluating the efficacy of Diffusion Transformers (DiT) for generating high-fidelity synthetic chest X-rays.

While traditional GANs often suffer from mode collapse and training instability, diffusion models, particularly those built on Transformer backbones, have been reported to scale better and produce higher-quality samples. This repository explores whether synthetic data generated with a DiT can augment real-world medical datasets to address data scarcity and class imbalance in disease classification tasks.

🎯 Objectives

Architecture Implementation: Train a Diffusion Transformer (DiT), which replaces the standard U-Net denoising backbone, to model the distribution of chest X-ray data.
Latent Space Modeling: Use a Variational Autoencoder (VAE) to compress X-rays into latent representations, which are then split into patches for efficient training.
Downstream Evaluation: Train classifiers (e.g., ResNet/ViT) on synthetic-augmented datasets to benchmark performance gains against real-only baselines.
Privacy Preservation: Assess the potential of DiT-generated data to serve as a privacy-compliant proxy for sensitive patient records.

🛠️ Methodology

1. Latent Compression

Instead of operating in pixel space, we first train (or fine-tune) a VAE to compress 256x256 X-rays into lower-dimensional latent representations.

This lets the diffusion model spend its capacity on semantic structure rather than on fine high-frequency pixel detail, and it reduces the cost of each training step.
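To make the size of this compression concrete, the following sketch works through the shape arithmetic. The 8x spatial downsampling factor and the 4 latent channels are assumptions borrowed from commonly used latent-diffusion VAEs, and the single input channel assumes grayscale X-rays; none of these values is confirmed by this repository. `latent_shape` and `compression_ratio` are hypothetical helper names.

```python
def latent_shape(image_size=256, downsample=8, latent_channels=4):
    """Return (channels, height, width) of the latent for a square image.

    downsample and latent_channels are illustrative assumptions,
    not values taken from this repository.
    """
    if image_size % downsample != 0:
        raise ValueError("image_size must be divisible by downsample")
    side = image_size // downsample
    return (latent_channels, side, side)


def compression_ratio(image_size=256, image_channels=1, downsample=8,
                      latent_channels=4):
    """Ratio of pixel values to latent values (assumes grayscale input)."""
    c, h, w = latent_shape(image_size, downsample, latent_channels)
    return (image_channels * image_size * image_size) / (c * h * w)


if __name__ == "__main__":
    print(latent_shape())        # (4, 32, 32)
    print(compression_ratio())   # 16.0
```

Under these assumptions, a 256x256 grayscale image becomes a 4x32x32 latent, so the diffusion model sees 16 times fewer values per sample.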

2. Diffusion Transformer (DiT)

We employ a transformer-based backbone for the denoising process:

Patchify: Latents are tokenized into sequences of patches.
Transformer Blocks: Standard multi-head self-attention mechanisms process the noisy patches.
Conditioning: Class labels (e.g., Pneumonia, Normal) are injected via adaptive layer normalization (adaLN).
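The patchify and conditioning steps above can be sketched in NumPy as follows. This is a minimal illustration, not the repository's implementation: the patch size of 2, the latent shape, the conditioning dimension, and the names `patchify`, `ada_layer_norm`, `w_shift`, and `w_scale` are all assumptions. A real DiT block would compute the shift and scale with a learned MLP and also apply attention and feed-forward layers.

```python
import numpy as np


def patchify(latent, patch_size=2):
    """Split a (C, H, W) latent into a (num_patches, patch_size**2 * C) sequence."""
    c, h, w = latent.shape
    if h % patch_size or w % patch_size:
        raise ValueError("H and W must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # (C, gh, p, gw, p) -> (gh, gw, p, p, C) -> (gh * gw, p * p * C)
    x = latent.reshape(c, gh, patch_size, gw, patch_size)
    x = x.transpose(1, 3, 2, 4, 0)
    return x.reshape(gh * gw, patch_size * patch_size * c)


def ada_layer_norm(tokens, cond, w_shift, w_scale, eps=1e-6):
    """Normalize each token without a learned affine, then modulate it with
    a shift and scale computed from the conditioning vector."""
    mean = tokens.mean(axis=-1, keepdims=True)
    var = tokens.var(axis=-1, keepdims=True)
    normed = (tokens - mean) / np.sqrt(var + eps)
    shift = cond @ w_shift  # shape (token_dim,)
    scale = cond @ w_scale  # shape (token_dim,)
    return normed * (1.0 + scale) + shift


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    latent = rng.standard_normal((4, 32, 32))   # assumed latent shape
    tokens = patchify(latent)                    # 256 tokens of width 16

    # Hypothetical class embedding table: row 0 = Normal, row 1 = Pneumonia.
    class_embeddings = rng.standard_normal((2, 8))
    cond = class_embeddings[1]

    w_shift = rng.standard_normal((8, 16)) * 0.02
    w_scale = rng.standard_normal((8, 16)) * 0.02
    out = ada_layer_norm(tokens, cond, w_shift, w_scale)
    print(tokens.shape, out.shape)
```

When `w_shift` and `w_scale` are all zeros, the modulation reduces to a plain layer normalization, which makes the effect of the class label easy to isolate when debugging.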

3. Training & Sampling

Forward Process: Gradually add Gaussian noise to latent patches.
Reverse Process: The DiT predicts the noise present at each timestep; removing it step by step recovers a clean latent, which the VAE decoder then maps back to pixel space.
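The forward and reverse steps above can be illustrated with the standard closed-form noising equation, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise. The sketch below assumes a linear beta schedule with 1000 steps, as in the original DDPM setup; the schedule actually used in this study is not stated. The function names are hypothetical, and the true noise stands in for the DiT's prediction.

```python
import numpy as np


def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def add_noise(x0, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) in closed form. Returns (x_t, noise)."""
    noise = rng.standard_normal(x0.shape)
    a = alpha_bars[t]
    x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * noise
    return x_t, noise


def estimate_x0(x_t, t, predicted_noise, alpha_bars):
    """Invert the closed form to estimate x_0 from a noise prediction."""
    a = alpha_bars[t]
    return (x_t - np.sqrt(1.0 - a) * predicted_noise) / np.sqrt(a)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha_bars = make_alpha_bars()
    x0 = rng.standard_normal((4, 32, 32))  # assumed latent shape
    x_t, noise = add_noise(x0, 500, alpha_bars, rng)

    # In training, the DiT would be fit to predict `noise` from (x_t, t, label).
    # Here the true noise is used in place of a model output.
    x0_hat = estimate_x0(x_t, 500, noise, alpha_bars)
    print(np.allclose(x0_hat, x0))
```

A practical sampler does not jump to x_0 in one step. It starts from pure Gaussian noise and applies many small denoising updates, each using a fresh noise prediction from the model.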

Project Roadmap

Phase 1: Classification (Complete)

Train EVA-02 on Chest X-Rays
Generate Confusion Matrix & ROC Curves

Phase 2: Generative Augmentation (In Progress)

Set up DiT-Base architecture
Train DiT on minority class (Cardiomegaly) to 50k steps
Generate 500 synthetic images
Filter synthetic images for quality

Phase 3: Re-Training

Mix synthetic data with original train set
Retrain EVA-02 and compare AUC scores
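For the AUC comparison in Phase 3, the metric can be computed directly from classifier scores. The sketch below uses the pairwise definition of ROC AUC: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. `roc_auc` is a hypothetical helper, and the scores in the demo are toy values for illustration only, not results from this study.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked
    correctly, with ties counted as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy values only: 1 = Cardiomegaly, 0 = other.
    labels = [0, 0, 1, 1]
    baseline_scores = [0.1, 0.4, 0.35, 0.8]    # hypothetical real-only model
    augmented_scores = [0.1, 0.3, 0.35, 0.8]   # hypothetical augmented model
    print(roc_auc(labels, baseline_scores))    # 0.75
    print(roc_auc(labels, augmented_scores))   # 1.0
```

Both models should be scored on the same held-out set of real images, so that any AUC difference reflects the training data and not the evaluation data. The pairwise loop is quadratic in the number of samples; for large test sets a rank-based implementation from an established library is the better choice.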

