PRISM: High-Resolution and Precise Counterfactual Medical Image Generation using Language-guided Stable Diffusion

This is the official repository for PRISM, submitted to Medical Imaging with Deep Learning (MIDL 2025).

PRISM - Precise counterfactual Image generation using language guided Stable diffusion Model

Repository Structure

This repository is organized into two branches:

main: Contains all source code and implementation files
website: Houses the project website and documentation assets

You are currently on the main branch of this repository. Visit the website branch to access the website files and source code.

Getting Started

Virtual Environment Setup

Create a virtual environment and install the required packages from requirements.txt:

pip install -r requirements.txt --no-cache

Note: The versions of the transformers and diffusers libraries must match those specified in requirements.txt. If you hit an error caused by a library mismatch, try also installing huggingface_hub==0.25.2.
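To check for a version mismatch before training, something like the following sketch may help. It assumes requirements.txt pins packages with `name==version` lines; the function name `find_mismatches` is illustrative and not part of this repository.

```python
from importlib import metadata


def find_mismatches(requirements_text, get_version=metadata.version):
    """Return {package: (pinned, installed)} for each unsatisfied `name==version` pin."""
    mismatches = {}
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if "==" not in line:
            continue  # only exact pins are checked in this sketch
        name, pinned = (part.strip() for part in line.split("==", 1))
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches
```

Calling `find_mismatches(open("requirements.txt").read())` inside the virtual environment returns an empty dict when every pinned package is installed at the pinned version.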

Create Dataset

Data Preparation: Sign up for access to the CheXpert dataset. Split the dataset 70-15-15 into train, validation, and test sets. Use the same split for all experiments.
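A minimal sketch of a reproducible 70-15-15 split is shown below. It assumes each image path contains a `patientNNNNN` component (as CheXpert paths typically do) and groups images by patient so that no patient appears in two splits. Splitting by patient, the function name `split_by_patient`, and the seed are our assumptions; this README does not specify how the split was produced.

```python
import random
import re
from collections import defaultdict


def split_by_patient(paths, fractions=(0.70, 0.15, 0.15), seed=0):
    """Assign image paths to train/val/test so that no patient spans two splits."""
    groups = defaultdict(list)
    for path in paths:
        match = re.search(r"patient\d+", path)
        # Fall back to the path itself when no patient ID is found.
        groups[match.group(0) if match else path].append(path)

    patients = sorted(groups)
    random.Random(seed).shuffle(patients)  # fixed seed keeps the split stable

    n_train = round(fractions[0] * len(patients))
    n_val = round(fractions[1] * len(patients))
    buckets = {
        "train": patients[:n_train],
        "val": patients[n_train:n_train + n_val],
        "test": patients[n_train + n_val:],
    }
    return {
        name: [path for pid in ids for path in groups[pid]]
        for name, ids in buckets.items()
    }
```

Because the shuffle uses a fixed seed over a sorted patient list, rerunning the function on the same paths yields the same split, which is what keeps it identical across experiments.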

Core Functionalities

Finetune Stable Diffusion

PRISM uses Stable Diffusion (SD) v1.5 as its backbone.

torchrun --nproc_per_node=4 finetune_chexpert.py

Note: Launch fine-tuning with torchrun, not python.

The finetune_chexpert.py script enables distributed training to fine-tune Stable Diffusion on chest X-ray images with associated pathology labels. The script:

Creates automatic captions based on pathology findings
Trains only the UNet component while freezing VAE and text encoder
Supports distributed training with mixed precision
Includes checkpoint saving and logging
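As an illustration of the automatic captioning step, here is a hedged sketch that turns one row of CheXpert-style labels into a text prompt. It assumes the usual CheXpert encoding (1.0 positive, 0.0 negative, -1.0 uncertain, blank for unmentioned) and keeps only positive findings. The function name `build_caption` and the caption wording are hypothetical; the actual template lives in finetune_chexpert.py and may differ.

```python
def build_caption(labels):
    """Turn a {pathology: value} row into a text prompt (hypothetical format)."""
    # Keep only findings marked positive; uncertain and missing labels are skipped.
    positives = [name.lower() for name, value in labels.items() if value == 1.0]
    if not positives:
        return "Chest x-ray with no findings"
    return "Chest x-ray with " + ", ".join(positives)
```

For example, a row with positive Cardiomegaly and Pleural Effusion labels would produce a prompt naming those two findings.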

The following parameters set the paths:

Parameter	Default	Description
`--model_name_or_path`	`runwayml/stable-diffusion-v1-5`	Base pretrained model to fine-tune
`--train_data_path`	`/usr/local/.../finetune.csv`	Path to CheXpert CSV file with pathology labels
`--image_root_path`	`/usr/local/datasets/`	Root directory containing the chest X-ray images
`--output_dir`	`/usr/local/.../finetuned`	Directory to save the fine-tuned model and checkpoints
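The defaults point at machine-specific locations, so you will likely override them. The sketch below assembles a torchrun command line from your own paths, assuming the script accepts the parameters above as command-line flags. The helper name `finetune_command` and the example paths are placeholders, not values from this repository.

```python
import shlex


def finetune_command(train_csv, image_root, output_dir,
                     model="runwayml/stable-diffusion-v1-5", gpus=4):
    """Build a shell-safe torchrun command for finetune_chexpert.py."""
    args = [
        "torchrun", f"--nproc_per_node={gpus}", "finetune_chexpert.py",
        "--model_name_or_path", model,
        "--train_data_path", train_csv,
        "--image_root_path", image_root,
        "--output_dir", output_dir,
    ]
    # shlex.join quotes any argument containing spaces or shell metacharacters.
    return shlex.join(args)


# Placeholder paths; replace them with your own locations.
print(finetune_command("data/finetune.csv", "data/images", "runs/finetuned"))
```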

For fine-tuning, we used four A100 GPUs (40 GB each). Fine-tuning SD v1.5 took 6 hours of wall-clock time.

Counterfactual Image Generation

python generate_cf_images.py

The generate_cf_images.py script uses language-guided image editing with the selected diffusion model to generate counterfactual versions of chest X-ray images.

Key Parameters

Parameter	Description
`ldm_type`	Type of diffusion model to use. Options: `stable_diffusion_v1_4`, `stable_diffusion_v1_5`, `stable_diffusion_mimic_cxr_v0.1`, `finetuned_chexpert`
`self_replace_steps_range`	Controls the strength of self-attention replacement during editing. Higher values result in stronger edits but less preservation of original structure
`edit_word_weight`	Emphasis placed on the edit word in the prompt. Higher values lead to stronger edits
`clip_img_thresh`	Threshold for image-image similarity (higher = more similar to original)
`clip_thresh`	Threshold for image-text similarity
`clip_dir_thresh`	Threshold for directional similarity (measures if edit is in the right direction)
`text_similarity_threshold`	Controls filtering of edits based on text similarity to ground truth
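To make the three CLIP thresholds concrete, here is a minimal sketch of how such a filter could work on precomputed embeddings. It uses the common definition of directional similarity (cosine between the image-embedding change and the text-embedding change). The function names, the use of `>=` for every threshold, and the toy vectors are our assumptions; the script's actual logic and defaults are not described in this README.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def directional_similarity(img_src, img_edit, txt_src, txt_edit):
    """Cosine between the image change and the text change in embedding space."""
    d_img = [b - a for a, b in zip(img_src, img_edit)]
    d_txt = [b - a for a, b in zip(txt_src, txt_edit)]
    return cosine(d_img, d_txt)


def keep_edit(img_src, img_edit, txt_src, txt_edit,
              clip_img_thresh, clip_thresh, clip_dir_thresh):
    """Keep an edit only if it passes all three similarity checks (assumed rule)."""
    return (
        cosine(img_src, img_edit) >= clip_img_thresh      # stays close to the original image
        and cosine(img_edit, txt_edit) >= clip_thresh     # matches the edited prompt
        and directional_similarity(img_src, img_edit, txt_src, txt_edit) >= clip_dir_thresh
    )


# Toy 2-D embeddings: the image moves in the same direction as the text.
print(keep_edit([1, 0], [1, 0.5], [1, 0], [1, 1], 0.8, 0.8, 0.5))
```

Under this reading, raising `clip_img_thresh` rejects edits that drift far from the source image, while raising `clip_dir_thresh` rejects edits whose change does not follow the change in the prompt.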

Classifiers

Baselines

Cycle-GAN

Other

Examples


Editing Medical Devices using PRISM	XAI using PRISM

Citation

@misc{kumar2025prism,
title={PRISM: High-Resolution \& Precise Counterfactual Medical Image Generation using Language-guided Stable Diffusion},
author={Kumar, Amar and Kriz, Anita and Havaei, Mohammad and Arbel, Tal},
eprint={2503.00196},
url={https://arxiv.org/abs/2503.00196},
year={2025}
}

Acknowledgements

PRISM is built on top of several excellent repositories: LANCE and Prompt-to-prompt. For comparisons, we also use code from the RadEdit, Imagic, and Null-Text Inversion repositories. Additionally, we borrow a few techniques from Instruct-Pix2Pix and huggingface-transformers.
