PRISM: High-Resolution and Precise Counterfactual Medical Image Generation using Language-guided Stable Diffusion
This is the official repository for PRISM, submitted to Medical Imaging with Deep Learning (MIDL 2025).
PRISM - Precise counterfactual Image generation using language guided Stable diffusion Model
This repository is organized into two branches:
- main: Contains all source code and implementation files
- website: Houses the project website and documentation assets
You are currently on the main branch of this repository. Visit the website branch to access the website files and source code.
Create a virtual environment and install the necessary packages from the `requirements.txt` file as shown:

    pip install -r requirements.txt --no-cache

Note: The versions of the `transformers` and `diffusers` libraries must match those specified in `requirements.txt`. If a library mismatch causes an error, you can also install `huggingface_hub==0.25.2`.
Data Preparation: Sign up to access the CheXpert dataset from here. Split the dataset 70-15-15 into train, validation, and test sets. This split remains the same for all experiments.
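One way to keep the 70-15-15 split identical across experiments is to shuffle with a fixed seed. The sketch below is an illustration only, not the authors' splitting code: the function name `split_70_15_15` and the seed value are assumptions, and `items` may be image paths or patient IDs (the README does not say which unit is split).

```python
import random


def split_70_15_15(items, seed=0):
    """Shuffle items with a fixed seed and return (train, val, test) lists."""
    items = list(items)
    rng = random.Random(seed)  # fixed seed keeps the split identical across runs
    rng.shuffle(items)
    n_train = len(items) * 70 // 100
    n_val = len(items) * 15 // 100
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # remainder, so no item is dropped
    return train, val, test


# Hypothetical identifiers standing in for CheXpert entries.
train, val, test = split_70_15_15(range(100))
print(len(train), len(val), len(test))
```

Saving the three lists to disk once and reusing them is a simple way to guarantee that every experiment sees the same split.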
PRISM uses Stable Diffusion (SD) v1.5 as its backbone.
    torchrun --nproc_per_node=4 finetune_chexpert.py

Note: The fine-tuning command is `torchrun`, not `python`.
The finetune_chexpert.py script enables distributed training to fine-tune Stable Diffusion on chest X-ray images with associated pathology labels. The script:
- Creates automatic captions based on pathology findings
- Trains only the UNet component while freezing VAE and text encoder
- Supports distributed training with mixed precision
- Includes checkpoint saving and logging
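The automatic captioning step can be pictured as turning a row of pathology labels into a short text prompt. The sketch below is a guess at the general idea, not the wording or logic used in `finetune_chexpert.py`: the function `make_caption`, the caption template, and the choice to keep only labels equal to 1 (CheXpert's positive value) are all assumptions.

```python
def make_caption(labels):
    """Build a caption from a mapping of pathology name -> CheXpert label value."""
    # Assumption: only a value of 1 (positive) contributes to the caption;
    # negative (0), uncertain (-1), and missing (None) labels are ignored.
    findings = [name for name, value in labels.items() if value == 1]
    if not findings:
        return "Chest x-ray with no findings"  # hypothetical wording
    return "Chest x-ray with " + ", ".join(name.lower() for name in findings)


row = {"Cardiomegaly": 1.0, "Edema": 0.0, "Pleural Effusion": -1.0}
print(make_caption(row))
```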
The important parameters that set the paths are listed below:
| Parameter | Default | Description |
|---|---|---|
| `--model_name_or_path` | `runwayml/stable-diffusion-v1-5` | Base pretrained model to fine-tune |
| `--train_data_path` | `/usr/local/.../finetune.csv` | Path to CheXpert CSV file with pathology labels |
| `--image_root_path` | `/usr/local/datasets/` | Root directory containing the chest X-ray images |
| `--output_dir` | `/usr/local/.../finetuned` | Directory to save the fine-tuned model and checkpoints |
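Because the defaults point to machine-specific locations, the path parameters usually need to be overridden. The sketch below assembles a `torchrun` invocation from the flags in the table; the helper `build_finetune_command` and the `/path/to/...` values are hypothetical placeholders, not part of the repository.

```python
import shlex


def build_finetune_command(train_csv, image_root, output_dir,
                           base_model="runwayml/stable-diffusion-v1-5",
                           num_gpus=4):
    """Return the fine-tuning command as an argument list."""
    return [
        "torchrun", f"--nproc_per_node={num_gpus}", "finetune_chexpert.py",
        "--model_name_or_path", base_model,
        "--train_data_path", train_csv,
        "--image_root_path", image_root,
        "--output_dir", output_dir,
    ]


# Placeholder paths; replace them with locations on your own system.
cmd = build_finetune_command("/path/to/finetune.csv",
                             "/path/to/images/",
                             "/path/to/finetuned")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, or the printed string can be pasted into a shell.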
For fine-tuning, we used four A100 GPUs with 40 GB of memory each. Fine-tuning SD v1.5 took 6 hours of wall-clock time.

    python generate_cf_images.py

The `generate_cf_images.py` script generates counterfactual versions of chest X-ray images through language-guided editing.
| Parameter | Description |
|---|---|
| `ldm_type` | Type of diffusion model to use. Options: `stable_diffusion_v1_4`, `stable_diffusion_v1_5`, `stable_diffusion_mimic_cxr_v0.1`, `finetuned_chexpert` |
| `self_replace_steps_range` | Controls the strength of self-attention replacement during editing. Higher values result in stronger edits but less preservation of original structure |
| `edit_word_weight` | Emphasis placed on the edit word in the prompt. Higher values lead to stronger edits |
| `clip_img_thresh` | Threshold for image-image similarity (higher = more similar to original) |
| `clip_thresh` | Threshold for image-text similarity |
| `clip_dir_thresh` | Threshold for directional similarity (measures if edit is in the right direction) |
| `text_similarity_threshold` | Controls filtering of edits based on text similarity to ground truth |
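To make the three CLIP thresholds concrete, the sketch below computes the similarities on plain embedding vectors and keeps an edit only if every score reaches its threshold. It is an illustration under assumptions, not the filtering code in `generate_cf_images.py`: the function names, the use of `>=`, and the toy 2-D embeddings are all hypothetical. Directional similarity is taken here in its common form, the cosine between the image-embedding change and the text-embedding change.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def directional_similarity(img_src, img_edit, txt_src, txt_edit):
    """Cosine between the image change and the text change."""
    d_img = [b - a for a, b in zip(img_src, img_edit)]
    d_txt = [b - a for a, b in zip(txt_src, txt_edit)]
    return cosine(d_img, d_txt)


def keep_edit(img_src, img_edit, txt_src, txt_edit,
              clip_img_thresh, clip_thresh, clip_dir_thresh):
    """Assumed rule: keep the edit only if all three scores reach their thresholds."""
    img_sim = cosine(img_src, img_edit)    # edited image vs. original image
    txt_sim = cosine(img_edit, txt_edit)   # edited image vs. edited prompt
    dir_sim = directional_similarity(img_src, img_edit, txt_src, txt_edit)
    return (img_sim >= clip_img_thresh
            and txt_sim >= clip_thresh
            and dir_sim >= clip_dir_thresh)


# Toy 2-D embeddings; real CLIP embeddings have hundreds of dimensions.
print(keep_edit([1, 0], [1, 1], [1, 0], [1, 1], 0.7, 0.9, 0.9))
```

Raising `clip_img_thresh` in this sketch rejects edits that drift further from the original image, which matches the "higher = more similar to original" note in the table.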
[Figures: "Editing Medical Devices using PRISM" and "XAI using PRISM"]
@misc{kumar2025prism,
title={PRISM: High-Resolution \& Precise Counterfactual Medical Image Generation using Language-guided Stable Diffusion},
author={Kumar, Amar and Kriz, Anita and Havaei, Mohammad and Arbel, Tal},
eprint={2503.00196},
url={https://arxiv.org/abs/2503.00196},
year={2025}
}

PRISM is built on top of several excellent repositories: LANCE and Prompt-to-prompt. For comparisons, we also use code from the RadEdit, Imagic, and Null-Text Inversion repositories. Additionally, we borrow a few techniques from Instruct-Pix2Pix and huggingface-transformers.

