Official implementation of the paper High-Resolution and Precise Counterfactual Medical Image Generation using Language-guided Stable Diffusion

Amarkr1/PRISM

PRISM: High-Resolution and Precise Counterfactual Medical Image Generation using Language-guided Stable Diffusion

This is the official repository for PRISM (submitted to Medical Imaging with Deep Learning, MIDL 2025).

PRISM - Precise counterfactual Image generation using language guided Stable diffusion Model

Repository Structure

This repository is organized into two branches:

  • main: Contains all source code and implementation files
  • website: Houses the project website and documentation assets

You are currently on the main branch of this repository. Visit the website branch to access the website files and source code.

Getting Started

Virtual Environment Setup

Create a virtual environment and install the required packages from requirements.txt:

pip install -r requirements.txt --no-cache

Note: The versions of the transformers and diffusers libraries must match those specified in requirements.txt. If you hit an error caused by a library mismatch, installing huggingface_hub==0.25.2 may resolve it.

Create Dataset

Data Preparation: Sign up to access the CheXpert dataset from here. Split the dataset 70/15/15 into train, validation, and test sets. Keep this split fixed across all experiments.
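A minimal sketch of a reproducible 70/15/15 split, assuming a simple seeded shuffle over a list of record identifiers. The function name `split_70_15_15`, the seed, and the example IDs are illustrative and not part of this repository; the splitting procedure the authors actually used is not documented here.

```python
import random


def split_70_15_15(items, seed=0):
    """Shuffle items with a fixed seed and split them 70/15/15.

    Hypothetical helper for illustration only; not part of the PRISM code.
    """
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed -> same split every run
    n_train = len(items) * 70 // 100
    n_val = len(items) * 15 // 100
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Placeholder identifiers standing in for CheXpert records.
records = [f"record_{i:05d}" for i in range(100)]
train, val, test = split_70_15_15(records)
print(len(train), len(val), len(test))  # 70 15 15
```

Writing the three lists to disk once and reusing them is one way to keep the split identical across experiments.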

Core Functionalities

Finetune Stable Diffusion

PRISM uses Stable Diffusion (SD) v1.5 as its backbone.

torchrun --nproc_per_node=4 finetune_chexpert.py

Note: Launch fine-tuning with torchrun, not python.

The finetune_chexpert.py script enables distributed training to fine-tune Stable Diffusion on chest X-ray images with associated pathology labels. The script:

  1. Creates automatic captions based on pathology findings
  2. Trains only the UNet component while freezing VAE and text encoder
  3. Supports distributed training with mixed precision
  4. Includes checkpoint saving and logging
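Step 1 above can be sketched as follows. This is an illustration under stated assumptions, not the caption logic in finetune_chexpert.py: the helper `make_caption` and the caption template are hypothetical, and the only dataset fact relied on is that CheXpert marks a positive finding with the value 1.0.

```python
def make_caption(labels, prefix="Chest X-ray"):
    """Build a text caption from a mapping of pathology name -> label value.

    Hypothetical template for illustration; the real script may word
    captions differently.
    """
    # Keep only findings marked positive (1.0).
    positives = [name for name, value in labels.items() if value == 1.0]
    if not positives:
        return f"{prefix} with no findings"
    return f"{prefix} showing " + ", ".join(name.lower() for name in positives)


example = {"Cardiomegaly": 1.0, "Pleural Effusion": 1.0, "Edema": 0.0}
print(make_caption(example))
```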

The following parameters set the paths:

| Parameter | Default | Description |
| --- | --- | --- |
| --model_name_or_path | runwayml/stable-diffusion-v1-5 | Base pretrained model to fine-tune |
| --train_data_path | /usr/local/.../finetune.csv | Path to CheXpert CSV file with pathology labels |
| --image_root_path | /usr/local/datasets/ | Root directory containing the chest X-ray images |
| --output_dir | /usr/local/.../finetuned | Directory to save the fine-tuned model and checkpoints |
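To override the default paths, the parameters above can be passed on the torchrun command line. The sketch below assembles such a command in Python; it assumes the script accepts each parameter as a `--flag value` pair, and the helper `build_finetune_command` and the `/data/...` paths are placeholders to replace with your own locations.

```python
import shlex


def build_finetune_command(train_data_path, image_root_path, output_dir,
                           model_name_or_path="runwayml/stable-diffusion-v1-5",
                           nproc_per_node=4):
    """Return the torchrun invocation as an argument list (hypothetical helper)."""
    return [
        "torchrun", f"--nproc_per_node={nproc_per_node}", "finetune_chexpert.py",
        "--model_name_or_path", model_name_or_path,
        "--train_data_path", train_data_path,
        "--image_root_path", image_root_path,
        "--output_dir", output_dir,
    ]


# Placeholder paths; substitute the locations on your machine.
cmd = build_finetune_command(
    train_data_path="/data/chexpert/finetune.csv",
    image_root_path="/data/",
    output_dir="/data/finetuned",
)
print(shlex.join(cmd))  # copy-pasteable shell command
```

The argument list can also be handed to `subprocess.run(cmd, check=True)` to launch training from a Python driver script.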

We fine-tuned on four A100 GPUs (40 GB each). Fine-tuning SD v1.5 took 6 hours of wall-clock time.

Counterfactual Image Generation

python generate_cf_images.py

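The generate_cf_images.py script generates counterfactual versions of chest X-ray images by editing them under language guidance. The parameters below control the edit strength and how candidate edits are filtered.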

Key Parameters

| Parameter | Description |
| --- | --- |
| ldm_type | Type of diffusion model to use. Options: stable_diffusion_v1_4, stable_diffusion_v1_5, stable_diffusion_mimic_cxr_v0.1, finetuned_chexpert |
| self_replace_steps_range | Controls the strength of self-attention replacement during editing. Higher values result in stronger edits but less preservation of original structure |
| edit_word_weight | Emphasis placed on the edit word in the prompt. Higher values lead to stronger edits |
| clip_img_thresh | Threshold for image-image similarity (higher = more similar to original) |
| clip_thresh | Threshold for image-text similarity |
| clip_dir_thresh | Threshold for directional similarity (measures if edit is in the right direction) |
| text_similarity_threshold | Controls filtering of edits based on text similarity to ground truth |
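To make the three CLIP thresholds concrete, the sketch below shows one common way such scores are computed from embedding vectors: cosine similarity for image-image and image-text agreement, and directional similarity as the cosine between the image-space change and the text-space change. How generate_cf_images.py actually computes and combines these scores is an assumption here; the helpers `cosine`, `directional_similarity`, and `passes_filters`, and the toy 2-D vectors, are illustrative only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def directional_similarity(img_src, img_edit, txt_src, txt_edit):
    """Cosine between the image change and the text change."""
    d_img = [e - s for s, e in zip(img_src, img_edit)]
    d_txt = [e - s for s, e in zip(txt_src, txt_edit)]
    return cosine(d_img, d_txt)


def passes_filters(img_src, img_edit, txt_src, txt_edit,
                   clip_img_thresh, clip_thresh, clip_dir_thresh):
    """Accept an edit only if all three scores reach their thresholds.

    Assumed combination rule (logical AND); the real script may differ.
    """
    return (
        cosine(img_src, img_edit) >= clip_img_thresh      # stays close to original
        and cosine(img_edit, txt_edit) >= clip_thresh     # matches the edited prompt
        and directional_similarity(img_src, img_edit, txt_src, txt_edit)
        >= clip_dir_thresh                                # moves in the right direction
    )


# Toy 2-D vectors standing in for CLIP embeddings.
img_src, img_edit = [1.0, 0.0], [1.0, 1.0]
txt_src, txt_edit = [1.0, 0.0], [1.0, 1.0]
accepted = passes_filters(img_src, img_edit, txt_src, txt_edit, 0.7, 0.9, 0.9)
```

Raising clip_img_thresh in this sketch rejects edits that drift further from the original image, which mirrors the "higher = more similar to original" note in the table.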

Classifiers

Baselines

Cycle-GAN

Other

Examples

| Medical Device Editing | XAI |
| --- | --- |
| [Figure: Editing Medical Devices using PRISM] | [Figure: XAI using PRISM] |

Citation

@misc{kumar2025prism,
  title={PRISM: High-Resolution \& Precise Counterfactual Medical Image Generation using Language-guided Stable Diffusion},
  author={Kumar, Amar and Kriz, Anita and Havaei, Mohammad and Arbel, Tal},
  eprint={2503.00196},
  url={https://arxiv.org/abs/2503.00196},
  year={2025}
}

Acknowledgements

PRISM builds on several excellent repositories: LANCE and Prompt-to-Prompt. For comparisons, we also use code from RadEdit, Imagic, and Null-Text Inversion. Additionally, we borrow a few techniques from Instruct-Pix2Pix and huggingface-transformers.
