
MedITok: A Unified Tokenizer for Medical Image Synthesis and Interpretation

📝 Preprint • 🤗 Hugging Face • 🧩 GitHub

This is the official repository for MedITok, a unified visual tokenizer tailored for medical images, supported by the Shanghai Innovation Institute (SII).

MedITok encodes both low-level details and high-level semantics into a unified token space, and supports building strong generative models for a wide range of tasks including medical image synthesis and interpretation.

📌 Overview

🚧 Project Status

  • Release the preprint. (We noticed some typos in the preprint; they will be corrected in the next version!)
  • Release the initial weights.
  • Release evaluation code.
  • Release training code.

🔧 Environment

Set up the environment by running:

conda create -n meditok python=3.11
conda activate meditok
pip install -r requirements.txt

🎬 Demo

After setting up the virtual environment:

  1. Download the pretrained weights and put them in the specified folder by running the following (a Python alternative follows this list):
cd meditok
hf download "massaki75/meditok" --local-dir="weights/meditok"
  2. Open demo.ipynb and click Run All to run the reconstruction demo. Feel free to change the images you would like to play with.
  3. Run python demo.py to save the reconstruction results.
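
If you prefer to download the weights from Python rather than the CLI, here is a minimal sketch using huggingface_hub's snapshot_download (same repo id and target folder as the command above):

from huggingface_hub import snapshot_download

# Fetch all files from the MedITok weights repo into weights/meditok,
# mirroring the `hf download` command above.
snapshot_download(repo_id="massaki75/meditok", local_dir="weights/meditok")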

🔥 Training

Before training / fine-tuning the MedITok model, we need to:

  1. Download the pretrained weights (ViTamin, BiomedCLIP, BiomedBERT, etc.) and fill in the local paths in ./local_openclip/constants.py.
  2. Download the models used for loss calculation, create a folder named ./external, and put the models under it.
  3. Write the metadata as a .csv file with the columns "identifier" (relative or absolute path of each image), "caption" (the paired caption), and "modality" (imaging modality of the image); an example follows this list.
  • Note that we save each CT slice as an int16 PNG file to preserve the HU values, which allows for CT windowing data augmentation. Images tagged with "modality" == "ct" therefore undergo specific preprocessing (see the ReadMedicalImage class in ./datasets/transforms.py for details).
  4. Configure the variables in the training scripts (./scripts/train_stage1.sh and ./scripts/train_stage2.sh). To figure out what each variable represents, please see the Args class in ./utilities/config.py. Note that we now provide example images/metadata in ./datasets/example and ./datasets/meta, so you can directly play with the $TRAIN_DATA and $TRAIN_ROOT written in the example scripts.
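
For step 3, a minimal metadata file might look like this (the file names and captions are made up; only the three column names are prescribed by the loader):

identifier,caption,modality
xray/chest_0001.png,"Frontal chest radiograph with no acute findings.",xray
ct/abdomen_0042.png,"Axial abdominal CT slice at the kidney level.",ct

And for the CT note above, here is a rough sketch of how a slice could be written as a 16-bit PNG without losing HU values. The fixed offset is an assumption for illustration; the encoding actually used by ReadMedicalImage in ./datasets/transforms.py is authoritative:

import numpy as np
from PIL import Image

def save_ct_slice(hu, path, offset=32768):
    # Shift signed HU values (e.g., -1024..3071) into the unsigned 16-bit
    # range so they survive a round trip through a 16-bit grayscale PNG.
    # NOTE: the offset convention here is an assumption, not the repo's.
    Image.fromarray((hu.astype(np.int32) + offset).astype(np.uint16)).save(path)

def load_ct_slice(path, offset=32768):
    # Undo the shift to recover the original HU values.
    return np.array(Image.open(path)).astype(np.int32) - offset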

Once we have everything prepared, we can run the scripts in ./scripts to launch the training. If you catch any bugs, feel free to open an issue/PR!
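
For example, with the variables configured, the two stages can be launched in order:

cd meditok
bash scripts/train_stage1.sh  # stage 1
bash scripts/train_stage2.sh  # stage 2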

🎯 Downstream Inference

Image feature extraction

Try the following snippet:

import numpy as np
import torch
from PIL import Image

def read_image(img, img_size=256):
    # Accept either a file path or a PIL image; convert to RGB and
    # resize to the model's expected input resolution.
    if isinstance(img, str):
        img = Image.open(img)

    if isinstance(img, Image.Image):
        img = img.convert('RGB')
        if img.size != (img_size, img_size):
            img = img.resize((img_size, img_size), Image.LANCZOS)
    return img

def image_to_tensor(x):
    # [H, W, C] uint8 -> [B, C, H, W] float in [-1, 1]
    x = torch.FloatTensor(np.array(x)).permute(2, 0, 1)
    x = (x / 255.) * 2. - 1.
    return x.unsqueeze(0)

def tensor_to_image(x):
    # [B, C, H, W] float in [-1, 1] -> [H, W, C] uint8 PIL image
    x = x.clip(-1, 1).squeeze(0).permute(1, 2, 0)
    x = (x + 1) * 255.0 / 2.0
    x = x.numpy().astype(np.uint8)
    return Image.fromarray(x)


# `net` is a MedITok model loaded as in demo.py.
img_path = 'assets/vis_imgs/sample1.png'
img = read_image(img_path)
x = image_to_tensor(img)
with torch.no_grad():
    f = net.forward_features(x)

Please see demo.ipynb or demo.py for details.
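
As a quick sanity check, the helpers above compose into a reconstruction round trip. The net(x) call below is a hypothetical placeholder for the model's reconstruction interface; demo.py shows the actual API:

with torch.no_grad():
    x_rec = net(x)  # hypothetical call; see demo.py for the real interface
tensor_to_image(x_rec).save('reconstruction.png')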

Image synthesis and interpretation

  1. Download the downstream models for medical image synthesis (llamagen_meditok) and interpretation (llavamed_meditok) from our Hugging Face repo (a download sketch follows these steps).
  2. Put the model folders under ./weights.
  3. Modify the ROOT_DIR in the inference scripts for medical image synthesis and interpretation.
  4. Play with the example data by running:
bash evaluation/generation/scripts/sample_c2i.sh
bash evaluation/understanding/scripts/sample_vqa.sh
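
Following the same pattern as the tokenizer weights, the downloads might look like this; the repo ids here are assumptions, so check our Hugging Face page for the canonical ones:

hf download "massaki75/llamagen_meditok" --local-dir="weights/llamagen_meditok"
hf download "massaki75/llavamed_meditok" --local-dir="weights/llavamed_meditok"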

🙏 Acknowledgment

This project is built upon and inspired by several excellent prior works, and the model also benefits from many publicly available medical image datasets. We kindly refer readers to our preprint for details.

We sincerely thank the communities behind these works for making the resources available and inspiring further research in the field.

🚀 Notes

If you build something exciting or encounter any issues when using our model, please feel free to open an issue, submit a pull request, or contact us with feedback. Your contributions and insights are highly valued!

📖 Citation

If you find MedITok useful for your research and applications, please kindly cite our work:

@article{ma2025meditok,
  title={{MedITok}: A Unified Tokenizer for Medical Image Synthesis and Interpretation},
  author={Ma, Chenglong and Ji, Yuanfeng and Ye, Jin and Li, Zilong and Wang, Chenhui and Ning, Junzhi and Li, Wei and Liu, Lihao and Guo, Qiushan and Li, Tianbin and He, Junjun and Shan, Hongming},
  journal={arXiv preprint arXiv:2505.19225},
  year={2025}
}
