
MedITok: A Unified Tokenizer for Medical Image Synthesis and Interpretation

📝 Preprint • 🤗 Hugging Face • 🧩 GitHub

This is the official repository for MedITok, a unified visual tokenizer tailored for medical images, supported by the Shanghai Innovation Institute (SII).

MedITok encodes both low-level details and high-level semantics into a unified token space, and supports building strong generative models for a wide range of tasks including medical image synthesis and interpretation.

📌 Overview

🚧 Project Status

  • Release the preprint. (We noticed some typos in the preprint; they will be corrected in the next version!)
  • Release the initial weights.
  • Release evaluation code.
  • Release training code.

🔧 Environment

Set up the environment by running:

conda create -n meditok python=3.11
conda activate meditok
pip install -r requirements.txt

🎬 Demo

After setting up the virtual environment:

  1. Download the pretrained weights and put them in the specified folder by running the following (a Python alternative follows this list):
cd meditok
hf download "massaki75/meditok" --local-dir="weights/meditok"
  2. Open demo.ipynb and click Run All to run the reconstruction demo. Feel free to change the images you would like to play with.
  3. Run python demo.py to save the reconstruction results.
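
If you prefer to download the weights from Python rather than the CLI, here is a minimal sketch using huggingface_hub's snapshot_download (same repo id and target folder as the command above):

from huggingface_hub import snapshot_download

# Fetch all files from the MedITok weights repo into weights/meditok,
# mirroring the `hf download` command above.
snapshot_download(repo_id="massaki75/meditok", local_dir="weights/meditok")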

🔥 Training

Before training / fine-tuning the MedITok model, we need to:

  1. Download the pretrained weights (ViTamin, BiomedCLIP, BiomedBERT, etc.) and fill in the local paths in ./local_openclip/constants.py.
  2. Download the models used for loss calculation, create a folder named ./external, and put the models under it.
  3. Write the metadata as a .csv file with the columns "identifier" (relative or absolute path of each image), "caption" (the paired caption), and "modality" (imaging modality of the image); an example follows this list.
  • Note that we save each CT slice as an int16 PNG file to preserve the HU values, which allows for CT windowing data augmentation. Images tagged with "modality" == "ct" therefore undergo specific preprocessing (see the ReadMedicalImage class in ./datasets/transforms.py for details).
  4. Configure the variables in the training scripts (./scripts/train_stage1.sh and ./scripts/train_stage2.sh). To figure out what each variable represents, please see the Args class in ./utilities/config.py. Note that we now provide example images/metadata in ./datasets/example and ./datasets/meta, so you can directly play with the $TRAIN_DATA and $TRAIN_ROOT written in the example scripts.
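
For step 3, a minimal metadata file might look like this (the file names and captions are made up; only the three column names are prescribed by the loader):

identifier,caption,modality
xray/chest_0001.png,"Frontal chest radiograph with no acute findings.",xray
ct/abdomen_0042.png,"Axial abdominal CT slice at the kidney level.",ct

And for the CT note above, here is a rough sketch of how a slice could be written as a 16-bit PNG without losing HU values. The fixed offset is an assumption for illustration; the encoding actually used by ReadMedicalImage in ./datasets/transforms.py is authoritative:

import numpy as np
from PIL import Image

def save_ct_slice(hu, path, offset=32768):
    # Shift signed HU values (e.g., -1024..3071) into the unsigned 16-bit
    # range so they survive a round trip through a 16-bit grayscale PNG.
    # NOTE: the offset convention here is an assumption, not the repo's.
    Image.fromarray((hu.astype(np.int32) + offset).astype(np.uint16)).save(path)

def load_ct_slice(path, offset=32768):
    # Undo the shift to recover the original HU values.
    return np.array(Image.open(path)).astype(np.int32) - offset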

Once we have everything prepared, we can run the scripts in ./scripts to launch the training. If you catch any bugs, feel free to open an issue/PR!
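
For example, with the variables configured, the two stages can be launched in order:

cd meditok
bash scripts/train_stage1.sh  # stage 1
bash scripts/train_stage2.sh  # stage 2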

🎯 Downstream Inference

Image feature extraction

Try the following snippet:

import numpy as np
import torch
from PIL import Image

def read_image(img, img_size=256):
    # Accept either a file path or a PIL image; convert to RGB and
    # resize to the model's expected input resolution.
    if isinstance(img, str):
        img = Image.open(img)

    if isinstance(img, Image.Image):
        img = img.convert('RGB')
        if img.size != (img_size, img_size):
            img = img.resize((img_size, img_size), Image.LANCZOS)
    return img

def image_to_tensor(x):
    # [H, W, C] uint8 -> [B, C, H, W] float in [-1, 1]
    x = torch.FloatTensor(np.array(x)).permute(2, 0, 1)
    x = (x / 255.) * 2. - 1.
    return x.unsqueeze(0)

def tensor_to_image(x):
    # [B, C, H, W] float in [-1, 1] -> [H, W, C] uint8 PIL image
    x = x.clip(-1, 1).squeeze(0).permute(1, 2, 0)
    x = (x + 1) * 255.0 / 2.0
    x = x.numpy().astype(np.uint8)
    return Image.fromarray(x)


# `net` is a MedITok model loaded as in demo.py.
img_path = 'assets/vis_imgs/sample1.png'
img = read_image(img_path)
x = image_to_tensor(img)
with torch.no_grad():
    f = net.forward_features(x)

Please see demo.ipynb or demo.py for details.
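
As a quick sanity check, the helpers above compose into a reconstruction round trip. The net(x) call below is a hypothetical placeholder for the model's reconstruction interface; demo.py shows the actual API:

with torch.no_grad():
    x_rec = net(x)  # hypothetical call; see demo.py for the real interface
tensor_to_image(x_rec).save('reconstruction.png')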

Image synthesis and interpretation

  1. Download the downstream models for medical image synthesis (llamagen_meditok) and interpretation (llavamed_meditok) from our Hugging Face repo (a download sketch follows these steps).
  2. Put the model folders under ./weights.
  3. Modify the ROOT_DIR in the inference scripts for medical image synthesis and interpretation.
  4. Play with the example data by running:
bash evaluation/generation/scripts/sample_c2i.sh
bash evaluation/understanding/scripts/sample_vqa.sh
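
Following the same pattern as the tokenizer weights, the downloads might look like this; the repo ids here are assumptions, so check our Hugging Face page for the canonical ones:

hf download "massaki75/llamagen_meditok" --local-dir="weights/llamagen_meditok"
hf download "massaki75/llavamed_meditok" --local-dir="weights/llavamed_meditok"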

🙏 Acknowledgment

This project is built upon and inspired by several excellent prior works, and the model also benefits from many publicly available medical image datasets. We kindly refer readers to our preprint for details.

We sincerely thank the communities behind these works for making the resources available and inspiring further research in the field.

🚀 Notes

If you build something exciting or encounter any issues when using our model, please feel free to open an issue, submit a pull request, or contact us with feedback. Your contributions and insights are highly valued!

📖 Citation

If you find MedITok useful for your research and applications, please kindly cite our work:

@article{ma2025meditok,
  title={{MedITok}: A Unified Tokenizer for Medical Image Synthesis and Interpretation},
  author={Ma, Chenglong and Ji, Yuanfeng and Ye, Jin and Li, Zilong and Wang, Chenhui and Ning, Junzhi and Li, Wei and Liu, Lihao and Guo, Qiushan and Li, Tianbin and He, Junjun and Shan, Hongming},
  journal={arXiv preprint arXiv:2505.19225},
  year={2025}
}
