📝 Preprint • 🤗 Hugging Face • 🧩 Github
This is the official repository for MedITok, a unified visual tokenizer tailored for medical images, supported by the Shanghai Innovation Institute (SII).
MedITok encodes both low-level details and high-level semantics into a unified token space, and supports building strong generative models for a wide range of tasks including medical image synthesis and interpretation.
- Release preprint. (We noticed some typos in the preprint; they will be corrected in the next version!)
- Release the initial weights.
- Release evaluation code.
- Release training code.
Set up the environment by running:

```shell
conda create -n meditok python=3.11
conda activate meditok
pip install -r requirements.txt
```

After setting up the virtual environment:
- Download pretrained weights and put them in the specified folder by running:

  ```shell
  cd meditok
  hf download "massaki75/meditok" --local-dir="weights/meditok"
  ```

- Open `demo.ipynb` and click `Run All` to run the reconstruction demo. Feel free to change the images you would like to play with.
- Run `python demo.py` to save the reconstruction results.
Before training / fine-tuning the MedITok model, we need to:
- Download pretrained weights (ViTamin, BiomedCLIP, BiomedBERT, etc.) and fill in the local paths in `./local_openclip/constants.py`.
- Download the models used for loss calculation, create a folder named `./external`, and put the models under it.
- Write the metadata as a `.csv` file with columns `"identifier"` (relative or absolute path of each image), `"caption"` (the paired caption), and `"modality"` (imaging modality of the image).
  - Note that we save each CT slice as an `int16` PNG file to preserve the HU values, which allows for CT windowing data augmentation. Thus, images tagged with `"modality" == "ct"` undergo specific preprocessing (see the `ReadMedicalImage` class in `./datasets/transforms.py` for details).
- Configure the variables in the training scripts (`./scripts/train_stage1.sh` and `./scripts/train_stage2.sh`). To figure out what each variable represents, please see the `Args` class in `./utilities/config.py`. Note that we now provide example images/metadata in `./datasets/example` and `./datasets/meta`, so you can directly play with the `$TRAIN_DATA` and `$TRAIN_ROOT` written in the example scripts.
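As a concrete example, a metadata file with the three columns described above can be written with the standard library. The file paths and captions below are placeholders for illustration only:

```python
import csv

# Each row points to one image and its paired caption/modality tag.
# Paths and captions here are made up; replace them with your own data.
rows = [
    {"identifier": "images/xray_0001.png", "caption": "frontal chest X-ray", "modality": "xray"},
    {"identifier": "images/ct_0001.png", "caption": "axial CT slice of the abdomen", "modality": "ct"},
]

with open("metadata.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["identifier", "caption", "modality"])
    writer.writeheader()
    writer.writerows(rows)
```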
Once we have everything prepared, we can run the scripts in `./scripts` to launch the training. If you catch any bugs, feel free to open an issue/PR!
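For intuition, the CT windowing augmentation enabled by the preserved HU values can be sketched as follows. The function name and the default window (center 40, width 400, a common soft-tissue window) are illustrative, not the repo's actual implementation — see `ReadMedicalImage` in `./datasets/transforms.py` for the real preprocessing:

```python
import numpy as np

def apply_ct_window(hu: np.ndarray, center: float = 40.0, width: float = 400.0) -> np.ndarray:
    """Map raw HU values to [0, 1] under a given window center/width.

    Values outside [center - width/2, center + width/2] are clipped,
    which is what makes storing int16 HU values (rather than pre-windowed
    uint8 pixels) useful for augmentation: the window can vary per sample.
    """
    lo, hi = center - width / 2.0, center + width / 2.0
    hu = np.clip(hu.astype(np.float32), lo, hi)
    return (hu - lo) / (hi - lo)
```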
Try the following snippet:
```python
import numpy as np
import torch
from PIL import Image


def read_image(img, img_size=256):
    # Accept either a file path or a PIL image.
    if isinstance(img, str):
        img = Image.open(img)
    if isinstance(img, Image.Image):
        img = img.convert('RGB')
    if img.size[0] != img_size:
        img = img.resize((img_size, img_size), Image.LANCZOS)
    return img


def image_to_tensor(x):
    # [H, W, C] uint8 -> [B, C, H, W] float in [-1, 1]
    x = torch.FloatTensor(np.array(x)).permute(2, 0, 1)
    x = (x / 255.) * 2. - 1.
    return x.unsqueeze(0)


def tensor_to_image(x):
    # [B, C, H, W] float in [-1, 1] -> [H, W, C] uint8 PIL image
    x = x.clip(-1, 1).squeeze(0).permute(1, 2, 0)
    x = (x + 1) * 255.0 / 2.0
    x = x.numpy().astype(np.uint8)
    return Image.fromarray(x)


img_path = 'assets/vis_imgs/sample1.png'
img = read_image(img_path)
x = image_to_tensor(img)
with torch.no_grad():
    f = net.forward_features(x)  # `net` is the loaded MedITok model
```

Please see `demo.ipynb` or `demo.py` for details.
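The two tensor helpers can be sanity-checked without loading the model. A minimal round-trip sketch on a synthetic image (the helpers are repeated here so the snippet runs standalone; the random input is illustrative):

```python
import numpy as np
import torch
from PIL import Image


def image_to_tensor(x):
    # [H, W, C] uint8 -> [B, C, H, W] float in [-1, 1]
    x = torch.FloatTensor(np.array(x)).permute(2, 0, 1)
    x = (x / 255.) * 2. - 1.
    return x.unsqueeze(0)


def tensor_to_image(x):
    # [B, C, H, W] float in [-1, 1] -> [H, W, C] uint8 PIL image
    x = x.clip(-1, 1).squeeze(0).permute(1, 2, 0)
    x = (x + 1) * 255.0 / 2.0
    return Image.fromarray(x.numpy().astype(np.uint8))


# Round-trip a random RGB image: pixel values should survive the
# normalization up to at most one unit of quantization error.
rng = np.random.default_rng(0)
arr = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)
restored = np.array(tensor_to_image(image_to_tensor(Image.fromarray(arr))))
max_err = np.abs(arr.astype(int) - restored.astype(int)).max()
```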
- Download the downstream models for medical image synthesis (`llamagen_meditok`) and interpretation (`llavamed_meditok`) at our Hugging Face repo.
- Put the model folders at `./weights`.
- Modify the `ROOT_DIR` in the inference scripts for medical image synthesis and interpretation.
- Play with the example data by running:

  ```shell
  bash evaluation/generation/scripts/sample_c2i.sh
  bash evaluation/understanding/scripts/sample_vqa.sh
  ```
This project is built upon and inspired by several excellent prior works:
The model also benefits from many publicly available medical image datasets. We kindly refer readers to our preprint for details.
We sincerely thank the communities behind these works for making the resources available and inspiring further research in the field.
If you build something exciting or encounter any issues when using our model, please feel free to open an issue, submit a pull request, or contact us with feedback. Your contributions and insights are highly valued!
If you find MedITok useful for your research and applications, please kindly cite our work:
@article{ma2025meditok,
title={{MedITok}: A Unified Tokenizer for Medical Image Synthesis and Interpretation},
author={Ma, Chenglong and Ji, Yuanfeng and Ye, Jin and Li, Zilong and Wang, Chenhui and Ning, Junzhi and Li, Wei and Liu, Lihao and Guo, Qiushan and Li, Tianbin and He, Junjun and Shan, Hongming},
journal={arXiv preprint arXiv:2505.19225},
year={2025}
}
