Requirements

Software Requirements

  • CUDA >= 12.3
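A quick way to check the CUDA requirement before running anything is to parse the output of `nvcc --version`. The snippet below is a minimal sketch, not part of this repository: the helper names `parse_cuda_release` and `cuda_is_sufficient` are hypothetical, and it assumes the CUDA toolkit's `nvcc` is on your `PATH` and prints a line of the form `release 12.3`.

```python
import re
import shutil
import subprocess

# Minimum version taken from the requirement above.
MIN_CUDA = (12, 3)


def parse_cuda_release(nvcc_output):
    """Return (major, minor) parsed from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def cuda_is_sufficient(nvcc_output, minimum=MIN_CUDA):
    """True if the parsed CUDA release is at least `minimum`."""
    version = parse_cuda_release(nvcc_output)
    return version is not None and version >= minimum


if __name__ == "__main__":
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        print("nvcc not found on PATH; cannot check the CUDA toolkit version")
    else:
        result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
        if cuda_is_sufficient(result.stdout):
            print("CUDA toolkit meets the >= 12.3 requirement")
        else:
            print("CUDA toolkit is older than 12.3 or could not be detected")
```

Note that this checks the toolkit version only; the driver version reported by `nvidia-smi` is a separate matter.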

Python Requirements

  • See apptainer/config.def for the Python dependencies

Preparations

Post-pre-training Dataset: COCO Caption (2017)

  1. Download the dataset from here
  2. Place the downloaded dataset in ./dataset/coco/

Evaluation Dataset: ImageNet

  1. Download the dataset from here
  2. Place the downloaded dataset in ./dataset/imagenet/
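Before training or evaluating, it can help to confirm that both dataset directories are where the steps above put them. The following is a small sketch under stated assumptions: the helper `missing_dataset_dirs` is hypothetical (not part of this repository), and it only checks that the two directories exist, not that their contents match whatever layout the data loaders expect.

```python
from pathlib import Path

# Directories named in the preparation steps above, relative to the repository root.
EXPECTED_DIRS = ("dataset/coco", "dataset/imagenet")


def missing_dataset_dirs(repo_root="."):
    """Return the expected dataset directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs()
    if missing:
        print("Missing dataset directories: " + ", ".join(missing))
    else:
        print("Both dataset directories are present")
```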

Example

Run Post-pre-training of CLIP-Refine on COCO Caption

python3 main/train.py --config_path config/01_post-pre-training/clip-refine.yaml

Evaluate Zero-shot Performance on ImageNet

python3 main/test.py --config_path config/01_post-pre-training/clip-refine.yaml
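To run post-pre-training and evaluation back to back, the two commands above can be chained from a small driver script. This is only a sketch: `build_commands` and `run_pipeline` are hypothetical helpers, not part of this repository, and the script assumes it is launched from the repository root so that the relative paths resolve.

```python
import subprocess

# Config path taken verbatim from the commands above.
CONFIG = "config/01_post-pre-training/clip-refine.yaml"


def build_commands(config_path=CONFIG):
    """Return the train and test commands, in the order they should run."""
    return [
        ["python3", "main/train.py", "--config_path", config_path],
        ["python3", "main/test.py", "--config_path", config_path],
    ]


def run_pipeline(config_path=CONFIG):
    """Run training, then evaluation; stop at the first failing step."""
    for cmd in build_commands(config_path):
        # check=True raises CalledProcessError if a step exits non-zero,
        # so evaluation is skipped when training fails.
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline()` from the repository root runs both steps with the same config file.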

Citation

@inproceedings{Yamaguchi_CVPR25_CLIP-Refine,
  title={Post-pre-training for Modality Alignment in Vision-Language Foundation Models},
  author={Yamaguchi, Shin'ya and Feng, Dewei and Kanai, Sekitoshi and Adachi, Kazuki and Chijiwa, Daiki},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2025}
}