- Python 3.7.3
- PyTorch 1.5.1
- torchvision 0.6.1
- ml_collections
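Given the pinned versions above, a matching `requirements.txt` would look roughly like this (the exact file ships with the repository; this is only an illustration):

```
torch==1.5.1
torchvision==0.6.1
ml_collections
```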
Get pretrained models at this link: ViT-B_16, ViT-B_32...

```bash
wget https://storage.googleapis.com/vit_models/imagenet21k/{MODEL_NAME}.npz
```

In the paper, we use data from 5 publicly available datasets:
Please download them from the official websites and put them in the corresponding folders.
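For example, with CUB-200-2011 (the dataset used in the training command below), the layout under the data root would be along these lines (a sketch only; check the repository's dataset loader for the exact structure it expects):

```
data_root/
└── CUB_200_2011/   # folder name matching the --dataset flag
    ├── images/     # images from the official release
    └── ...         # split/annotation files from the official release
```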
Install dependencies with the following command:

```bash
pip3 install -r requirements.txt
```

To train CoAT on the CUB-200-2011 dataset with 4 GPUs in FP-16 mode for 10000 steps, run:
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 nohup python3 -m torch.distributed.launch --nproc_per_node=4 train_coattention.py --dataset CUB_200_2011 --split overlap --num_steps 10000 --data_root {DATA_ROOT} --pretrained_dir ViT-B_16.npz --fp16 --name sample_run > test.log &
```
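The flags in the command above map onto a command-line interface roughly like the following sketch; the flag names are taken from the command itself, but the defaults, types, and help strings here are assumptions rather than the repository's actual parser:

```python
import argparse

def build_parser():
    # Sketch of the flags train_coattention.py is invoked with above;
    # defaults and help text are illustrative assumptions.
    p = argparse.ArgumentParser(description="CoAT training (sketch)")
    p.add_argument("--dataset", default="CUB_200_2011",
                   help="dataset folder name under --data_root")
    p.add_argument("--split", default="overlap")
    p.add_argument("--num_steps", type=int, default=10000)
    p.add_argument("--data_root", default="./data")
    p.add_argument("--pretrained_dir", default="ViT-B_16.npz",
                   help="path to the pretrained ViT .npz checkpoint")
    p.add_argument("--fp16", action="store_true",
                   help="enable mixed-precision training")
    p.add_argument("--name", default="sample_run",
                   help="name of this run, used for logs/checkpoints")
    p.add_argument("--local_rank", type=int, default=0,
                   help="set automatically by torch.distributed.launch")
    return p

args = build_parser().parse_args(
    ["--dataset", "CUB_200_2011", "--split", "overlap",
     "--num_steps", "10000", "--fp16", "--name", "sample_run"]
)
print(args.dataset, args.num_steps, args.fp16)  # → CUB_200_2011 10000 True
```

Note that `torch.distributed.launch` spawns one process per GPU and passes `--local_rank` to each, which is why the script must accept that flag.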
### Acknowledgement
Many thanks to [ViT-pytorch](https://github.com/jeonsworld/ViT-pytorch) for the PyTorch reimplementation of [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929).