PRIS-CV/CoAT

CoAT: Co-Attention Based Transformer for Fine-Grained Visual Classification

Dependencies:

  • Python 3.7.3
  • PyTorch 1.5.1
  • torchvision 0.6.1
  • ml_collections

Usage

1. Download Google pre-trained ViT models

wget https://storage.googleapis.com/vit_models/imagenet21k/{MODEL_NAME}.npz
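Here `{MODEL_NAME}` stands for a checkpoint name such as `ViT-B_16`, the one passed to `--pretrained_dir` in the training command of this README. The same download could be sketched in Python as follows, assuming the bucket layout shown in the URL above. The helper names `vit_checkpoint_url` and `download_checkpoint` are hypothetical and not part of this repository.

```python
import urllib.request

# Base path taken from the wget command in this README.
BASE_URL = "https://storage.googleapis.com/vit_models/imagenet21k"


def vit_checkpoint_url(model_name):
    """Build the download URL for a checkpoint name such as "ViT-B_16"."""
    return "{}/{}.npz".format(BASE_URL, model_name)


def download_checkpoint(model_name, dest=None):
    """Download the checkpoint and return the local file path.

    By default the file is saved as <model_name>.npz in the current directory.
    """
    dest = dest or "{}.npz".format(model_name)
    urllib.request.urlretrieve(vit_checkpoint_url(model_name), dest)
    return dest
```

Calling `download_checkpoint("ViT-B_16")` would save `ViT-B_16.npz` in the current directory, which matches the file name used by the training command.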

2. Prepare data

In the paper, we use data from 5 publicly available datasets.

Please download them from the official websites and put them in the corresponding folders.

3. Install required packages

Install dependencies with the following command:

pip3 install -r requirements.txt
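After installing, it may help to confirm that the environment matches the versions listed under Dependencies. The following is a minimal sketch, not a script shipped with this repository. The function names are hypothetical, and the check assumes each package exposes a `__version__` attribute, which `torch` and `torchvision` do.

```python
import importlib
import sys

# Versions listed under "Dependencies" in this README.
EXPECTED = {"torch": "1.5.1", "torchvision": "0.6.1"}


def installed_version(module_name):
    """Return the module's __version__, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


def find_mismatches(expected, found):
    """Return {name: (expected, found)} for packages whose version differs."""
    mismatches = {}
    for name, want in expected.items():
        have = found.get(name)
        # Ignore local build suffixes such as "+cu101" when comparing.
        if have is None or have.split("+")[0] != want:
            mismatches[name] = (want, have)
    return mismatches


def check_environment():
    """Print the Python version and any package that differs from the README."""
    found = {name: installed_version(name) for name in EXPECTED}
    print("python", sys.version.split()[0], "(README lists 3.7.3)")
    for name, (want, have) in find_mismatches(EXPECTED, found).items():
        print("{}: expected {}, found {}".format(name, want, have))
    if installed_version("ml_collections") is None:
        print("ml_collections is not installed")
```

Running `check_environment()` prints only the packages that differ from the listed versions. A mismatch does not necessarily mean training will fail, only that the setup differs from the one described here.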

4. Train

To train CoAT on the CUB-200-2011 dataset with 4 GPUs in FP16 mode for 10,000 steps, run:

CUDA_VISIBLE_DEVICES=0,1,2,3 nohup python3 -m torch.distributed.launch --nproc_per_node=4 train_coattention.py --dataset CUB_200_2011 --split overlap --num_steps 10000 --data_root {DATA_ROOT} --pretrained_dir ViT-B_16.npz --fp16 --name sample_run > test.log &
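For scripted runs, the same launch can be assembled in Python so that the number of processes always matches the number of visible GPUs. This is a sketch under stated assumptions. It uses only the flags that appear in the command above. The helper names `build_train_command` and `launch` are hypothetical. `data_root` is a required argument because the dataset location depends on your setup. Unlike the shell command, it does not use `nohup`, so the child process is not protected from hangups.

```python
import os
import subprocess


def build_train_command(data_root, gpu_ids=(0, 1, 2, 3), num_steps=10000,
                        dataset="CUB_200_2011", pretrained="ViT-B_16.npz",
                        name="sample_run"):
    """Return (argv, env) mirroring the launch command in this README."""
    argv = [
        "python3", "-m", "torch.distributed.launch",
        "--nproc_per_node={}".format(len(gpu_ids)),  # one process per GPU
        "train_coattention.py",
        "--dataset", dataset,
        "--split", "overlap",
        "--num_steps", str(num_steps),
        "--data_root", data_root,
        "--pretrained_dir", pretrained,
        "--fp16",
        "--name", name,
    ]
    # Restrict the run to the selected GPUs, as CUDA_VISIBLE_DEVICES does in the shell.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=",".join(map(str, gpu_ids)))
    return argv, env


def launch(data_root, log_path="test.log"):
    """Start training in the background, writing stdout to log_path."""
    argv, env = build_train_command(data_root)
    with open(log_path, "w") as log_file:
        # The child keeps its own copy of the file descriptor after we close ours.
        return subprocess.Popen(argv, env=env, stdout=log_file)
```

Passing a different `gpu_ids` tuple, for example `(0, 1)`, changes both `CUDA_VISIBLE_DEVICES` and `--nproc_per_node` together, which avoids launching more processes than there are visible devices.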

### Acknowledgement

Many thanks to [ViT-pytorch](https://github.com/jeonsworld/ViT-pytorch) for the PyTorch reimplementation of [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929).
