Commit 1b3913a

Update README.md
1 parent c6b37dc commit 1b3913a

1 file changed: +65 -14 lines changed

internvl_g/README.md

Lines changed: 65 additions & 14 deletions
@@ -8,15 +8,67 @@ See [INSTALLATION.md](../INSTALLATION.md)
88

99
## 📦 Data Preparation
1010

11-
**Pre-training**
11+
Three datasets need to be prepared: COCO Caption, Flickr30K, and NoCaps.
1212

13-
Coming Soon
13+
<details>
14+
<summary>COCO Caption</summary>
1415

15-
**Fine-tuning**
16+
```bash
17+
mkdir -p data/coco && cd data/coco
18+
19+
# download COCO images
20+
wget http://images.cocodataset.org/zips/train2014.zip && unzip train2014.zip
21+
wget http://images.cocodataset.org/zips/val2014.zip && unzip val2014.zip
22+
wget http://images.cocodataset.org/zips/test2015.zip && unzip test2015.zip
23+
24+
mkdir -p annotations && cd annotations/
25+
# download converted annotation files
26+
wget https://storage.googleapis.com/sfr-vision-language-research/datasets/coco_karpathy_train.json
27+
wget https://github.com/OpenGVLab/InternVL/releases/download/data/coco_karpathy_test.json
28+
wget https://github.com/OpenGVLab/InternVL/releases/download/data/coco_karpathy_test_gt.json
29+
cd ../../../
30+
```
1631

17-
Three datasets need to be prepared: COCO Caption, Flickr30K, and NoCaps.
32+
</details>
33+
34+
<details>
35+
<summary>Flickr30K</summary>
36+
37+
```bash
38+
mkdir -p data/flickr30k && cd data/flickr30k
39+
40+
# download images from https://bryanplummer.com/Flickr30kEntities/
41+
# Karpathy-split annotations are available at the following link:
42+
# https://github.com/mehdidc/retrieval_annotations/releases/download/1.0.0/flickr30k_test_karpathy.txt
43+
# This file is provided by the clip-benchmark repository.
44+
# We converted this txt file to JSON format; download the converted file:
45+
wget https://github.com/OpenGVLab/InternVL/releases/download/data/flickr30k_cn_test.txt
46+
wget https://github.com/OpenGVLab/InternVL/releases/download/data/flickr30k_cn_train.txt
47+
wget https://github.com/OpenGVLab/InternVL/releases/download/data/flickr30k_test_karpathy.json
48+
wget https://github.com/mehdidc/retrieval_annotations/releases/download/1.0.0/flickr30k_test_karpathy.txt
49+
wget https://github.com/mehdidc/retrieval_annotations/releases/download/1.0.0/flickr30k_train_karpathy.txt
50+
wget https://github.com/mehdidc/retrieval_annotations/releases/download/1.0.0/flickr30k_val_karpathy.txt
51+
52+
cd ../..
53+
```
54+
55+
</details>
56+
57+
<details>
58+
<summary>NoCaps</summary>
59+
60+
```bash
61+
mkdir -p data/nocaps && cd data/nocaps
62+
63+
# download images from https://nocaps.org/download
64+
# original annotations can be downloaded from https://nocaps.s3.amazonaws.com/nocaps_val_4500_captions.json
65+
wget https://nocaps.s3.amazonaws.com/nocaps_val_4500_captions.json
66+
67+
cd ../..
68+
```
69+
70+
</details>
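After running the download commands above, it can help to confirm that every annotation file actually landed where the commands put it. The function below is a minimal sketch of such a check, not part of the repository: the name `check_annotations` and its optional root argument are illustrative, and the file list is taken only from the `wget` commands in this section (image folders are not checked).

```shell
# Sketch: report which annotation files fetched by the commands above are missing.
# check_annotations and its argument are hypothetical helpers, not repository code.
check_annotations() {
    root="${1:-data}"   # dataset root; defaults to ./data as used in the commands above
    missing=0
    for f in \
        coco/annotations/coco_karpathy_train.json \
        coco/annotations/coco_karpathy_test.json \
        coco/annotations/coco_karpathy_test_gt.json \
        flickr30k/flickr30k_cn_test.txt \
        flickr30k/flickr30k_cn_train.txt \
        flickr30k/flickr30k_test_karpathy.json \
        flickr30k/flickr30k_test_karpathy.txt \
        flickr30k/flickr30k_train_karpathy.txt \
        flickr30k/flickr30k_val_karpathy.txt \
        nocaps/nocaps_val_4500_captions.json
    do
        # -s is true only for files that exist and are non-empty,
        # so an interrupted download that left a 0-byte file is also reported
        if [ ! -s "${root}/${f}" ]; then
            echo "missing or empty: ${root}/${f}"
            missing=$((missing + 1))
        fi
    done
    echo "${missing} annotation file(s) missing under ${root}"
    [ "$missing" -eq 0 ]   # exit status 0 only when everything is present
}
```

Run `check_annotations data` from the repository root; a non-zero exit status means at least one file should be downloaded again.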
1871

19-
You can download the `coco_karpathy_train.json` from [here](https://storage.googleapis.com/sfr-vision-language-research/datasets/coco_karpathy_train.json).
2072

2173
```shell
2274
data
@@ -38,7 +90,6 @@ data
3890
│   └── Images
3991
└── nocaps
4092
├── images
41-
├── nocaps_val_4500_captions_coco_format.json
4293
└── nocaps_val_4500_captions.json
4394
```
4495

@@ -69,22 +120,22 @@ Coming Soon
69120

70121
## 🔥 Retrieval Fine-tuning
71122

72-
To fine-tune InternVL on Flickr30K with 32 GPUs, run:
123+
To fine-tune InternVL on Flickr30K with 32 GPUs on a Slurm cluster, run:
73124

74125
```bash
75-
sh shell/finetune/internvl_stage2_finetune_flickr_364_bs1024_ep10.sh
126+
GPUS=32 sh shell/finetune/internvl_stage2_finetune_flickr_364_bs1024_ep10.sh
76127
```
77128

78-
To fine-tune InternVL on Flickr30K-CN with 32 GPUs, run:
129+
To fine-tune InternVL on Flickr30K-CN with 32 GPUs on a Slurm cluster, run:
79130

80131
```shell
81-
sh shell/finetune/internvl_stage2_finetune_flickrcn_364_bs1024_ep10.sh
132+
GPUS=32 sh shell/finetune/internvl_stage2_finetune_flickrcn_364_bs1024_ep10.sh
82133
```
83134

84-
To fine-tune InternVL on COCO with 32 GPUs, run:
135+
To fine-tune InternVL on COCO with 32 GPUs on a Slurm cluster, run:
85136

86137
```shell
87-
sh shell/finetune/internvl_stage2_finetune_coco_364_bs1024_ep5.sh
138+
GPUS=32 sh shell/finetune/internvl_stage2_finetune_coco_364_bs1024_ep5.sh
88139
```
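The `GPUS=32 sh ...` form used in the commands above sets `GPUS` for that single invocation only. The sketch below illustrates the idiom with a throwaway stand-in script. How the real scripts under `shell/finetune/` consume `GPUS` is not shown here, so the `${GPUS:-8}` fallback and the echoed message are assumptions made purely for illustration.

```shell
# Hypothetical stand-in for a launch script; the real scripts may read GPUS differently.
demo=$(mktemp)
cat > "$demo" <<'EOF'
GPUS=${GPUS:-8}   # illustrative default, used only when GPUS is unset
echo "would request ${GPUS} GPUs"
EOF

# A VAR=value prefix exports the variable to that one command only.
with_override=$(GPUS=32 sh "$demo")

# Without the prefix (and with GPUS unset in this subshell) the default applies.
without_override=$(unset GPUS; sh "$demo")

echo "$with_override"
echo "$without_override"
rm -f "$demo"
```

The first line printed is `would request 32 GPUs` and the second is `would request 8 GPUs`, showing that the override does not persist in the calling shell.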
89140

90141
## 📊 Evaluation
@@ -144,7 +195,7 @@ Expected results:
144195

145196
### Fine-tuned Image-Text Retrieval
146197

147-
#### Flickr30K
198+
#### Flickr30K fine-tuned model: [InternVL-14B-Flickr30K-FT-364px](https://huggingface.co/OpenGVLab/InternVL-14B-Flickr30K-FT-364px)
148199

149200
<table>
150201
<tr align=center>
@@ -231,7 +282,7 @@ Expected results:
231282

232283
</details>
233284

234-
#### Flickr30K-CN
285+
#### Flickr30K-CN fine-tuned model: [InternVL-14B-FlickrCN-FT-364px](https://huggingface.co/OpenGVLab/InternVL-14B-FlickrCN-FT-364px)
235286

236287
<table>
237288
<tr align=center>
