Commit 5aa95de

Update README.md
1 parent ba943f5 commit 5aa95de

2 files changed: 17 additions and 16 deletions

README.md

Lines changed: 14 additions & 13 deletions
@@ -332,22 +332,23 @@ It is _**the largest open-source vision/vision-language foundation model (14B)**

 - Multimodal Benchmarks with Frozen LLM [\[see details\]](./internvl_chat#-evaluation)

-| method | visual encoder | glue layer | LLM | res. | COCO | Flickr | NoCaps | VQAv2 | GQA | VizWiz | TextVQA | MME | POPE |
-| -------------------- | :------------: | :--------: | :--------: | :--: | :---: | :----: | :----: | :---: | :--: | :----: | :-----: | :----: | :--: |
-| InstructBLIP | EVA-g | QFormer | Vicuna-7B | 224 || 82.4 | 123.1 || 49.2 | 34.5 | 50.1 |||
-| BLIP-2 | EVA-g | QFormer | Vicuna-13B | 224 || 71.6 | 103.9 | 41.0 | 41.0 | 19.6 | 42.5 | 1293.8 | 85.3 |
-| InstructBLIP | EVA-g | QFormer | Vicuna-13B | 224 || 82.8 | 121.9 || 49.5 | 33.4 | 50.7 | 1212.8 | 78.9 |
-| InternVL-Chat (ours) | IViT-6B | QLLaMA | Vicuna-7B | 224 | 141.4 | 89.7 | 120.5 | 72.3 | 57.7 | 44.5 | 42.1 | 1298.5 | 85.2 |
-| InternVL-Chat (ours) | IViT-6B | QLLaMA | Vicuna-13B | 224 | 142.4 | 89.9 | 123.1 | 71.7 | 59.5 | 54.0 | 49.1 | 1317.2 | 85.4 |
+| method | visual encoder | glue layer | LLM | res. | COCO | Flickr | NoCaps | VQAv2 | GQA | VizWiz | TextVQA | MME | POPE |
+| -------------------- | :------------: | :--------: | :---: | :--: | :---: | :----: | :----: | :---: | :--: | :----: | :-----: | :----: | :--: |
+| InstructBLIP | EVA-g | QFormer | V-7B | 224 || 82.4 | 123.1 || 49.2 | 34.5 | 50.1 |||
+| BLIP-2 | EVA-g | QFormer | V-13B | 224 || 71.6 | 103.9 | 41.0 | 41.0 | 19.6 | 42.5 | 1293.8 | 85.3 |
+| InstructBLIP | EVA-g | QFormer | V-13B | 224 || 82.8 | 121.9 || 49.5 | 33.4 | 50.7 | 1212.8 | 78.9 |
+| InternVL-Chat (ours) | IViT-6B | QLLaMA | V-7B | 224 | 141.4 | 89.7 | 120.5 | 72.3 | 57.7 | 44.5 | 42.1 | 1298.5 | 85.2 |
+| InternVL-Chat (ours) | IViT-6B | QLLaMA | V-13B | 224 | 142.4 | 89.9 | 123.1 | 71.7 | 59.5 | 54.0 | 49.1 | 1317.2 | 85.4 |

 - Multimodal Benchmarks with Trainable LLM [\[see details\]](./internvl_chat_llava)

-| method | visual encoder | glue layer | LLM | res. | VQAv2 | GQA | VizWiz | TextVQA | MME | POPE |
-| -------------------- | :------------: | :--------: | :--------: | :--: | :---: | :--: | :----: | :-----: | :----: | :--: |
-| LLaVA-1.5 | CLIP-L-336 | MLP | Vicuna-7B | 336 | 78.5 | 62.0 | 50.0 | 58.2 | 1510.7 | 85.9 |
-| InternVL-Chat (ours) | IViT-6B | MLP | Vicuna-7B | 336 | 79.3 | 62.9 | 52.5 | 57.0 | 1525.1 | 86.4 |
-| LLaVA-1.5 | CLIP-L-336 | MLP | Vicuna-13B | 336 | 80.0 | 63.3 | 53.6 | 61.3 | 1531.3 | 85.9 |
-| InternVL-Chat (ours) | IViT-6B | MLP | Vicuna-13B | 336 | 80.2 | 63.9 | 54.6 | 58.7 | 1546.9 | 87.1 |
+| method | vision encoder | LLM | res. | VQAv2 | GQA | VizWiz | SQA | TextVQA | POPE | MME | MMB | MMB<sub>CN</sub> | MMVet |
+| -------------------- | :------------: | :---: | :--: | :---: | :--: | :----: | :--: | :-----: | :--: | :----: | :--: | :--------------: | :---: |
+| LLaVA-1.5 | CLIP-L-336px | V-7B | 336 | 78.5 | 62.0 | 50.0 | 66.8 | 58.2 | 85.9 | 1510.7 | 64.3 | 58.3 | 30.5 |
+| LLaVA-1.5 | CLIP-L-336px | V-13B | 336 | 80.0 | 63.3 | 53.6 | 71.6 | 61.3 | 85.9 | 1531.3 | 67.7 | 63.6 | 35.4 |
+| InternVL-Chat (ours) | IViT-6B-224px | V-7B | 336 | 79.3 | 62.9 | 52.5 | 66.2 | 57.0 | 86.4 | 1525.1 | 64.6 | 57.6 | 31.2 |
+| InternVL-Chat (ours) | IViT-6B-224px | V-13B | 336 | 80.2 | 63.9 | 54.6 | 70.1 | 58.7 | 87.1 | 1546.9 | 66.5 | 61.9 | 33.7 |
+| InternVL-Chat (ours) | IViT-6B-448px | V-13B | 448 | 82.0 | 64.1 | 60.1 | 71.6 | 64.8 | 87.2 | 1579.0 | 68.2 | 64.0 | 36.7 |

 - Tiny LVLM [\[see details\]](https://github.com/OpenGVLab/Multi-Modality-Arena/tree/main/tiny_lvlm_evaluation)

internvl_chat_llava/README.md

Lines changed: 3 additions & 3 deletions
@@ -126,9 +126,9 @@ CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 sh scripts_internvl/finetune_internvit6b_44
 ## 🤗 Model Zoo

 | method | vision encoder | LLM | res. | VQAv2 | GQA | VizWiz | SQA | TextVQA | POPE | MME | MMB | MMB<sub>CN</sub> | MMVet | Download |
-| ------------- |:--------------:|:-----:|:----:|:-----:|:----:|:------:|:-----:|:-------:|:----:|:------:|:-----:|:------:|:------:| :----------------------------------------------------------------:|
-| LLaVA-1.5 | CLIP-L-336px | V-7B | 336 | 78.5 | 62.0 | 50.0 | 66.8 | 58.2 | 85.9 | 1510.7 | 64.3 | 58.3 | 30.5 | - |
-| LLaVA-1.5 | CLIP-L-336px | V-13B | 336 | 80.0 | 63.3 | 53.6 | 71.6 | 61.3 | 85.9 | 1531.3 | 67.7 | 63.6 | 35.4 | - |
+| ------------- |:--------------:|:-----:|:----:|:-----:|:----:|:------:|:-----:|:-------:|:----:|:------:|:-----:|:------:|:------:| :-----------------------------------------------------------------------------------:|
+| LLaVA-1.5 | CLIP-L-336px | V-7B | 336 | 78.5 | 62.0 | 50.0 | 66.8 | 58.2 | 85.9 | 1510.7 | 64.3 | 58.3 | 30.5 | 🤗 [HF link](https://huggingface.co/liuhaotian/llava-v1.5-7b) |
+| LLaVA-1.5 | CLIP-L-336px | V-13B | 336 | 80.0 | 63.3 | 53.6 | 71.6 | 61.3 | 85.9 | 1531.3 | 67.7 | 63.6 | 35.4 | 🤗 [HF link](https://huggingface.co/liuhaotian/llava-v1.5-13b) |
 | InternVL-Chat | IViT-6B-224px | V-7B | 336 | 79.3 | 62.9 | 52.5 | 66.2 | 57.0 | 86.4 | 1525.1 | 64.6 | 57.6 | 31.2 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-ViT-6B-Vicuna-7B) |
 | InternVL-Chat | IViT-6B-224px | V-13B | 336 | 80.2 | 63.9 | 54.6 | 70.1 | 58.7 | 87.1 | 1546.9 | 66.5 | 61.9 | 33.7 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-ViT-6B-Vicuna-13B) |
 | InternVL-Chat | IViT-6B-448px | V-13B | 448 | 82.0 | 64.1 | 60.1 | 71.6 | 64.8 | 87.2 | 1579.0 | 68.2 | 64.0 | 36.7 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-ViT-6B-Vicuna-13B-448px) |
