@@ -161,19 +161,19 @@ CUDA_VISIBLE_DEVICES=0,1 sh shell/hermes2_yi34b/internvl_chat_v1_2_hermes2_yi34b

 | name | model size | MathVista<br>(testmini) | MMB<br>(dev/test) | MMB-CN<br>(dev/test) | MMMU<br>(val/test) | CMMMU<br>(val/test) | MMVP | MME | POPE | Tiny LVLM | SEEDv1<br>(image) | LLaVA Wild | MM-Vet |
 | ------------------------------------------------------------------------------------------- | ---------- | ----------------------- | ----------------- | -------------------- | ---------------------------------------------------------------------------------- | ------------------- | ---- | -------------- | ---- | --------- | ----------------- | ---------- | ------ |
-| [InternVL-Chat-V1.1](https://huggingface.co/OpenGVLab/InternVL-Chat-Chinese-V1-1) | 19B | 34.5 | 76.7 / 75.4 | 71.9 / 70.3 | 39.1 / 35.3 | 34.8 / 34.0 | 44.7 | 1675.1 / 348.6 | 87.1 | 343.2 | 73.2 | 73.2 | 46.7 |
-| [InternVL-Chat-V1.2](https://huggingface.co/OpenGVLab/InternVL-Chat-Chinese-V1-2) | 40B | 47.7 | 81.4 / 82.2 | 79.5 / 81.2 | 51.6 / [46.2](https://eval.ai/web/challenges/challenge-page/2179/leaderboard/5377) | TODO | 56.7 | 1672.1 / 509.3 | 88.0 | 350.3 | 75.6 | 85.0 | 48.9 |
-| [InternVL-Chat-V1.2-Plus](https://huggingface.co/OpenGVLab/InternVL-Chat-Chinese-V1-2-Plus) | 40B | 59.9 | 83.4 / 83.8 | 81.6 / 82.0 | 50.3 / 45.6 | TODO | 58.7 | 1623.6 / 550.7 | 88.7 | 353.9 | 76.4 | 84.6 | 47.9 |
+| [InternVL-Chat-V1.1](https://huggingface.co/OpenGVLab/InternVL-Chat-Chinese-V1-1) | 19B | 34.5 | 76.7&nbsp;/&nbsp;75.4 | 71.9&nbsp;/&nbsp;70.3 | 39.1&nbsp;/&nbsp;35.3 | 34.8&nbsp;/&nbsp;34.0 | 44.7 | 1675.1&nbsp;/&nbsp;348.6 | 87.1 | 343.2 | 73.2 | 73.2 | 46.7 |
+| [InternVL-Chat-V1.2](https://huggingface.co/OpenGVLab/InternVL-Chat-Chinese-V1-2) | 40B | 47.7 | 81.4&nbsp;/&nbsp;82.2 | 79.5&nbsp;/&nbsp;81.2 | 51.6&nbsp;/&nbsp;[46.2](https://eval.ai/web/challenges/challenge-page/2179/leaderboard/5377) | TODO | 56.7 | 1672.1&nbsp;/&nbsp;509.3 | 88.0 | 350.3 | 75.6 | 85.0 | 48.9 |
+| [InternVL-Chat-V1.2-Plus](https://huggingface.co/OpenGVLab/InternVL-Chat-Chinese-V1-2-Plus) | 40B | 59.9 | 83.4&nbsp;/&nbsp;83.8 | 81.6&nbsp;/&nbsp;82.0 | 50.3&nbsp;/&nbsp;45.6 | TODO | 58.7 | 1623.6&nbsp;/&nbsp;550.7 | 88.7 | 353.9 | 76.4 | 84.6 | 47.9 |

 **Image Captioning & Visual Question Answering**

 \* Training set observed.

 | name | model size | COCO<br>(test) | Flickr30K<br>(test) | NoCaps<br>(val) | VQAv2<br>(testdev) | OKVQA<br>(val) | TextVQA<br>(val) | VizWiz<br>(val/test) | AI2D<br>(test) | GQA<br>(test) | ScienceQA<br>(image) |
 | ------------------------------------------------------------------------------------------- | ---------- | -------------- | ------------------- | --------------- | ------------------ | -------------- | ---------------- | -------------------- | -------------- | ------------- | -------------------- |
-| [InternVL-Chat-V1.1](https://huggingface.co/OpenGVLab/InternVL-Chat-Chinese-V1-1) | 19B | 142.2\* | 85.3 | 120.8 | 80.9\* | 64.1\* | 65.9 | 59.0 / 57.3 | 72.2\* | 62.5\* | 90.1\* |
-| [InternVL-Chat-V1.2](https://huggingface.co/OpenGVLab/InternVL-Chat-Chinese-V1-2) | 40B | 113.9 | 92.4 | 112.5 | - | 62.5\* | 69.7 | 61.9 / 60.0 | 77.1\* | 64.0\* | 83.3 |
-| [InternVL-Chat-V1.2-Plus](https://huggingface.co/OpenGVLab/InternVL-Chat-Chinese-V1-2-Plus) | 40B | 143.4\* | 90.5 | 125.8 | - | 67.6\* | 71.3\* | 61.3 / - | 78.2\* | 66.9\* | 98.1\* |
+| [InternVL-Chat-V1.1](https://huggingface.co/OpenGVLab/InternVL-Chat-Chinese-V1-1) | 19B | 142.2\* | 85.3 | 120.8 | 80.9\* | 64.1\* | 65.9 | 59.0&nbsp;/&nbsp;57.3 | 72.2\* | 62.5\* | 90.1\* |
+| [InternVL-Chat-V1.2](https://huggingface.co/OpenGVLab/InternVL-Chat-Chinese-V1-2) | 40B | 113.9 | 92.4 | 112.5 | - | 62.5\* | 69.7 | 61.9&nbsp;/&nbsp;60.0 | 77.1\* | 64.0\* | 83.3 |
+| [InternVL-Chat-V1.2-Plus](https://huggingface.co/OpenGVLab/InternVL-Chat-Chinese-V1-2-Plus) | 40B | 143.4\* | 90.5 | 125.8 | - | 67.6\* | 71.3\* | 61.3&nbsp;/&nbsp;59.5 | 78.2\* | 66.9\* | 98.1\* |

 - We found that incorrect images were used for training and testing on `AI2D`: for questions where `abcLabel` is True, the corresponding `abc_images` were not used. We have now corrected the images used for testing, but the results may still be somewhat lower as a consequence (a sketch of the corrected selection logic follows this hunk).

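For clarity, below is a minimal sketch of the corrected AI2D image selection described in the note above. It is not the repository's actual loader code: the `abcLabel` and `abc_images` field names come from the note, while the sample schema (a dict with an `image` key holding a path string) is an assumption.

```python
# Hypothetical sketch of the corrected AI2D image selection. `abcLabel` and
# `abc_images` are the fields named in the note above; the `image` key and the
# dict-based sample format are assumptions for illustration only.
def pick_ai2d_image(sample: dict) -> str:
    """Return the image path to load for one AI2D question.

    AI2D ships two renderings per diagram: the plain image and an `abc_images`
    version in which answer regions are tagged with letter labels. When
    `abcLabel` is True, the question refers to those letters, so the
    letter-labeled rendering must be used instead of the plain one.
    """
    if sample.get("abcLabel", False):
        return sample["abc_images"]  # letter-labeled rendering
    return sample["image"]           # plain diagram (assumed key name)
```

In evaluation, a helper like this would be called once per sample before the image is loaded and passed to the model.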
@@ -189,9 +189,9 @@ CUDA_VISIBLE_DEVICES=0,1 sh shell/hermes2_yi34b/internvl_chat_v1_2_hermes2_yi34b

 | model | QLLaMA | LLM | res | COCO | Flickr | NoCaps | VQAv2 | GQA | VizWiz | TextVQA | MME | POPE | Download |
 | ------------- | ------ | ------------ | --- | ----- | ------ | ------ | ----- | ---- | ------ | ------- | ------ | ---- | -------- |
-| InternVL-Chat | ✔️ | frozen V-7B | 224 | 141.4 | 89.7 | 120.5 | 72.3 | 57.7 | 44.5 | 42.1 | 1298.5 | 85.2 | TODO |
-| InternVL-Chat | ✔️ | frozen V-13B | 224 | 142.4 | 89.9 | 123.1 | 71.7 | 59.5 | 54.0 | 49.1 | 1317.2 | 85.4 | TODO |
-| InternVL-Chat | ✔️ | V-13B | 336 | 146.2 | 92.2 | 126.2 | 81.2 | 66.6 | 58.5 | 61.5 | 1586.4 | 87.6 | TODO |
+| InternVL-Chat | ✔️ | frozen&nbsp;V-7B | 224 | 141.4 | 89.7 | 120.5 | 72.3 | 57.7 | 44.5 | 42.1 | 1298.5 | 85.2 | TODO |
+| InternVL-Chat | ✔️ | frozen&nbsp;V-13B | 224 | 142.4 | 89.9 | 123.1 | 71.7 | 59.5 | 54.0 | 49.1 | 1317.2 | 85.4 | TODO |
+| InternVL-Chat | ✔️ | V-13B | 336 | 146.2 | 92.2 | 126.2 | 81.2 | 66.6 | 58.5 | 61.5 | 1586.4 | 87.6 | TODO |

 ## ❓ How to Evaluate
