Question
Hello,
I tried to reproduce the reported results of LLaVA-v1.5-7B on the MM-VET dataset, but the performance I obtained was much lower.
- I loaded the official checkpoint and ran the script `scripts/v1_5/eval/mmvet.sh`, then submitted the generated answers to the official evaluation platform on Hugging Face, but only got a total score of 27.8.
- I also directly uploaded the provided `llava-v1.5-7b.json` file from the official `eval.zip` package (under `mm-vet/results/`), but the total score was still only 27.9.
This is much lower than the value of 31.1 reported in the paper.
Here I also provide the full evaluation results I obtained:
| | rec | ocr | know | gen | spat | math | total | std | runs |
|---|---|---|---|---|---|---|---|---|---|
| llava-v1.5-7b | 33.2 | 17.8 | 14.4 | 15.4 | 22.1 | 7.7 | 27.9 | 0.0 | [27.9] |
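For context, the `total`, `std`, and `runs` columns appear to be aggregates over repeated evaluation runs (std is 0.0 here because only a single run is present). A minimal sketch of how such an aggregate could be computed; the variable names are my own, not from the MM-Vet codebase:

```python
import statistics

# Scores from each evaluation run; here a single run, as in the table above.
runs = [27.9]

# Mean over runs gives the reported total.
total = sum(runs) / len(runs)

# Population standard deviation; 0.0 when there is only one run.
std = statistics.pstdev(runs)

print(total, std)  # → 27.9 0.0
```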
Could you please let me know where the problem might be?
Thanks in advance for the clarification!