Hi, thanks for releasing this great work!
I have a question about the VQA experiments. The paper states:
"The model is fine-tuned for three epochs on each of the three VQA datasets and evaluated accordingly."
I would like to clarify:
- Does this 3-epoch fine-tuning apply only to LLaVA-Tri, or did you also fine-tune the baseline models (e.g., LLaVA-Med, BLIP-2, PMC-VQA, etc.) for three epochs on the same datasets?
- Or are the baseline results directly taken from the original papers without additional fine-tuning?
Knowing this would help me judge the fairness of the comparison and reproduce the results.
Thanks a lot!