Description
Hello authors of ReflectiVA, thank you for the excellent open-source project!
After reviewing the code and paper, I am a bit confused about the re-rank part.
It seems to me that the released evaluation code simply follows the original ReflectiVA pipeline, where all sections from the initially retrieved top-k entries that are assigned the [REL] token are used as extra input to answer the KB-VQA questions.
For the re-rank experiments, I believe the reranker model comes from the EchoSight project, as indicated in the paper. However, I am not sure how exactly you re-ranked the sections.
In the native EchoSight pipeline, the score is a weighted sum of two parts: (a) the initial retrieval similarity score (the cosine similarity between the query image and the image features of entries in the KB, computed with EVA-CLIP-8B), and (b) the section score assessed by the trained multi-modal reranker. In the original setting, the weights are 1:1. I would appreciate it if you could elaborate on the details here. My impression is that you only use part (b), the section score, since in your released code `rag_evaluation/encyclopedic/release_retrieval.py` there is no reference to the initial retrieval similarity score (and in your implementation, only E-VQA uses image-to-image retrieval).
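For concreteness, here is a minimal sketch of the 1:1 weighted-sum fusion as I understand it from EchoSight; the function and parameter names are my own, not from either codebase:

```python
import numpy as np

def fuse_section_scores(retrieval_sims, reranker_scores,
                        w_retrieval=1.0, w_rerank=1.0):
    """Hypothetical sketch of EchoSight-style score fusion.

    final = w_retrieval * (a) initial retrieval cosine similarity
          + w_rerank    * (b) multi-modal reranker section score
    With w_retrieval == w_rerank this is the 1:1 setting; setting
    w_retrieval = 0 would reproduce using only part (b).
    Returns the fused scores and the section indices sorted best-first.
    """
    retrieval_sims = np.asarray(retrieval_sims, dtype=float)
    reranker_scores = np.asarray(reranker_scores, dtype=float)
    fused = w_retrieval * retrieval_sims + w_rerank * reranker_scores
    order = np.argsort(-fused)  # descending order of fused score
    return fused, order
```

My question is essentially whether your re-rank ablation corresponds to `w_retrieval = 0` in a sketch like this, or whether the initial similarity is folded in somewhere I missed.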
It would be great if you could answer this question or release the code for the ablation study. Thank you!