Hi!
I'm having trouble understanding the behavior of the Multimodal Contextual Precision metric. The documentation states that, to achieve a high score, relevant statements (or nodes) should be ranked higher than irrelevant ones.
In my case, I evaluated the retrieval step for a question (I'm building a multimodal RAG) and got a score of 1.0 (the maximum). I don't think that's right, since the relevant content only appears in the 3rd retrieved node, which is the one that contains the actual answer.
However, after checking how the metric works internally, I noticed that the model first generates a list of verdicts to determine whether each node is relevant. When these verdicts are generated, Node 3 is magically positioned at the top of the list, even though it wasn't the first retrieved node, as you can see in the image. (It is, of course, predicted as relevant.)
As I mentioned the score was a perfect 1.0 and this is the provided reason.
So I have two questions overall:
1. Does this metric use any kind of internal reranker that moves relevant nodes to the top?
2. Even if the relevant node ends up at the top, I retrieved 14 irrelevant, noisy nodes (which could distract the generator). Shouldn't that penalize or otherwise impact the final precision score?
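For reference, contextual precision is usually computed as a weighted cumulative precision over the ranked relevance verdicts, which would explain both observations. This is a minimal sketch of that standard formula (I'm assuming deepeval follows it; `contextual_precision` here is my own illustrative helper, not the library's API), showing that the position of the relevant node in the verdict list decides everything, while irrelevant nodes ranked *after* the last relevant one contribute nothing:

```python
def contextual_precision(verdicts: list[bool]) -> float:
    """Weighted cumulative precision over ranked relevance verdicts.

    verdicts: relevance of each node, in ranked order (True = relevant).
    Each relevant node at rank k contributes precision@k; irrelevant
    nodes only matter if they appear *above* a relevant node.
    """
    relevant_so_far = 0
    weighted_sum = 0.0
    for k, is_relevant in enumerate(verdicts, start=1):
        if is_relevant:
            relevant_so_far += 1
            weighted_sum += relevant_so_far / k  # precision@k at this rank
    return weighted_sum / relevant_so_far if relevant_so_far else 0.0

# Relevant node ranked 1st, 14 irrelevant after it -> perfect score:
print(contextual_precision([True] + [False] * 14))                # 1.0
# Same relevant node ranked 3rd -> penalized:
print(contextual_precision([False, False, True] + [False] * 12))  # 0.333...
```

So if the verdict list is built (or silently reordered) with Node 3 at the top, a 1.0 score follows mechanically from this formula, and the 14 trailing irrelevant nodes carry no penalty at all.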
Any help is welcome!
Thanks