[Discussion] Add dedicated display for RTEB benchmark results #3096
Replies: 4 comments 10 replies
-
How do you select models for RTEB?
I don't think that you can hide it in
How different is your logic?
-
Data filtering: Why not just show all models that are evaluated on the benchmark?
Zero-shot filter: So, the argument would be that closed datasets become a better proxy for generalization? In that case, zero-shot is not a reasonable annotation. I would be interested to see when this holds and when it does not. In general though, I think it is a reasonable change given the closed datasets.
Ranking: Mean has some issues, as noted in the MMTEB paper; I would suspect that Borda would generally provide a better ranking. What is the reason for changing it?
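To make the mean vs. Borda point concrete, here is a minimal sketch of how the two orderings can disagree. The model names and scores are made up for illustration, and the Borda variant below (one point per model strictly beaten on each task) is a simplified assumption; it is not necessarily how the leaderboard computes its Borda rank.

```python
from statistics import mean

# Hypothetical per-task scores (made-up numbers, not real results).
scores = {
    "model_a": [0.90, 0.40, 0.46],
    "model_b": [0.60, 0.50, 0.50],
    "model_c": [0.50, 0.45, 0.45],
}


def rank_by_mean(scores):
    """Order models by their average score across tasks, best first."""
    return sorted(scores, key=lambda m: mean(scores[m]), reverse=True)


def rank_by_borda(scores):
    """Order models by a simple Borda count, best first.

    On each task a model earns one point for every other model it
    strictly beats; points are summed over tasks.
    """
    models = list(scores)
    n_tasks = len(next(iter(scores.values())))
    points = {m: 0 for m in models}
    for t in range(n_tasks):
        for m in models:
            points[m] += sum(
                scores[m][t] > scores[other][t] for other in models if other != m
            )
    return sorted(models, key=lambda m: points[m], reverse=True)


print(rank_by_mean(scores))   # model_a first: one very high score lifts its average
print(rank_by_borda(scores))  # model_b first: it wins two of the three tasks
```

In this toy data, a single outlier task is enough to put model_a on top by mean, while Borda rewards model_b for winning more tasks.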
-
As this is a discussion, I think it would be better placed in Discussions.
-
Hi everyone, I’d like to summarize the current outcomes of our discussion:
Please confirm if this matches our consensus. I will proceed to update the PR accordingly based on these decisions in the coming days.
-
Description of the feature
We need to display RTEB results in the interface.
To support this, we are opening this issue to discuss the related changes before finalizing the implementation.
A demo of the current implementation can be found here:
https://huggingface.co/spaces/SmileXing/leaderboard
There are currently three points that need discussion:
Data filtering
When the benchmark is RTEB, only RTEB model results are displayed.
Question: How should we determine which models belong to RTEB and should be shown?
Field adjustments
When the benchmark is RTEB, the zero-shot field is hidden (including the zero-shot option in advanced model filters).
Question: What do you think about this adjustment?
Model ranking
When the benchmark is RTEB, the ranking logic is changed to use the RTEB-specific algorithm.
Question: What do you think about this change?
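For the data filtering question, one option raised in the replies is to show every model that has been evaluated on the benchmark. A minimal sketch of that rule, assuming results are available as a plain mapping from model name to per-task scores (the function name, task names, and data layout here are hypothetical and not taken from the mteb codebase):

```python
def models_with_full_coverage(results, benchmark_tasks):
    """Return the models that have a score for every task in the benchmark.

    results: dict mapping model name -> dict of task name -> score
    benchmark_tasks: iterable of task names that make up the benchmark
    """
    required = set(benchmark_tasks)
    return sorted(
        model
        for model, task_scores in results.items()
        if required <= set(task_scores)  # all required tasks are present
    )


# Hypothetical task names and results, for illustration only.
rteb_tasks = ["task_1", "task_2"]
results = {
    "model_a": {"task_1": 0.7, "task_2": 0.6},
    "model_b": {"task_1": 0.8},  # missing task_2, so it is excluded
    "model_c": {"task_1": 0.5, "task_2": 0.4, "other_task": 0.9},
}

print(models_with_full_coverage(results, rteb_tasks))
```

With this rule there is no separate list of "RTEB models" to maintain; membership follows from which results exist.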