Open
Description
Hey, I had a quick question about the purpose of this benchmark. If the leaderboard accepts methods that hill-climb on the test set, what is the point of any of this? I'm wondering whether the maintainers overlooked this, or whether they care. If they do care, it would be wise to at least add a disclaimer that some of the leaderboard results come from agents that hill-climb on the test set.