Any point in this benchmark anymore? #124

@EdanToledo

Hey, I had a quick question about the purpose of this benchmark. If we are adding methods to the leaderboard that hill-climb on the test set, what is the point of any of this? I am wondering whether the maintainers made a mistake here, or whether they care. If they do care, it would be wise to at least add a disclaimer that some of the results on the leaderboard come from agents that hill-climb on the test set.
