
Add submission metadata field for original result path#57

Open
ca16 wants to merge 2 commits into main from chloea-add-submission-metadata-field-for-original-result-path

Conversation


@ca16 ca16 commented Aug 13, 2025

Related to https://github.com/allenai/astabench-issues/issues/307 and https://github.com/allenai/astabench-issues/issues/199.

My understanding is that we want the results we show in the eventually public leaderboard to be based on the new config. But some (if not all) of these results will have been created using the old config when we launch. I'm making versions of those results that are compatible with the new config.

To make it easier for people to understand how we got to those results, I was thinking of adding this `original_results_url` field to the submission metadata, for cases where a given results file wasn't generated directly from a submission directory, but from another results file.

What I was picturing for the state of things when we release: the public leaderboard points to the results files compatible with the new config (which live in the public results repo). Those result files have pointers to the original result files (copies of which are also in the public results repo) via `original_results_url`, as well as to the original submission entries (copies of which are also, ideally, in a gated submissions repo) via the existing submission metadata fields.
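For illustration, here's a minimal sketch of what the metadata shapes could look like. Only `original_results_url` (the proposed field) and `logs_url` (an existing field, per the discussion below) come from this PR; the other field names and all URLs are hypothetical placeholders:

```python
# Sketch of submission metadata for a results file that was CONVERTED
# from an old-config results file. Field names other than
# original_results_url and logs_url are hypothetical.
converted = {
    "agent_name": "example-agent",  # hypothetical field
    "logs_url": "https://example.org/logs/run-1",  # existing field
    # Proposed field: points at the old-config results file this
    # results file was derived from.
    "original_results_url": "https://example.org/results/old-config/run-1",
}

# A results file generated directly from a submission directory would
# simply omit the new field.
direct = {
    "agent_name": "example-agent",
    "logs_url": "https://example.org/logs/run-2",
}

def was_converted(metadata: dict) -> bool:
    """True if this results file was derived from another results file."""
    return "original_results_url" in metadata

print(was_converted(converted))  # True
print(was_converted(direct))     # False
```

The idea being that the field's mere presence signals a converted result, so nothing changes for results produced the normal way.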

For the reviewers:

@ca16 ca16 requested review from jbragg and rodneykinney August 13, 2025 15:26

jbragg commented Aug 13, 2025

> What I was picturing for the state of things when we release: the public leaderboard points to the results files compatible with the new config (which live in the public results repo). Those result files have pointers to the original result files (copies of which are also in the public results repo) via `original_results_url`, as well as to the original submission entries (copies of which are also, ideally, in a gated submissions repo) via the existing submission metadata fields.

IIUC, won't this proposal result in some confusion where `original_results_url` and `logs_url` both point to original versions but only one of them has the prefix `original_`?


ca16 commented Aug 13, 2025

> IIUC, won't this proposal result in some confusion where `original_results_url` and `logs_url` both point to original versions but only one of them has the prefix `original_`?

maybe! thinking of a better name... any thoughts on your end?


jbragg commented Aug 13, 2025

@ca16 how will re-scoring work under this plan? Re-scoring will be needed periodically to normalize costs across submissions. I'm a little confused about how the transformed results file that points to the original results file will get updated. Maybe this will help me understand what a good name might be.


ca16 commented Aug 13, 2025

Are we intending that converting results produced under one config into results that work with another config will be commonplace? I thought that was kind of a band-aid for right now, because we wanted a different config for what we launch compared to what we ran our evaluations with. So for this particular situation, my plan is to make new versions of everything that needs one as part of the curation ticket... I wasn't picturing an automated process to make and push a new version of any result that appears under the old config.

Long term, if people need to do something similar, I imagine it would be more of a one-off thing? Like they accidentally ran with the wrong config and therefore would themselves explicitly decide to create a new version of their results.

Or do you mean sometime in the future we might want to rescore the results that we have right now (that I'm going to convert as part of the curation ticket), and if that happens we want to automatically also have updated versions that work with the new config?


jbragg commented Aug 13, 2025

> Or do you mean sometime in the future we might want to rescore the results that we have right now (that I'm going to convert as part of the curation ticket), and if that happens we want to automatically also have updated versions that work with the new config?

Yes, I'm talking about the current submissions, not some future set of submissions. I want to make sure that we can periodically re-score all submissions in the leaderboard, including these, and have the new costs appear.


ca16 commented Aug 13, 2025

> > Or do you mean sometime in the future we might want to rescore the results that we have right now (that I'm going to convert as part of the curation ticket), and if that happens we want to automatically also have updated versions that work with the new config?
>
> Yes, I'm talking about the current submissions, not some future set of submissions. I want to make sure that we can periodically re-score all submissions in the leaderboard, including these, and have the new costs appear.

Got it! My plan did not account for this. I'm going to set this requirement aside for now until there's something ready for Ruben for leaderboard screenshots, and circle back to it after.

