Add command to copy result from one repo to another#71

Open
ca16 wants to merge 12 commits into main from chloea-add-copy-result-command

Collaborator

@ca16 ca16 commented Aug 26, 2025

Related to https://github.com/allenai/astabench-issues/issues/465.

This PR adds a cleaned-up version of the command that I used to copy over results and their submissions from the internal repos to the public ones.

Some bits are quite similar to what's in lb publish, so I've pulled some common stuff out.

Testing done:
I commented out the uploads and added some prints, and ran both lb publish (to make sure I didn't break it) and copy:

agenteval lb publish "hf://allenai/asta-bench-internal-submissions/1.0.0-dev1/test/aakanksha19_Asta_Table_Synthesis__GPT-4.1__2025-07-11T21-23-03"  "hf://allenai/asta-bench-internal-submissions/1.0.0-dev1/test/aakanksha19_Asta_Table_Synthesis__Pro-2.5__2025-07-12T06-06-53"
...
Submission paths: ['1.0.0-dev1/test/aakanksha19_Asta_Table_Synthesis__GPT-4.1__2025-07-11T21-23-03', '1.0.0-dev1/test/aakanksha19_Asta_Table_Synthesis__Pro-2.5__2025-07-12T06-06-53']
...
Done
agenteval copy --target-submissions-repo "allenai/asta-bench-submissions" --target-results-repo "allenai/asta-bench-results" --write-public-logs-field  "hf://allenai/asta-bench-internal-results/1.0.0/test/aakanksha19_Asta_Table_Synthesis__GPT-4.1__2025-07-11T21-23-03.json"  "hf://allenai/asta-bench-internal-results/1.0.0/test/aakanksha19_Asta_Table_Synthesis__Pro-2.5__2025-07-12T06-06-53.json"
...
result paths ['1.0.0/test/aakanksha19_Asta_Table_Synthesis__GPT-4.1__2025-07-11T21-23-03.json', '1.0.0/test/aakanksha19_Asta_Table_Synthesis__Pro-2.5__2025-07-12T06-06-53.json']
...
new logs url hf://datasets/allenai/asta-bench-submissions/1.0.0-dev1/test/aakanksha19_Asta_Table_Synthesis__GPT-4.1__2025-07-11T21-23-03
new logs url hf://datasets/allenai/asta-bench-submissions/1.0.0-dev1/test/aakanksha19_Asta_Table_Synthesis__Pro-2.5__2025-07-12T06-06-53
...
paths to pull: ['1.0.0-dev1/test/aakanksha19_Asta_Table_Synthesis__GPT-4.1__2025-07-11T21-23-03/eval_config.json', '1.0.0-dev1/test/aakanksha19_Asta_Table_Synthesis__GPT-4.1__2025-07-11T21-23-03/submission.json', 'summaries/1.0.0-dev1/test/aakanksha19_Asta_Table_Synthesis__GPT-4.1__2025-07-11T21-23-03/scores.json', '1.0.0-dev1/test/aakanksha19_Asta_Table_Synthesis__GPT-4.1__2025-07-11T21-23-03/*.eval', '1.0.0-dev1/test/aakanksha19_Asta_Table_Synthesis__Pro-2.5__2025-07-12T06-06-53/eval_config.json', '1.0.0-dev1/test/aakanksha19_Asta_Table_Synthesis__Pro-2.5__2025-07-12T06-06-53/submission.json', 'summaries/1.0.0-dev1/test/aakanksha19_Asta_Table_Synthesis__Pro-2.5__2025-07-12T06-06-53/scores.json', '1.0.0-dev1/test/aakanksha19_Asta_Table_Synthesis__Pro-2.5__2025-07-12T06-06-53/*.eval']
...
Done
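
For reference, the path handling visible in the output above can be sketched roughly as follows. This is an illustration rather than the code in this PR: `split_hf_url` and `paths_to_pull` are hypothetical helper names, the URL form (no `datasets/` segment) is assumed from the command arguments, and the file layout is inferred only from the printed "paths to pull" list.

```python
from typing import List, Tuple


def split_hf_url(url: str) -> Tuple[str, str]:
    """Split 'hf://<org>/<repo>/<path>' into (repo_id, path_in_repo).

    Hypothetical helper; assumes the URL form used in the commands above.
    """
    prefix = "hf://"
    if not url.startswith(prefix):
        raise ValueError(f"not an hf:// URL: {url}")
    parts = url[len(prefix):].split("/", 2)
    if len(parts) < 3 or not all(parts):
        raise ValueError(f"expected hf://<org>/<repo>/<path>, got: {url}")
    org, repo, path = parts
    return f"{org}/{repo}", path


def paths_to_pull(submission_path: str) -> List[str]:
    """Patterns for one submission, mirroring the printed list (assumed layout)."""
    return [
        f"{submission_path}/eval_config.json",
        f"{submission_path}/submission.json",
        f"summaries/{submission_path}/scores.json",
        f"{submission_path}/*.eval",
    ]
```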

For the reviewers:
I think if it's not clear we really want this rn, it's fine to put it off until we decide our new priorities. It'll be here later if we want to pick it back up.



@dataclass
class RepoPathsOfInterest:
Collaborator Author

@ca16 ca16 Aug 26, 2025

Pulls out logic used in lb publish and the new copy command.

click.echo(
"Schema in local dataset_features.yml does not match schema in hf://{repo_id}/README.md"
try:
check_submissions_against_readme(
Collaborator Author

pulling this out so I can use it in copy too

cli.add_command(eval_command)


@dataclass
Collaborator Author

I found it easier to make sense of submission paths in particular this way... Could be useful in some other commands too...? Didn't want to make too many changes in one go though...

default=False,
help="Provide this if the target results should have log urls in logs_url_public.",
)
def copy_command(
Collaborator Author

Example command, if you wanted to copy something over from the internal repos to the public ones.

agenteval copy --target-submissions-repo "allenai/asta-bench-submissions" --target-results-repo "allenai/asta-bench-results" --write-public-logs-field  "hf://allenai/asta-bench-internal-results/1.0.0/test/aakanksha19_Asta_Table_Synthesis__GPT-4.1__2025-07-11T21-23-03.json"  "hf://allenai/asta-bench-internal-results/1.0.0/test/aakanksha19_Asta_Table_Synthesis__Pro-2.5__2025-07-12T06-06-53.json"

@ca16 ca16 requested review from mdarcy220 and rodneykinney August 26, 2025 23:46
return Features._from_yaml_list(yaml_values)


def check_submissions_against_readme(
Member

Not a big deal, but this feels like it belongs in the Readme class. Maybe even with a single LeaderboardSubmission parameter, so the loop would be done by the calling code.

new_logs_url = (
    f"hf://datasets/{target_submissions_repo}/{src_submission_path}"
)
if write_public_logs_field:
Member

I'm not up to speed on logs_url vs. logs_url_public, but this certainly looks weird to me. It means the copy is not a strict copy. If we need some way of syncing the logs_url and logs_url_public, maybe that should be an explicit separate step.

Or, if modifying the data during the copy is unavoidable, maybe the command should just have a different name, like migrate.
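
One way to read the "explicit separate step" idea, sketched with plain dicts standing in for result records. The function names are hypothetical, the flat record layout is an assumption, and nothing here touches Hugging Face:

```python
import copy
from typing import Any, Dict


def copy_result(result: Dict[str, Any]) -> Dict[str, Any]:
    """Strict copy: the returned record equals the source record."""
    return copy.deepcopy(result)


def repoint_logs(
    result: Dict[str, Any], new_logs_url: str, public_field: bool = False
) -> Dict[str, Any]:
    """Separate, explicit step that rewrites one logs field on a copy.

    Assumes the URL fields sit at the top level of the record.
    """
    updated = copy.deepcopy(result)
    field = "logs_url_public" if public_field else "logs_url"
    updated[field] = new_logs_url
    return updated
```

With this split, a caller that wants today's behaviour would run `copy_result` and then `repoint_logs(..., public_field=True)`, and the modification is visible at the call site instead of being hidden behind a flag on the copy.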

Collaborator Author

Yeah I agree that this looks kinda weird...

For the results in the new results repo, I ended up putting their logs under logs_url_public because of the logs_url line here:

rows.append(
    {
        "id": sub.submit_time,
        "agent_name": sub.agent_name,
        "agent_description": sub.agent_description or "",
        "username": sub.username or "",
        "submit_time": date,
        "openness": sub.openness,
        "tool_usage": sub.tool_usage,
        "base_models": model_names,
        **flat,
        "logs_url": sub.logs_url if is_internal else sub.logs_url_public,
        "source_url": source_url,
    }
)

The leaderboard looks at that logs_url field to make its log link, and there's leaderboard code that determines which HF resources to use based on whether we're in 'internal' mode or not.
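
To make the consequence concrete, here is a small self-contained sketch of that selection. `Submission` is a stand-in dataclass with only the two URL fields, not the real class, and `leaderboard_logs_url` is a hypothetical name:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Submission:
    """Stand-in for the real submission metadata; only the two URL fields."""

    logs_url: Optional[str] = None
    logs_url_public: Optional[str] = None


def leaderboard_logs_url(sub: Submission, is_internal: bool) -> Optional[str]:
    # Same selection as the "logs_url" line in the quoted snippet.
    return sub.logs_url if is_internal else sub.logs_url_public
```

A record that only has `logs_url` set yields `None` when `is_internal` is false, which is why the copied public results needed their URL under `logs_url_public`.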

But I guess it might be worth figuring out if this is what was originally pictured for logs_url vs logs_url_public. @mdarcy220 do you know or should we pull in Jonathan?

Contributor

I don't know much about the lb-related code in general; my best guess is that it's related to the notion that we could publish a result publicly with some of our logs but have more verbose logs internally (e.g. for legal reasons). But I don't actually know; Jonathan is probably the best bet.

Collaborator Author

Calling @jbragg

Collaborator

This was built with the idea that we may post submissions to a private dataset, and then only make a subset of submissions public. However, we did not need this functionality and only have a public dataset.

Collaborator Author

@ca16 ca16 Aug 28, 2025

So we no longer need both the logs_url and logs_url_public fields in submission metadata?

Collaborator

You would know better what we need @ca16 given my comment :)

Collaborator Author

ca16 commented Aug 29, 2025

Okay, I think based on Jonathan's comment we don't really need separate fields anymore, and maybe the thing to do is keep just logs_url and remove the logs_url_public field (updating the leaderboard code accordingly). I'll open a separate ticket for that. After that I can update this to address @rodneykinney's comment.
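
A rough sketch of what collapsing the two fields could look like for a single metadata record. It assumes a plain dict, and it assumes the public URL should win when both are set; that precedence is a guess, not something decided in this thread. `merge_logs_fields` is a hypothetical name.

```python
from typing import Any, Dict


def merge_logs_fields(meta: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record with only a single logs_url field.

    Assumption: a non-empty logs_url_public replaces logs_url.
    """
    merged = dict(meta)
    public = merged.pop("logs_url_public", None)
    if public:
        merged["logs_url"] = public
    return merged
```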

Collaborator Author

ca16 commented Aug 29, 2025

Ticket for removing logs_url_public: https://github.com/allenai/astabench-issues/issues/472.
My understanding is that neither that work nor the work in this PR is a priority right now, though, so I'm setting this aside for the time being.

4 participants