Conversation
Force-pushed 023f509 to b892629
Force-pushed b892629 to 63ea730
mdarcy220 left a comment:
The one thing that worries me about adding these params to the suite tasks is that it introduces a new potential source of discrepancies (where two runs on supposedly the "same" task have different task params). I think it would be ideal if we used the parent tasks for the single-task demos (e.g. core_bench, super, sqa instead of core_bench_validation, super_validation, sqa_dev), so we keep the configurable stuff in the parent tasks and the suite tasks represent a fully-specified/instantiated task configuration.
  --solver astabench/solvers/react/basic_agent.py@instantiated_basic_agent \
  --model openai/gpt-4.1-nano \
  --limit 1 \
  -T limit=1 \
Last I checked, inspect eval-set (and by extension astabench eval) could not take a -T param (this was the original reason for having the separate _validation and _test tasks).
In my testing, the -T parameter was passed through to the task initializer
Another reason we defined separate val/test tasks was that it wasn't possible to pass through different task params to different tasks
Modify `core_bench` to only download the capsules that will be used. (The `--limit` option is not currently forwarded to the task initializer.) To actually pass this limit to the task on the CLI, you need to use `-T limit=`, so add that to the `demo.sh` scripts.

Alternative to #93
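For reference, the distinction discussed above can be sketched as a demo.sh excerpt (the `astabench eval` invocation and task name are assumptions based on the diff snippet; only the flag semantics are the point here):

```shell
# Hypothetical demo.sh excerpt. Note the two "limit" knobs are different:
#   --limit 1    caps how many samples the eval harness runs
#   -T limit=1   is forwarded to the task initializer as a task param,
#                so core_bench only downloads the capsules it will use
astabench eval core_bench \
  --solver astabench/solvers/react/basic_agent.py@instantiated_basic_agent \
  --model openai/gpt-4.1-nano \
  --limit 1 \
  -T limit=1
```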