Task-level limit parameters #96

Open
rodneykinney wants to merge 2 commits into main from sample-limits
Member

@rodneykinney rodneykinney commented Aug 12, 2025

Modify core_bench to only download the capsules that will be used. (The --limit option is not currently forwarded to the task initializer).

To actually pass this limit to the task on the CLI, you need to use -T limit=, so add that to the demo.sh scripts.

Alternative to #93
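For readers following the description above, here is a minimal sketch of the idea: apply the limit inside the task initializer, before any download happens, so only the capsules that will be used are fetched. This is plain Python, not the actual core_bench code; `select_capsules`, `init_core_bench`, and the `download` callback are hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, List, Optional


def select_capsules(capsule_ids: List[str], limit: Optional[int] = None) -> List[str]:
    """Return the capsule ids that will actually be used (hypothetical helper)."""
    if limit is None:
        return list(capsule_ids)
    if limit < 0:
        raise ValueError("limit must be non-negative")
    return list(capsule_ids[:limit])


def init_core_bench(
    capsule_ids: List[str],
    download: Callable[[str], str],
    limit: Optional[int] = None,
) -> Dict[str, str]:
    """Hypothetical task initializer: trim the id list first, then download."""
    selected = select_capsules(capsule_ids, limit)
    # Only the selected capsules are fetched; the rest are never touched.
    return {capsule_id: download(capsule_id) for capsule_id in selected}
```

In this sketch the task-level `limit` plays the role of the value passed with `-T limit=` on the CLI, while the runner-level `--limit` would only trim samples after the initializer has already run.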

@rodneykinney rodneykinney marked this pull request as draft August 12, 2025 00:49
@rodneykinney rodneykinney force-pushed the sample-limits branch 2 times, most recently from 023f509 to b892629 Compare August 12, 2025 22:25
@rodneykinney rodneykinney marked this pull request as ready for review August 12, 2025 22:45
@rodneykinney rodneykinney requested a review from mdarcy220 August 12, 2025 22:46
Contributor

@mdarcy220 mdarcy220 left a comment

The one thing that worries me about adding these params to the suite tasks is that it introduces a new potential source of discrepancies (where two runs on supposedly the "same" task have different task params). I think it would be ideal if we used the parent tasks for the single-task demos (e.g. core_bench, super, sqa instead of core_bench_validation, super_validation, sqa_dev), so we keep the configurable stuff in the parent tasks and the suite tasks represent a fully-specified/instantiated task configuration.
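The distinction drawn in this comment can be sketched as follows, assuming a simplified stand-in for the real tasks. The names `core_bench` and `core_bench_validation` come from the thread, but the `TaskConfig` type and both signatures are hypothetical: the parent task exposes parameters, while the suite task takes none and always resolves to the same configuration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaskConfig:
    """Hypothetical record of everything that defines a run of a task."""
    name: str
    split: str
    limit: Optional[int]


def core_bench(split: str = "validation", limit: Optional[int] = None) -> TaskConfig:
    """Parent task (sketch): configurable, so two runs may differ."""
    return TaskConfig(name="core_bench", split=split, limit=limit)


def core_bench_validation() -> TaskConfig:
    """Suite task (sketch): no parameters, so every run is identical."""
    return TaskConfig(name="core_bench_validation", split="validation", limit=None)
```

Under this sketch, two runs of `core_bench_validation()` always compare equal, whereas `core_bench(limit=1)` and `core_bench(limit=2)` share a task name but describe different runs, which is the discrepancy the comment is concerned about.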

--solver astabench/solvers/react/basic_agent.py@instantiated_basic_agent \
--model openai/gpt-4.1-nano \
--limit 1 \
-T limit=1 \
Contributor

Last I checked, inspect eval-set (and by extension astabench eval) cannot take a -T param (this was the original reason for having the separate _validation and _test tasks).

Member Author

In my testing, the -T parameter was passed through to the task initializer.

Collaborator

Another reason we defined separate val/test tasks was that it wasn't possible to pass different task params through to different tasks.

3 participants