Commit e96bce3

Change eval_limit from choice to free-form string input (#2261)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent c31cdee commit e96bce3

2 files changed, 11 additions and 9 deletions

.agents/skills/run-eval.md

Lines changed: 1 addition & 1 deletion
@@ -32,7 +32,7 @@ curl -X POST \
 
 **Key parameters:**
 - `benchmark`: `swebench`, `swebenchmultimodal`, `gaia`, `swtbench`, `commit0`, `multiswebench`
-- `eval_limit`: `1`, `50`, `100`, `200`, `500`
+- `eval_limit`: Any positive integer (e.g., `1`, `10`, `50`, `200`)
 - `model_ids`: See `.github/run-eval/resolve_model_config.py` for available models
 - `benchmarks_branch`: Use feature branch from the benchmarks repo to test benchmark changes before merging
 

.github/workflows/run-eval.yml

Lines changed: 10 additions & 8 deletions
@@ -32,16 +32,10 @@ on:
         default: false
         type: boolean
       eval_limit:
-        description: Number of instances to run
+        description: Number of instances to run (any positive integer)
         required: false
         default: '1'
-        type: choice
-        options:
-          - '1'
-          - '100'
-          - '50'
-          - '200'
-          - '500'
+        type: string
       model_ids:
         description: Comma-separated model IDs to evaluate. Must be keys of MODELS in resolve_model_config.py. Defaults to first model in that
           dict.
@@ -138,6 +132,14 @@ jobs:
         with:
           python-version: '3.13'
 
+      - name: Validate eval_limit
+        if: github.event_name == 'workflow_dispatch'
+        run: |
+          if ! [[ "${{ github.event.inputs.eval_limit }}" =~ ^[1-9][0-9]*$ ]]; then
+            echo "Error: eval_limit must be a positive integer, got: ${{ github.event.inputs.eval_limit }}"
+            exit 1
+          fi
+
       - name: Validate SDK reference (semantic version check)
         if: github.event_name == 'workflow_dispatch'
         env:
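
Because `eval_limit` is now a free-form string, the new "Validate eval_limit" step is the only guard against bad input. The sketch below is a standalone way to check what that regex accepts and rejects. It is not part of the commit: the helper name `is_positive_integer` and the sample values are hypothetical, and it assumes bash, since `[[ ... =~ ... ]]` is a bash construct.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the commit): applies the same regex as the
# "Validate eval_limit" workflow step to a single argument.
is_positive_integer() {
  [[ "$1" =~ ^[1-9][0-9]*$ ]]
}

# Illustrative sample inputs, chosen to cover the edge cases.
for value in 1 10 500 0 007 -5 1.5 "" abc "10 "; do
  if is_positive_integer "$value"; then
    echo "accept: '$value'"
  else
    echo "reject: '$value'"
  fi
done
```

Only `1`, `10`, and `500` are accepted here. Zero, leading zeros, signs, decimals, the empty string, non-numeric text, and values with surrounding whitespace are all rejected, because the pattern is anchored at both ends and requires a first digit from 1 to 9.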
