
Commit 84359e3

Expose terminalbench in run-eval workflow (#2360)
Co-authored-by: openhands <openhands@all-hands.dev>
1 parent: 26db836


2 files changed (+2, -1 lines)


.agents/skills/run-eval.md

Lines changed: 1 addition & 1 deletion
@@ -31,7 +31,7 @@ curl -X POST \
 ```
 
 **Key parameters:**
-- `benchmark`: `swebench`, `swebenchmultimodal`, `gaia`, `swtbench`, `commit0`, `multiswebench`
+- `benchmark`: `swebench`, `swebenchmultimodal`, `gaia`, `swtbench`, `commit0`, `multiswebench`, `terminalbench`
 - `eval_limit`: Any positive integer (e.g., `1`, `10`, `50`, `200`)
 - `model_ids`: See `.github/run-eval/resolve_model_config.py` for available models
 - `benchmarks_branch`: Use feature branch from the benchmarks repo to test benchmark changes before merging
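After this commit, `terminalbench` is accepted wherever the skill's `benchmark` parameter is. The request body of the skill's `curl -X POST` call is not shown in this diff. The following is a minimal sketch that assumes the call targets GitHub's `workflow_dispatch` endpoint for `run-eval.yml`, which takes a `{"ref": ..., "inputs": {...}}` body with string-valued inputs. It shows how a client could validate and assemble those inputs. `build_dispatch_payload`, `SUPPORTED_BENCHMARKS`, and the example values for `ref` and `model_ids` are hypothetical. The input names come from the "Key parameters" list and the workflow's `sdk_ref` input.

```python
import json

# Benchmarks listed under "Key parameters" in .agents/skills/run-eval.md
# after this commit (hypothetical client-side copy of that list).
SUPPORTED_BENCHMARKS = {
    "swebench",
    "swebenchmultimodal",
    "gaia",
    "swtbench",
    "commit0",
    "multiswebench",
    "terminalbench",
}


def build_dispatch_payload(ref, benchmark, eval_limit, model_ids, sdk_ref,
                           benchmarks_branch=None):
    """Build a hypothetical workflow_dispatch body for run-eval.yml.

    Assumes GitHub's workflow_dispatch shape {"ref": ..., "inputs": {...}},
    where every input value is sent as a string.
    """
    if benchmark not in SUPPORTED_BENCHMARKS:
        raise ValueError(f"unsupported benchmark: {benchmark!r}")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(eval_limit, bool) or not isinstance(eval_limit, int) or eval_limit < 1:
        raise ValueError("eval_limit must be a positive integer")
    inputs = {
        "benchmark": benchmark,
        "eval_limit": str(eval_limit),
        "model_ids": model_ids,
        "sdk_ref": sdk_ref,
    }
    # Only send benchmarks_branch when testing an unmerged benchmarks change.
    if benchmarks_branch:
        inputs["benchmarks_branch"] = benchmarks_branch
    return {"ref": ref, "inputs": inputs}


if __name__ == "__main__":
    payload = build_dispatch_payload("main", "terminalbench", 10,
                                     "example-model", "v1.0.0")
    print(json.dumps(payload))
```

Validating against a local set catches a mistyped benchmark before the request is sent. The workflow's own `options` list remains the authoritative check, so both lists must be kept in sync when a benchmark is added, as this commit does.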

.github/workflows/run-eval.yml

Lines changed: 1 addition & 0 deletions
@@ -20,6 +20,7 @@ on:
 - swtbench
 - commit0
 - swebenchmultimodal
+- terminalbench
 sdk_ref:
 description: SDK commit/ref to evaluate (must be a semantic version like v1.0.0 unless 'Allow unreleased branches' is checked)
 required: true
