Expose terminalbench in run-eval workflow (#2360)

neubig · openhands-agent · web-flow · commit 84359e335386 · 2026-03-20T06:58:31.000+09:00
Co-authored-by: openhands <openhands@all-hands.dev>
diff --git a/.agents/skills/run-eval.md b/.agents/skills/run-eval.md
@@ -31,7 +31,7 @@ curl -X POST \
 ```
 
 **Key parameters:**
-- `benchmark`: `swebench`, `swebenchmultimodal`, `gaia`, `swtbench`, `commit0`, `multiswebench`
+- `benchmark`: `swebench`, `swebenchmultimodal`, `gaia`, `swtbench`, `commit0`, `multiswebench`, `terminalbench`
 - `eval_limit`: Any positive integer (e.g., `1`, `10`, `50`, `200`)
 - `model_ids`: See `.github/run-eval/resolve_model_config.py` for available models
 - `benchmarks_branch`: Use feature branch from the benchmarks repo to test benchmark changes before merging
diff --git a/.github/workflows/run-eval.yml b/.github/workflows/run-eval.yml
@@ -20,6 +20,7 @@ on:
                     - swtbench
                     - commit0
                     - swebenchmultimodal
+                    - terminalbench
             sdk_ref:
                 description: SDK commit/ref to evaluate (must be a semantic version like v1.0.0 unless 'Allow unreleased branches' is checked)
                 required: true
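
Note: with this change, `terminalbench` can be selected as the `benchmark` input when the workflow is dispatched. A minimal sketch of building such a dispatch request body follows. It assumes the generic GitHub `workflow_dispatch` payload shape (`ref` plus string-valued `inputs`). The helper name `build_dispatch_payload`, the default `ref="main"`, and the example `sdk_ref` value are hypothetical and not taken from this commit; the input names `benchmark`, `eval_limit`, and `sdk_ref` and the benchmark list come from the files touched above.

```python
import json

# Benchmark names as listed in .agents/skills/run-eval.md after this commit.
KNOWN_BENCHMARKS = (
    "swebench",
    "swebenchmultimodal",
    "gaia",
    "swtbench",
    "commit0",
    "multiswebench",
    "terminalbench",
)


def build_dispatch_payload(benchmark, sdk_ref, eval_limit=1, ref="main"):
    """Build a workflow_dispatch request body (hypothetical helper).

    `ref` is the branch the workflow file is read from; "main" is an
    assumption, not something this commit specifies.
    """
    if benchmark not in KNOWN_BENCHMARKS:
        raise ValueError(f"unknown benchmark: {benchmark!r}")
    # The skill doc describes eval_limit as any positive integer.
    if isinstance(eval_limit, bool) or not isinstance(eval_limit, int) or eval_limit < 1:
        raise ValueError("eval_limit must be a positive integer")
    return {
        "ref": ref,
        "inputs": {
            "benchmark": benchmark,
            "sdk_ref": sdk_ref,
            # Sent as a string here, assuming the workflow parses it itself.
            "eval_limit": str(eval_limit),
        },
    }


if __name__ == "__main__":
    payload = build_dispatch_payload("terminalbench", sdk_ref="v1.0.0", eval_limit=10)
    print(json.dumps(payload, indent=2))
```

The resulting JSON would be sent as the body of the `curl -X POST` call shown in the skill doc; other inputs such as `model_ids` and `benchmarks_branch` could be added to `inputs` in the same way.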