Pull requests: OpenHands/benchmarks
Pull requests list
feat(swebenchmultimodal): Add visual verification instructions and image URL validation
#359
opened Jan 23, 2026 by
neubig
Add SDK commit hash to workflow run titles for index benchmarks
#351
opened Jan 21, 2026 by
juanmichelini
Rename build-gaia-image.yml to build-gaia-images.yml
#347
opened Jan 21, 2026 by
simonrosenberg
Change commit0 metric from resolved instances to total passed tests
#341
opened Jan 18, 2026 by
juanmichelini
Draft
build(deps): bump the version-all group across 1 directory with 15 updates
dependencies
Pull requests that update a dependency file
python:uv
Pull requests that update python:uv code
#336
opened Jan 17, 2026 by
dependabot
bot
fix(swebench-multimodal): create output.report.json for consistency
#331
opened Jan 16, 2026 by
juanmichelini
BREAKING: Rename --max-attempts to --n-critic-runs
#325
opened Jan 16, 2026 by
juanmichelini
Draft
Fix dataset loading schema validation issue in CI
#304
opened Jan 13, 2026 by
juanmichelini
feat: Support LLM_API_KEY environment variable override for benchmark configs
#302
opened Jan 12, 2026 by
simonrosenberg
build(deps): bump actions/github-script from 7 to 8 in the version-all group
dependencies
Pull requests that update a dependency file
github_actions
Pull requests that update GitHub Actions code
#292
opened Jan 12, 2026 by
dependabot
bot
Add configurable conversation timeout to all benchmarks
#250
opened Jan 5, 2026 by
simonrosenberg
Draft
Add add_resolve_rate_to_predictions function to output_utils
#199
opened Dec 23, 2025 by
juanmichelini
Draft
API-based Critic implementation
build-swebench-200
Build 200 SWE-Bench Verified Image based on SDK version on this PR.