Change eval_limit from choice to free-form string input by simonrosenberg · Pull Request #2261 · OpenHands/software-agent-sdk

simonrosenberg · 2026-03-02T18:48:25Z

Summary

• Changed eval_limit workflow input from type: choice (restricted to 1/50/100/200/500) to type: string (any positive integer)
• Updated documentation in .agents/skills/run-eval.md to reflect the change

Motivation

The choice dropdown is limiting when debugging: running exactly 10 instances, for example, is not possible because the value has to come from a fixed list. A free-form string input lets you type any number.

Test plan

Trigger run-eval.yml via workflow dispatch with a non-standard value (e.g., 10) and verify it works

🤖 Generated with Claude Code

Agent Server images for this PR

• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant	Architectures	Base Image	Docs / Tags
java	amd64, arm64	`eclipse-temurin:17-jdk`	Link
python	amd64, arm64	`nikolaik/python-nodejs:python3.12-nodejs22`	Link
golang	amd64, arm64	`golang:1.21-bookworm`	Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:7385f4d-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-7385f4d-python \
  ghcr.io/openhands/agent-server:7385f4d-python

All tags pushed for this build

ghcr.io/openhands/agent-server:7385f4d-golang-amd64
ghcr.io/openhands/agent-server:7385f4d-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:7385f4d-golang-arm64
ghcr.io/openhands/agent-server:7385f4d-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:7385f4d-java-amd64
ghcr.io/openhands/agent-server:7385f4d-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:7385f4d-java-arm64
ghcr.io/openhands/agent-server:7385f4d-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:7385f4d-python-amd64
ghcr.io/openhands/agent-server:7385f4d-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:7385f4d-python-arm64
ghcr.io/openhands/agent-server:7385f4d-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:7385f4d-golang
ghcr.io/openhands/agent-server:7385f4d-java
ghcr.io/openhands/agent-server:7385f4d-python

About Multi-Architecture Support

• Each variant tag (e.g., 7385f4d-python) is a multi-arch manifest supporting both amd64 and arm64
• Docker automatically pulls the correct architecture for your platform
• Individual architecture tags (e.g., 7385f4d-python-amd64) are also available if needed

The choice dropdown only allowed specific values (1, 50, 100, 200, 500), making it annoying to debug with custom instance counts (e.g., 10). Change to a string input so any positive integer can be entered.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-03-02T18:48:54Z

API breakage checks (Griffe)

Result: Failed

Log excerpt (first 1000 characters)


============================================================
Checking openhands-sdk (openhands.sdk)
============================================================
Comparing openhands-sdk 1.11.5 against 1.11.4
::notice title=openhands-sdk API::Ignoring Field metadata-only change (non-breaking): load_public_skills
::notice title=openhands-sdk API::Ignoring Field metadata-only change (non-breaking): temperature
::warning file=openhands-sdk/openhands/sdk/llm/llm.py,line=190,title=LLM.top_p::Attribute value was changed: `Field(default=1.0, ge=0, le=1)` -> `Field(default=None, ge=0, le=1, description='Nucleus sampling parameter. Defaults to None (uses provider default). Set to a value between 0 and 1 to control diversity of outputs.')`
::error title=SemVer::Breaking changes detected (1); require at least minor version bump from 1.11.x, but new is 1.11.5

============================================================
Checking openhands-workspace (openhands.workspace)
============================

Action log
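The failing check above applies a version-bump rule: when breaking changes are detected, the new version must be at least a minor bump over the old one. A minimal sketch of that rule follows, assuming plain `major.minor.patch` version strings. The function names `parse_version` and `bump_is_sufficient` are hypothetical and are not taken from the repository's check script.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split a 'major.minor.patch' string into three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump_is_sufficient(old: str, new: str, breaking_changes: int) -> bool:
    """Hypothetical policy: any breaking change needs at least a minor bump."""
    if breaking_changes == 0:
        # Without breaking changes, any version is accepted by this sketch.
        return True
    old_major, old_minor, _ = parse_version(old)
    new_major, new_minor, _ = parse_version(new)
    # Tuple comparison treats a higher major or a higher minor as sufficient.
    return (new_major, new_minor) > (old_major, old_minor)


if __name__ == "__main__":
    # Versions from the log excerpt: one breaking change, patch-only bump.
    print(bump_is_sufficient("1.11.4", "1.11.5", breaking_changes=1))
```

Under this sketch, the 1.11.4 to 1.11.5 comparison with one breaking change is rejected, which matches the error reported in the log.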

github-actions · 2026-03-02T18:49:10Z

Agent server REST API breakage checks (OpenAPI)

Result: Passed

Action log

all-hands-bot

🟡 Acceptable - Solves a real problem (debugging with arbitrary limits) by removing unnecessary constraints. However, needs input validation to prevent confusing failures.

.github/workflows/run-eval.yml

all-hands-bot

🟢 Good taste - Simple, pragmatic solution.

Removes unnecessary dropdown constraints while adding proper validation. The regex ^[1-9][0-9]*$ correctly rejects zero, negatives, decimals, and non-numeric strings. Solves a real debugging problem (running arbitrary instance counts) without over-engineering. LGTM! ✅
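To illustrate what the pattern `^[1-9][0-9]*$` accepts and rejects, here is a small Python sketch. The real check lives in .github/workflows/run-eval.yml and is not reproduced here; the function name `validate_eval_limit` and the error message are assumptions made for this example.

```python
import re

# Pattern quoted in the review: one non-zero digit, then any number of digits.
POSITIVE_INT = re.compile(r"^[1-9][0-9]*$")


def validate_eval_limit(raw: str) -> int:
    """Return raw as an int, or raise ValueError if it is not a positive integer."""
    # fullmatch is used because `$` in Python also matches before a trailing newline.
    if POSITIVE_INT.fullmatch(raw) is None:
        raise ValueError(f"eval_limit must be a positive integer, got {raw!r}")
    return int(raw)


if __name__ == "__main__":
    for candidate in ["10", "500", "0", "-5", "1.5", "abc", "010"]:
        try:
            print(candidate, "->", validate_eval_limit(candidate))
        except ValueError as err:
            print(candidate, "-> rejected:", err)
```

Note that the pattern also rejects values with leading zeros (such as 010) and the empty string, so an unset input fails fast as well.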

all-hands-bot reviewed Mar 2, 2026

.github/workflows/run-eval.yml

Add validation for eval_limit input

60fad3f

Validate that eval_limit is a positive integer to fail fast with a clear error instead of causing confusing failures downstream.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

simonrosenberg requested a review from all-hands-bot March 2, 2026 19:09

all-hands-bot approved these changes Mar 2, 2026

neubig approved these changes Mar 2, 2026

neubig merged commit e96bce3 into main Mar 2, 2026
31 of 32 checks passed

neubig deleted the eval-limit-string-input branch March 2, 2026 19:46

neubig mentioned this pull request Mar 3, 2026

Add learnings from code review analysis #2280

Merged

4 tasks

zparnold added a commit to zparnold/software-agent-sdk that referenced this pull request Mar 5, 2026

Change eval_limit from choice to free-form string input (OpenHands#2261)

c2f3263

Cherry-pick from upstream e96bce3

neubig merged 2 commits into main from eval-limit-string-input
