Change eval_limit from choice to free-form string input #2261

Merged
neubig merged 2 commits into main from eval-limit-string-input on Mar 2, 2026

@simonrosenberg simonrosenberg commented Mar 2, 2026

Summary

  • Changed eval_limit workflow input from type: choice (restricted to 1/50/100/200/500) to type: string (any positive integer)
  • Updated documentation in .agents/skills/run-eval.md to reflect the change

Motivation

The choice dropdown is limiting when debugging: running exactly 10 instances, for example, is impossible because the value must come from a fixed list. A free-form string input lets you type any number.

Test plan

  • Trigger run-eval.yml via workflow dispatch with a non-standard value (e.g., 10) and verify it works
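
The test plan above can also be exercised from a script instead of the GitHub UI. The following is a minimal sketch, assuming the standard GitHub REST endpoint for workflow dispatch; the function name, the owner/repo/ref/token arguments, and the idea that eval_limit is the only input sent are all illustrative assumptions (the real workflow may require other inputs not shown in this PR). It only builds the request and does not send it.

```python
import json
import urllib.request


def build_dispatch_request(owner, repo, workflow_file, ref, eval_limit, token):
    """Build (but do not send) a workflow_dispatch request.

    All arguments are placeholders chosen for illustration; only the
    input name `eval_limit` and the file name `run-eval.yml` come from the PR.
    """
    url = (
        f"https://api.github.com/repos/{owner}/{repo}"
        f"/actions/workflows/{workflow_file}/dispatches"
    )
    # Workflow inputs are sent as strings, which matches the new `type: string`.
    body = {"ref": ref, "inputs": {"eval_limit": str(eval_limit)}}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical usage: a non-standard limit of 10 on the main branch.
request = build_dispatch_request(
    "example-owner", "example-repo", "run-eval.yml", "main", 10, "TOKEN"
)
```

Passing the request to `urllib.request.urlopen` would trigger the run, given a token with permission to dispatch workflows.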

🤖 Generated with Claude Code


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

| Variant | Architectures | Base Image | Docs / Tags |
| --- | --- | --- | --- |
| java | amd64, arm64 | eclipse-temurin:17-jdk | Link |
| python | amd64, arm64 | nikolaik/python-nodejs:python3.12-nodejs22 | Link |
| golang | amd64, arm64 | golang:1.21-bookworm | Link |

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:7385f4d-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-7385f4d-python \
  ghcr.io/openhands/agent-server:7385f4d-python

All tags pushed for this build

ghcr.io/openhands/agent-server:7385f4d-golang-amd64
ghcr.io/openhands/agent-server:7385f4d-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:7385f4d-golang-arm64
ghcr.io/openhands/agent-server:7385f4d-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:7385f4d-java-amd64
ghcr.io/openhands/agent-server:7385f4d-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:7385f4d-java-arm64
ghcr.io/openhands/agent-server:7385f4d-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:7385f4d-python-amd64
ghcr.io/openhands/agent-server:7385f4d-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:7385f4d-python-arm64
ghcr.io/openhands/agent-server:7385f4d-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:7385f4d-golang
ghcr.io/openhands/agent-server:7385f4d-java
ghcr.io/openhands/agent-server:7385f4d-python

About Multi-Architecture Support

  • Each variant tag (e.g., 7385f4d-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 7385f4d-python-amd64) are also available if needed

The choice dropdown only allowed specific values (1, 50, 100, 200, 500),
making it annoying to debug with custom instance counts (e.g., 10).
Change to a string input so any positive integer can be entered.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions bot commented Mar 2, 2026

API breakage checks (Griffe)

Result: Failed

Log excerpt (first 1000 characters)

============================================================
Checking openhands-sdk (openhands.sdk)
============================================================
Comparing openhands-sdk 1.11.5 against 1.11.4
::notice title=openhands-sdk API::Ignoring Field metadata-only change (non-breaking): load_public_skills
::notice title=openhands-sdk API::Ignoring Field metadata-only change (non-breaking): temperature
::warning file=openhands-sdk/openhands/sdk/llm/llm.py,line=190,title=LLM.top_p::Attribute value was changed: `Field(default=1.0, ge=0, le=1)` -> `Field(default=None, ge=0, le=1, description='Nucleus sampling parameter. Defaults to None (uses provider default). Set to a value between 0 and 1 to control diversity of outputs.')`
::error title=SemVer::Breaking changes detected (1); require at least minor version bump from 1.11.x, but new is 1.11.5

============================================================
Checking openhands-workspace (openhands.workspace)
============================

Action log

github-actions bot commented Mar 2, 2026

Agent server REST API breakage checks (OpenAPI)

Result: Passed

Action log

@all-hands-bot all-hands-bot left a comment

🟡 Acceptable - Solves a real problem (debugging with arbitrary limits) by removing unnecessary constraints. However, needs input validation to prevent confusing failures.

Validate that eval_limit is a positive integer to fail fast with a
clear error instead of causing confusing failures downstream.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@all-hands-bot all-hands-bot left a comment

🟢 Good taste - Simple, pragmatic solution.

Removes unnecessary dropdown constraints while adding proper validation. The regex ^[1-9][0-9]*$ correctly rejects zero, negatives, decimals, and non-numeric strings. Solves a real debugging problem (running arbitrary instance counts) without over-engineering. LGTM! ✅
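
To illustrate what the regex in the review accepts and rejects, here is a small sketch in Python. The PR itself most likely performs this check inside the workflow rather than in Python; the function name and the sample values below are assumptions made for illustration, and only the pattern `^[1-9][0-9]*$` comes from the review.

```python
import re

# Pattern quoted in the review: a non-zero leading digit, then any digits.
EVAL_LIMIT_PATTERN = re.compile(r"^[1-9][0-9]*$")


def is_valid_eval_limit(value: str) -> bool:
    """Return True if `value` is a positive integer with no leading zero.

    Hypothetical helper; fullmatch is used so a trailing newline is
    not silently accepted by `$`.
    """
    return EVAL_LIMIT_PATTERN.fullmatch(value) is not None


# Sample inputs covering the cases named in the review.
samples = ["10", "1", "500", "0", "-5", "1.5", "abc", ""]
results = {s: is_valid_eval_limit(s) for s in samples}
```

Note that the pattern also rejects values with a leading zero such as `007`, which is stricter than "any positive integer" but keeps the accepted form unambiguous.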

@neubig neubig merged commit e96bce3 into main Mar 2, 2026
31 of 32 checks passed
@neubig neubig deleted the eval-limit-string-input branch March 2, 2026 19:46
zparnold added a commit to zparnold/software-agent-sdk that referenced this pull request Mar 5, 2026
3 participants