
fix(sdk): surface iteration budgets gracefully #2739

Closed
enyst wants to merge 1 commit into main from openhands/issue-2406-iteration-limit

@enyst (Collaborator) commented Apr 7, 2026

Summary

Fixes #2406 by making the per-run iteration budget visible to the LLM, warning the model when it is about to run out of steps, and ending max-iteration runs with a dedicated iteration-limit status/event instead of a generic error.

This PR:

  • adds an <iteration_budget> block to the dynamic system prompt so the LLM sees the run budget up front
  • injects late-stage wrap-up warnings when only 2 steps remain and when the final step begins
  • emits ConversationExecutionStatus.ITERATION_LIMIT plus a dedicated ConversationIterationLimitEvent when the budget is exhausted
  • adds focused SDK tests for the new prompt context, warning injection, and status parsing/serialization
  • confirms that the Commit0 End Status discussed in the issue comes from benchmarks/utils/console_logging.py in the benchmarks repo, which derives it from conversation.state.execution_status plus FinishAction usage rather than from agent-server REST/WebSocket metadata
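The first three behaviors listed above can be illustrated with a minimal, standalone sketch. It assumes a loop that counts completed steps and treats "2 steps remain" as including the step about to run. Only the <iteration_budget> tag name and the ITERATION_LIMIT status name come from this PR; the enum stand-in, the helper names (render_iteration_budget, wrap_up_warning, run_loop), and the prompt wording are hypothetical, and the PR's actual code is not reproduced here.

```python
from __future__ import annotations

from enum import Enum


class ExecutionStatus(str, Enum):
    """Hypothetical stand-in for ConversationExecutionStatus; only ITERATION_LIMIT is named in the PR."""

    RUNNING = "running"
    FINISHED = "finished"
    ERROR = "error"
    ITERATION_LIMIT = "iteration_limit"


def render_iteration_budget(max_iterations: int) -> str:
    """Build an <iteration_budget> block for the dynamic system prompt (wording is illustrative)."""
    return (
        "<iteration_budget>\n"
        f"This run allows at most {max_iterations} steps. "
        "Plan your work so you can finish within this budget.\n"
        "</iteration_budget>"
    )


def wrap_up_warning(completed_steps: int, max_iterations: int) -> str | None:
    """Return a warning to inject before the next step, or None if no warning is due."""
    remaining = max_iterations - completed_steps  # includes the step about to run
    if remaining == 2:
        return "Only 2 steps remain. Start wrapping up."
    if remaining == 1:
        return "This is your final step. Summarize your progress and stop."
    return None


def run_loop(max_iterations: int, finish_at_step: int | None = None) -> tuple[ExecutionStatus, list[str]]:
    """Simulate an agent loop; finish_at_step=None means the agent never calls finish."""
    injected: list[str] = []
    for completed in range(max_iterations):
        warning = wrap_up_warning(completed, max_iterations)
        if warning is not None:
            injected.append(warning)  # a real agent would add this to the LLM context
        if finish_at_step == completed + 1:
            return ExecutionStatus.FINISHED, injected
    # Budget exhausted without finishing: report a dedicated status instead of ERROR.
    # (Per the PR, a ConversationIterationLimitEvent is also emitted at this point.)
    return ExecutionStatus.ITERATION_LIMIT, injected
```

With run_loop(5), the two warnings are injected before steps 4 and 5 and the run ends with ITERATION_LIMIT; an agent that finishes at step 2 sees no warnings. Subclassing str keeps the status JSON-friendly, so ExecutionStatus("iteration_limit") parses the serialized value back into the enum member.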

Checklist

  • If the PR is changing/adding functionality, are there tests to reflect this?
  • If there is an example, have you run the example to make sure that it works? (N/A: no example changes)
  • If there are instructions on how to run the code, have you followed the instructions and made sure that it works? (Validated with targeted SDK pytest coverage.)
  • If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name? (N/A: no public API or docs changes.)
  • Is the GitHub CI passing?

This PR was created by an AI assistant (OpenHands) on behalf of the user.


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

| Variant | Architectures | Base Image | Docs / Tags |
|---|---|---|---|
| java | amd64, arm64 | eclipse-temurin:17-jdk | Link |
| python | amd64, arm64 | nikolaik/python-nodejs:python3.13-nodejs22-slim | Link |
| golang | amd64, arm64 | golang:1.21-bookworm | Link |

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:a3fb7ff-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-a3fb7ff-python \
  ghcr.io/openhands/agent-server:a3fb7ff-python

All tags pushed for this build

ghcr.io/openhands/agent-server:a3fb7ff-golang-amd64
ghcr.io/openhands/agent-server:a3fb7ff-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:a3fb7ff-golang-arm64
ghcr.io/openhands/agent-server:a3fb7ff-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:a3fb7ff-java-amd64
ghcr.io/openhands/agent-server:a3fb7ff-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:a3fb7ff-java-arm64
ghcr.io/openhands/agent-server:a3fb7ff-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:a3fb7ff-python-amd64
ghcr.io/openhands/agent-server:a3fb7ff-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:a3fb7ff-python-arm64
ghcr.io/openhands/agent-server:a3fb7ff-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:a3fb7ff-golang
ghcr.io/openhands/agent-server:a3fb7ff-java
ghcr.io/openhands/agent-server:a3fb7ff-python

About Multi-Architecture Support

  • Each variant tag (e.g., a3fb7ff-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., a3fb7ff-python-amd64) are also available if needed
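The tag list above follows a regular pattern, which the sketch below regenerates from the commit prefix, variant name, and base image. The slug rule ("/" becomes "_s_", ":" becomes "_tag_") is inferred from the pushed tags rather than taken from the build scripts, and the function names are hypothetical.

```python
from __future__ import annotations

REPO = "ghcr.io/openhands/agent-server"


def base_image_slug(base_image: str) -> str:
    """Turn a base image reference into the slug seen in the tags (rule inferred from the list)."""
    return base_image.replace("/", "_s_").replace(":", "_tag_")


def tags_for(sha: str, variant: str, base_image: str,
             archs: tuple[str, ...] = ("amd64", "arm64")) -> list[str]:
    """Return the multi-arch tag followed by the per-architecture tags for one variant."""
    tags = [f"{REPO}:{sha}-{variant}"]  # multi-arch manifest; Docker picks the matching platform
    for arch in archs:
        tags.append(f"{REPO}:{sha}-{variant}-{arch}")
        tags.append(f"{REPO}:{sha}-{base_image_slug(base_image)}-{arch}")
    return tags


for tag in tags_for("a3fb7ff", "golang", "golang:1.21-bookworm"):
    print(tag)
```

Five tags per variant across three variants accounts for the 15 tags listed for this build.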

Co-authored-by: openhands <openhands@all-hands.dev>
github-actions bot (Contributor) commented Apr 7, 2026

Python API breakage checks — ✅ PASSED

Result: PASSED

github-actions bot (Contributor) commented Apr 7, 2026

REST API breakage checks (OpenAPI) — ❌ FAILED

Result: FAILED

⚠️ Breaking REST API changes or policy violations detected.

Log excerpt (first 1000 characters)

::notice title=openhands-agent-server REST API::Additive oneOf/anyOf expansion detected in response schemas. This is expected for extensible discriminated-union APIs and does not break backward compatibility.
  - added '#/components/schemas/PatternSecurityAnalyzer-Output, #/components/schemas/PolicyRailSecurityAnalyzer-Output, #/components/schemas/EnsembleSecurityAnalyzer-Output' to the '/items/anyOf[subschema #1: ACPConversationInfo]/security_analyzer/anyOf[#/components/schemas/SecurityAnalyzerBase-Output]/' response property 'oneOf' list for the response status '200'
  - added '#/components/schemas/PatternSecurityAnalyzer-Output, #/components/schemas/PolicyRailSecurityAnalyzer-Output, #/components/schemas/EnsembleSecurityAnalyzer-Output' to the 'security_analyzer/anyOf[#/components/schemas/SecurityAnalyzerBase-Output]/' response property 'oneOf' list for the response status '200'
  - added '#/components/schemas/PatternSecurityAnalyzer-Output, #/components/schemas/PolicyRailSecurityA

@enyst added the integration-test label (Runs the integration tests and comments the results) on Apr 7, 2026, with OpenHands AI

github-actions bot (Contributor) commented Apr 7, 2026

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

github-actions bot (Contributor) commented Apr 7, 2026

🧪 Integration Tests Results

Overall Success Rate: 96.7%
Total Cost: $0.88
Models Tested: 4
Timestamp: 2026-04-07 13:02:02 UTC

📁 Detailed Logs & Artifacts

Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

📊 Summary

| Model | Overall | Tests Passed | Skipped | Total | Cost | Tokens |
|---|---|---|---|---|---|---|
| litellm_proxy_gemini_3.1_pro_preview | 100.0% | 8/8 | 0 | 8 | $0.31 | 245,496 |
| litellm_proxy_anthropic_claude_sonnet_4_6 | 87.5% | 7/8 | 0 | 8 | $0.44 | 252,809 |
| litellm_proxy_moonshot_kimi_k2_thinking | 100.0% | 7/7 | 1 | 8 | $0.09 | 328,306 |
| litellm_proxy_deepseek_deepseek_reasoner | 100.0% | 7/7 | 1 | 8 | $0.05 | 694,008 |
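The reported Overall Success Rate of 96.7% can be reproduced from the summary table. The sketch below assumes skipped tests are excluded from each denominator (as the 7/7 rows with 1 skip suggest) and compares pooling all tests against averaging the per-model rates; the pooled figure is the one that matches the report. The variable and function names are illustrative.

```python
# (passed, total, skipped) per model, copied from the summary table above.
RESULTS = {
    "litellm_proxy_gemini_3.1_pro_preview": (8, 8, 0),
    "litellm_proxy_anthropic_claude_sonnet_4_6": (7, 8, 0),
    "litellm_proxy_moonshot_kimi_k2_thinking": (7, 8, 1),
    "litellm_proxy_deepseek_deepseek_reasoner": (7, 8, 1),
}


def success_rate(passed: int, total: int, skipped: int) -> float:
    """Per-model rate with skipped tests left out of the denominator."""
    return passed / (total - skipped)


passed = sum(p for p, _, _ in RESULTS.values())
attempted = sum(t - s for _, t, s in RESULTS.values())
pooled = passed / attempted
mean_of_rates = sum(success_rate(*r) for r in RESULTS.values()) / len(RESULTS)

print(f"pooled: {passed}/{attempted} = {pooled:.1%}")   # pooled: 29/30 = 96.7%
print(f"mean of per-model rates: {mean_of_rates:.1%}")  # mean of per-model rates: 96.9%
```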

📋 Detailed Results

litellm_proxy_gemini_3.1_pro_preview

  • Success Rate: 100.0% (8/8)
  • Total Cost: $0.31
  • Token Usage: prompt: 241,489, completion: 4,007, cache_read: 122,291, reasoning: 2,324
  • Run Suffix: litellm_proxy_gemini_3.1_pro_preview_792626f_gemini_3_1_pro_run_N8_20260407_125831

litellm_proxy_anthropic_claude_sonnet_4_6

  • Success Rate: 87.5% (7/8)
  • Total Cost: $0.44
  • Token Usage: prompt: 247,371, completion: 5,438, cache_read: 165,559, cache_write: 81,572, reasoning: 1,042
  • Run Suffix: litellm_proxy_anthropic_claude_sonnet_4_6_792626f_claude_sonnet_4_6_run_N8_20260407_125832

Failed Tests:

  • t02_add_bash_hello: Shell script is not executable (Cost: $0.05)

litellm_proxy_moonshot_kimi_k2_thinking

  • Success Rate: 100.0% (7/7)
  • Total Cost: $0.09
  • Token Usage: prompt: 322,670, completion: 5,636, cache_read: 263,424
  • Run Suffix: litellm_proxy_moonshot_kimi_k2_thinking_792626f_kimi_k2_thinking_run_N8_20260407_125829
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

litellm_proxy_deepseek_deepseek_reasoner

  • Success Rate: 100.0% (7/7)
  • Total Cost: $0.05
  • Token Usage: prompt: 678,329, completion: 15,679, cache_read: 595,968, reasoning: 6,954
  • Run Suffix: litellm_proxy_deepseek_deepseek_reasoner_792626f_deepseek_v3_2_reasoner_run_N8_20260407_125832
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.
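The Tokens column in the summary table appears to equal prompt plus completion tokens for each model, with cache and reasoning counts not added on top. The check below, using figures copied from the detailed results, is a consistency check on that reading rather than a documented definition of the column.

```python
# (prompt, completion, summary "Tokens" value) per model, copied from the results above.
USAGE = {
    "litellm_proxy_gemini_3.1_pro_preview": (241_489, 4_007, 245_496),
    "litellm_proxy_anthropic_claude_sonnet_4_6": (247_371, 5_438, 252_809),
    "litellm_proxy_moonshot_kimi_k2_thinking": (322_670, 5_636, 328_306),
    "litellm_proxy_deepseek_deepseek_reasoner": (678_329, 15_679, 694_008),
}


def mismatches(usage: dict[str, tuple[int, int, int]]) -> list[str]:
    """Return the models whose prompt + completion differs from the summary Tokens value."""
    return [model for model, (p, c, t) in usage.items() if p + c != t]


print(mismatches(USAGE))  # → []
```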

@enyst closed this on Apr 7, 2026

Labels

integration-test Runs the integration tests and comments the results

Development

Successfully merging this pull request may close these issues.

proposal: make Max Iteration Limit visible to the LLM and have a Graceful Termination Message