
Release v1.17.0 #2812

Merged
xingyaoww merged 3 commits into main from rel-1.17.0
Apr 13, 2026

Conversation

@all-hands-bot
Collaborator

@all-hands-bot all-hands-bot commented Apr 13, 2026

Release v1.17.0

This PR prepares the release for version 1.17.0.

Release Checklist

  • Version set to 1.17.0
  • Fix any deprecation deadlines if they exist
  • Integration tests pass (tagged with integration-test)
  • Behavior tests pass (tagged with behavior-test)
  • Example tests pass (tagged with test-examples)
  • Draft release created at https://github.com/OpenHands/software-agent-sdk/releases/new
    • Select tag: v1.17.0
    • Select branch: rel-1.17.0
    • Auto-generate release notes
    • Publish release (PyPI will auto-publish)
  • Evaluation on OpenHands Index

Next Steps

  1. Review the version changes
  2. Address any deprecation deadlines
  3. Ensure integration tests pass
  4. Ensure behavior tests pass
  5. Ensure example tests pass
  6. Create and publish the release

Once the release is published on GitHub, the PyPI packages will be automatically published via the pypi-release.yml workflow.


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

| Variant | Architectures | Base Image | Docs / Tags |
|---------|---------------|------------|-------------|
| java    | amd64, arm64  | eclipse-temurin:17-jdk | Link |
| python  | amd64, arm64  | nikolaik/python-nodejs:python3.13-nodejs22-slim | Link |
| golang  | amd64, arm64  | golang:1.21-bookworm | Link |

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:aabf407-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-aabf407-python \
  ghcr.io/openhands/agent-server:aabf407-python

All tags pushed for this build

ghcr.io/openhands/agent-server:aabf407-golang-amd64
ghcr.io/openhands/agent-server:aabf407-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:aabf407-golang-arm64
ghcr.io/openhands/agent-server:aabf407-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:aabf407-java-amd64
ghcr.io/openhands/agent-server:aabf407-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:aabf407-java-arm64
ghcr.io/openhands/agent-server:aabf407-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:aabf407-python-amd64
ghcr.io/openhands/agent-server:aabf407-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:aabf407-python-arm64
ghcr.io/openhands/agent-server:aabf407-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:aabf407-golang
ghcr.io/openhands/agent-server:aabf407-java
ghcr.io/openhands/agent-server:aabf407-python

About Multi-Architecture Support

  • Each variant tag (e.g., aabf407-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., aabf407-python-amd64) are also available if needed

Co-authored-by: openhands <openhands@all-hands.dev>
@all-hands-bot all-hands-bot added the integration-test, test-examples, and behavior-test labels Apr 13, 2026
@github-actions
Contributor

Hi! I started running the behavior tests on your PR. You will receive a comment with the results shortly.

@github-actions
Contributor

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

@github-actions
Contributor

github-actions bot commented Apr 13, 2026

Python API breakage checks — ✅ PASSED

Result: PASSED

Action log

@github-actions
Contributor

github-actions bot commented Apr 13, 2026

REST API breakage checks (OpenAPI) — ✅ PASSED

Result: PASSED

Action log

Collaborator Author

@all-hands-bot all-hands-bot left a comment

🟢 Good taste - Clean release prep.

This is a mechanical version bump from 1.16.1 → 1.17.0 across all packages. The changes are exactly what you'd expect for a release:

  • Version updates in all pyproject.toml files
  • Matching version updates in uv.lock
  • Eval workflow default updated to match
  • exclude-newer timestamp refresh (normal uv lock behavior)

No complexity, no behavior changes, no breaking changes. Standard release process.

VERDICT: Worth merging - This is the correct way to prepare a release. Once the checklist items are completed, this is ready to go.


This review was created by an AI assistant (OpenHands) on behalf of the user.

@enyst
Collaborator

enyst commented Apr 13, 2026

@OpenHands fix the CI failing on deprecation deadlines. Understand why it fails and commit the cleanup it needs directly here.

The upstream browser_use 0.11.9 still has module-level
basicConfig(force=True) and _ensure_all_loggers_use_stderr() that
interfere with our logging. The upstream PR #3717 was closed without
merge. Extend the cleanup deadline from 1.17.0 to 1.19.0.

Co-authored-by: openhands <openhands@all-hands.dev>
@openhands-ai

openhands-ai bot commented Apr 13, 2026

I'm on it! enyst can track my progress at all-hands.dev

@github-actions
Contributor

🧪 Integration Tests Results

Overall Success Rate: 96.7%
Total Cost: $0.87
Models Tested: 4
Timestamp: 2026-04-13 15:29:02 UTC

📁 Detailed Logs & Artifacts

Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

📊 Summary

| Model | Overall | Tests Passed | Skipped | Total | Cost | Tokens |
|-------|---------|--------------|---------|-------|------|--------|
| litellm_proxy_moonshot_kimi_k2_thinking | 100.0% | 7/7 | 1 | 8 | $0.07 | 222,136 |
| litellm_proxy_deepseek_deepseek_reasoner | 100.0% | 7/7 | 1 | 8 | $0.04 | 769,231 |
| litellm_proxy_gemini_3.1_pro_preview | 100.0% | 8/8 | 0 | 8 | $0.32 | 207,326 |
| litellm_proxy_anthropic_claude_sonnet_4_6 | 87.5% | 7/8 | 0 | 8 | $0.44 | 251,244 |

📋 Detailed Results

litellm_proxy_moonshot_kimi_k2_thinking

  • Success Rate: 100.0% (7/7)
  • Total Cost: $0.07
  • Token Usage: prompt: 217,108, completion: 5,028, cache_read: 169,728
  • Run Suffix: litellm_proxy_moonshot_kimi_k2_thinking_8f0c893_kimi_k2_thinking_run_N8_20260413_152436
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

litellm_proxy_deepseek_deepseek_reasoner

  • Success Rate: 100.0% (7/7)
  • Total Cost: $0.04
  • Token Usage: prompt: 754,344, completion: 14,887, cache_read: 695,936, reasoning: 6,346
  • Run Suffix: litellm_proxy_deepseek_deepseek_reasoner_8f0c893_deepseek_v3_2_reasoner_run_N8_20260413_152437
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

litellm_proxy_gemini_3.1_pro_preview

  • Success Rate: 100.0% (8/8)
  • Total Cost: $0.32
  • Token Usage: prompt: 202,263, completion: 5,063, cache_read: 81,736, reasoning: 3,581
  • Run Suffix: litellm_proxy_gemini_3.1_pro_preview_8f0c893_gemini_3_1_pro_run_N8_20260413_152436

litellm_proxy_anthropic_claude_sonnet_4_6

  • Success Rate: 87.5% (7/8)
  • Total Cost: $0.44
  • Token Usage: prompt: 246,230, completion: 5,014, cache_read: 161,966, cache_write: 84,032, reasoning: 785
  • Run Suffix: litellm_proxy_anthropic_claude_sonnet_4_6_8f0c893_claude_sonnet_4_6_run_N8_20260413_152436

Failed Tests:

  • t02_add_bash_hello: Shell script is not executable (Cost: $0.06)

Collaborator Author

@all-hands-bot all-hands-bot left a comment

✅ QA Report: PASS

Release version bump verified successfully. All packages consistently versioned at 1.17.0, build succeeds, and deprecation deadlines resolved.

| Phase | Result |
|-------|--------|
| Environment Setup | ✅ Build successful, lockfile up to date |
| CI & Tests | ✅ Pre-commit, API breakage, version checks pass; build/test jobs in progress |
| Functional Verification | ✅ Version consistency, imports, deprecation check all pass |
Functional Verification

Version Consistency ✅

All four packages correctly versioned at 1.17.0:

$ grep -r "version = " openhands-*/pyproject.toml
openhands-agent-server/pyproject.toml:version = "1.17.0"
openhands-sdk/pyproject.toml:version = "1.17.0"
openhands-tools/pyproject.toml:version = "1.17.0"
openhands-workspace/pyproject.toml:version = "1.17.0"

Lockfile Consistency ✅

$ uv lock --locked --check
Resolved 401 packages in 1ms

Package Imports ✅

$ uv run python -c "import openhands.sdk; print(f'SDK version: {openhands.sdk.__version__}')"
✓ All packages import successfully
SDK version: 1.17.0

Deprecation Deadlines ✅

$ uv run --with packaging python .github/scripts/check_deprecations.py
openhands-sdk: checked 7 deprecation metadata entries against version 1.17.0.
openhands-tools: checked 2 deprecation metadata entries against version 1.17.0.
openhands-workspace: checked 1 deprecation metadata entries against version 1.17.0.
openhands-agent-server: checked 6 deprecation metadata entries against version 1.17.0.
Checked 16 deprecation metadata entries across 4 package(s).

The initial failure of the deprecation deadline check was resolved by commit 9e90afa, which extended the browser_use logging workaround deadline from 1.17.0 to 1.19.0 with updated context noting that browser_use 0.11.9 still has the problematic logging code.
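The deadline logic described above can be sketched with a minimal stdlib version comparison. This is a hypothetical illustration of the behavior, not the actual `check_deprecations.py` script:

```python
def parse_version(v: str) -> tuple[int, ...]:
    """Parse a simple X.Y.Z version string into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))

def cleanup_overdue(cleanup_by: str, release_version: str) -> bool:
    """True when a workaround tagged cleanup_by=<version> has expired."""
    return parse_version(release_version) >= parse_version(cleanup_by)

# Before the fix: the workaround's deadline equaled the release version.
print(cleanup_overdue("1.17.0", "1.17.0"))  # True  -> check fails
# After commit 9e90afa extended the deadline to 1.19.0:
print(cleanup_overdue("1.19.0", "1.17.0"))  # False -> check passes
```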

Build Success ✅

$ make build
Dependencies installed successfully.
Pre-commit hooks installed successfully.
Build complete! Development environment is ready.

Issues Found

None. The deprecation deadline issue was already addressed in commit 9e90afa.

CI Status

Passing (relevant to release):

  • ✅ Pre-commit checks
  • ✅ Python API breakage check
  • ✅ REST API breakage check
  • ✅ Version bump guard
  • ✅ Deprecation deadlines
  • ✅ Check docstrings
  • ✅ PR artifacts check

In Progress (expected to pass):

  • ⏳ SDK/tools/agent-server unit tests
  • ⏳ Build & Push jobs (multi-platform)
  • ⏳ Binary build tests

The critical release-blocking checks (deprecation deadlines, version consistency, API breakage) all pass. The in-progress jobs are standard test/build jobs that typically pass and don't require manual verification.

Collaborator

enyst commented Apr 13, 2026

Fixed in 9e90afa.

The failure came from warn_cleanup(..., cleanup_by="1.17.0") in openhands-tools/openhands/tools/browser_use/logging_fix.py, which becomes an expired cleanup deadline on the 1.17.0 release branch.

I verified the upstream browser_use package still applies module-level logging changes (logging.basicConfig(..., force=True) and _ensure_all_loggers_use_stderr() in the installed 0.11.9 codepath), so the workaround is still needed. I extended the cleanup deadline to 1.19.0 and updated the note to explain why this cleanup cannot happen in 1.17.0 yet.
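The interference described above can be reproduced with the standard library alone. A minimal sketch, using only stdlib `logging` (it does not import browser_use itself):

```python
import io
import logging

# The application configures root logging to its own stream.
app_stream = io.StringIO()
logging.basicConfig(stream=app_stream, level=logging.INFO, force=True)

logging.getLogger("demo").info("before import")

# Simulates the module-level call in the upstream package: force=True
# removes the application's handlers and installs the library's own.
logging.basicConfig(stream=io.StringIO(), level=logging.INFO, force=True)

logging.getLogger("demo").info("after import")

print("before import" in app_stream.getvalue())  # True
print("after import" in app_stream.getvalue())   # False: handler replaced
```

This is why a module-level `basicConfig(force=True)` in a dependency silently redirects an application's log output, and why the workaround has to stay until upstream removes it.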

The Deprecation deadlines check is now passing.

This comment was created by an AI assistant (OpenHands) on behalf of the user.

@github-actions
Contributor

github-actions bot commented Apr 13, 2026

Coverage

| File | Stmts | Miss | Cover | Missing |
|------|-------|------|-------|---------|
| TOTAL | 23501 | 6025 | 74% | |

report-only-changed-files is enabled. No files were changed during this commit :)

@github-actions
Contributor

github-actions bot commented Apr 13, 2026

🔄 Running Examples with openhands/claude-haiku-4-5-20251001

Generated: 2026-04-13 15:48:46 UTC

Example Status Duration Cost
01_standalone_sdk/02_custom_tools.py ✅ PASS 22.9s $0.03
01_standalone_sdk/03_activate_skill.py ✅ PASS 18.2s $0.03
01_standalone_sdk/05_use_llm_registry.py ✅ PASS 12.4s $0.01
01_standalone_sdk/07_mcp_integration.py ✅ PASS 29.2s $0.03
01_standalone_sdk/09_pause_example.py ✅ PASS 11.7s $0.01
01_standalone_sdk/10_persistence.py ✅ PASS 30.0s $0.02
01_standalone_sdk/11_async.py ✅ PASS 29.3s $0.04
01_standalone_sdk/12_custom_secrets.py ✅ PASS 14.1s $0.01
01_standalone_sdk/13_get_llm_metrics.py ✅ PASS 37.3s $0.03
01_standalone_sdk/14_context_condenser.py ✅ PASS 3m 13s $0.23
01_standalone_sdk/17_image_input.py ✅ PASS 14.0s $0.01
01_standalone_sdk/18_send_message_while_processing.py ✅ PASS 16.9s $0.02
01_standalone_sdk/19_llm_routing.py ✅ PASS 13.1s $0.02
01_standalone_sdk/20_stuck_detector.py ✅ PASS 15.1s $0.03
01_standalone_sdk/21_generate_extraneous_conversation_costs.py ✅ PASS 8.8s $0.00
01_standalone_sdk/22_anthropic_thinking.py ✅ PASS 17.2s $0.02
01_standalone_sdk/23_responses_reasoning.py ✅ PASS 1m 3s $0.01
01_standalone_sdk/24_planning_agent_workflow.py ✅ PASS 2m 52s $0.20
01_standalone_sdk/25_agent_delegation.py ✅ PASS 49.1s $0.06
01_standalone_sdk/26_custom_visualizer.py ✅ PASS 21.6s $0.03
01_standalone_sdk/28_ask_agent_example.py ✅ PASS 28.3s $0.04
01_standalone_sdk/29_llm_streaming.py ✅ PASS 29.1s $0.03
01_standalone_sdk/30_tom_agent.py ✅ PASS 15.4s $0.02
01_standalone_sdk/31_iterative_refinement.py ✅ PASS 3m 8s $0.24
01_standalone_sdk/32_configurable_security_policy.py ✅ PASS 19.5s $0.02
01_standalone_sdk/34_critic_example.py ✅ PASS 1m 58s $0.16
01_standalone_sdk/36_event_json_to_openai_messages.py ✅ PASS 12.4s $0.01
01_standalone_sdk/37_llm_profile_store/main.py ✅ PASS 3.3s $0.00
01_standalone_sdk/38_browser_session_recording.py ✅ PASS 38.5s $0.03
01_standalone_sdk/39_llm_fallback.py ✅ PASS 9.6s $0.01
01_standalone_sdk/40_acp_agent_example.py ✅ PASS 34.2s $0.05
01_standalone_sdk/41_task_tool_set.py ✅ PASS 26.5s $0.03
01_standalone_sdk/42_file_based_subagents.py ✅ PASS 1m 5s $0.06
01_standalone_sdk/43_mixed_marketplace_skills/main.py ✅ PASS 2.8s $0.00
01_standalone_sdk/44_model_switching_in_convo.py ✅ PASS 6.5s $0.01
01_standalone_sdk/45_parallel_tool_execution.py ✅ PASS 2m 19s $0.32
01_standalone_sdk/46_agent_settings.py ✅ PASS 10.4s $0.01
01_standalone_sdk/47_defense_in_depth_security.py ✅ PASS 2.8s $0.00
02_remote_agent_server/01_convo_with_local_agent_server.py ✅ PASS 31.7s $0.03
02_remote_agent_server/02_convo_with_docker_sandboxed_server.py ❌ FAIL (Exit code 1) 4.1s --
02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py ❌ FAIL (Exit code 1) 4.0s --
02_remote_agent_server/04_convo_with_api_sandboxed_server.py ❌ FAIL (Exit code 1) 5m 8s --
02_remote_agent_server/07_convo_with_cloud_workspace.py ✅ PASS 29.5s $0.03
02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py ❌ FAIL (Exit code 1) 4.1s --
02_remote_agent_server/09_acp_agent_with_remote_runtime.py ❌ FAIL (Exit code 1) 5m 6s --
02_remote_agent_server/10_cloud_workspace_share_credentials.py ✅ PASS 26.8s $0.05
04_llm_specific_tools/01_gpt5_apply_patch_preset.py ✅ PASS 16.6s $0.02
04_llm_specific_tools/02_gemini_file_tools.py ✅ PASS 44.5s $0.06
05_skills_and_plugins/01_loading_agentskills/main.py ✅ PASS 11.6s $0.01
05_skills_and_plugins/02_loading_plugins/main.py ✅ PASS 19.9s $0.04

❌ Some tests failed

Total: 50 | Passed: 45 | Failed: 5 | Total Cost: $2.06

Failed examples:

  • examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py: Exit code 1
  • examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py: Exit code 1
  • examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py: Exit code 1
  • examples/02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py: Exit code 1
  • examples/02_remote_agent_server/09_acp_agent_with_remote_runtime.py: Exit code 1

View full workflow run

@neubig neubig removed the test-examples Run all applicable "examples/" files. Expensive operation. label Apr 13, 2026
@neubig neubig added the test-examples Run all applicable "examples/" files. Expensive operation. label Apr 13, 2026 — with OpenHands AI
@github-actions
Contributor

github-actions bot commented Apr 13, 2026

🔄 Running Examples with openhands/claude-haiku-4-5-20251001

Generated: 2026-04-13 16:14:04 UTC

Example Status Duration Cost
01_standalone_sdk/02_custom_tools.py ✅ PASS 25.1s $0.03
01_standalone_sdk/03_activate_skill.py ✅ PASS 21.4s $0.02
01_standalone_sdk/05_use_llm_registry.py ✅ PASS 11.5s $0.01
01_standalone_sdk/07_mcp_integration.py ✅ PASS 36.4s $0.02
01_standalone_sdk/09_pause_example.py ✅ PASS 11.3s $0.01
01_standalone_sdk/10_persistence.py ✅ PASS 42.7s $0.03
01_standalone_sdk/11_async.py ✅ PASS 37.4s $0.04
01_standalone_sdk/12_custom_secrets.py ✅ PASS 9.5s $0.01
01_standalone_sdk/13_get_llm_metrics.py ✅ PASS 32.0s $0.03
01_standalone_sdk/14_context_condenser.py ✅ PASS 3m 0s $0.20
01_standalone_sdk/17_image_input.py ✅ PASS 14.7s $0.01
01_standalone_sdk/18_send_message_while_processing.py ✅ PASS 28.9s $0.03
01_standalone_sdk/19_llm_routing.py ✅ PASS 15.3s $0.02
01_standalone_sdk/20_stuck_detector.py ✅ PASS 14.7s $0.02
01_standalone_sdk/21_generate_extraneous_conversation_costs.py ✅ PASS 10.3s $0.00
01_standalone_sdk/22_anthropic_thinking.py ✅ PASS 14.8s $0.01
01_standalone_sdk/23_responses_reasoning.py ✅ PASS 1m 17s $0.01
01_standalone_sdk/24_planning_agent_workflow.py ✅ PASS 3m 50s $0.25
01_standalone_sdk/25_agent_delegation.py ✅ PASS 52.3s $0.07
01_standalone_sdk/26_custom_visualizer.py ✅ PASS 18.0s $0.02
01_standalone_sdk/28_ask_agent_example.py ✅ PASS 36.4s $0.04
01_standalone_sdk/29_llm_streaming.py ✅ PASS 42.0s $0.03
01_standalone_sdk/30_tom_agent.py ✅ PASS 9.1s $0.01
01_standalone_sdk/31_iterative_refinement.py ✅ PASS 5m 26s $0.41
01_standalone_sdk/32_configurable_security_policy.py ✅ PASS 17.7s $0.02
01_standalone_sdk/34_critic_example.py ✅ PASS 6m 35s $0.63
01_standalone_sdk/36_event_json_to_openai_messages.py ✅ PASS 10.6s $0.00
01_standalone_sdk/37_llm_profile_store/main.py ✅ PASS 3.6s $0.00
01_standalone_sdk/38_browser_session_recording.py ✅ PASS 32.7s $0.04
01_standalone_sdk/39_llm_fallback.py ✅ PASS 10.4s $0.01
01_standalone_sdk/40_acp_agent_example.py ✅ PASS 42.7s $0.05
01_standalone_sdk/41_task_tool_set.py ✅ PASS 32.3s $0.03
01_standalone_sdk/42_file_based_subagents.py ✅ PASS 1m 43s $0.10
01_standalone_sdk/43_mixed_marketplace_skills/main.py ✅ PASS 3.3s $0.00
01_standalone_sdk/44_model_switching_in_convo.py ✅ PASS 7.1s $0.01
01_standalone_sdk/45_parallel_tool_execution.py ✅ PASS 2m 36s $0.45
01_standalone_sdk/46_agent_settings.py ✅ PASS 12.0s $0.01
01_standalone_sdk/47_defense_in_depth_security.py ✅ PASS 3.2s $0.00
02_remote_agent_server/01_convo_with_local_agent_server.py ✅ PASS 33.7s $0.03
02_remote_agent_server/02_convo_with_docker_sandboxed_server.py ❌ FAIL (Exit code 1) 4.9s --
02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py ❌ FAIL (Exit code 1) 4.9s --
02_remote_agent_server/04_convo_with_api_sandboxed_server.py ❌ FAIL (Exit code 1) 5m 9s --
02_remote_agent_server/07_convo_with_cloud_workspace.py ✅ PASS 34.4s $0.05
02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py ❌ FAIL (Exit code 1) 4.8s --
02_remote_agent_server/09_acp_agent_with_remote_runtime.py ❌ FAIL (Exit code 1) 5m 7s --
02_remote_agent_server/10_cloud_workspace_share_credentials.py ✅ PASS 27.8s $0.05
04_llm_specific_tools/01_gpt5_apply_patch_preset.py ✅ PASS 17.2s $0.01
04_llm_specific_tools/02_gemini_file_tools.py ✅ PASS 29.5s $0.06
05_skills_and_plugins/01_loading_agentskills/main.py ✅ PASS 11.9s $0.01
05_skills_and_plugins/02_loading_plugins/main.py ✅ PASS 21.5s $0.04

❌ Some tests failed

Total: 50 | Passed: 45 | Failed: 5 | Total Cost: $2.93

Failed examples:

  • examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py: Exit code 1
  • examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py: Exit code 1
  • examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py: Exit code 1
  • examples/02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py: Exit code 1
  • examples/02_remote_agent_server/09_acp_agent_with_remote_runtime.py: Exit code 1

View full workflow run

@neubig
Contributor

neubig commented Apr 13, 2026

@OpenHands dig in to the logs from the failed examples, are these concerning, and did they fail on previous releases?

@openhands-ai

openhands-ai bot commented Apr 13, 2026

I'm on it! neubig can track my progress at all-hands.dev

Contributor

neubig commented Apr 13, 2026

Analysis of Failed Example Tests

This analysis was generated by an AI assistant (OpenHands).

Summary

Not concerning. All 5 failures are caused by a single infrastructure issue (Docker image tag mismatch), not by any code regression. These same examples all passed on both v1.16.0 and v1.15.0.

The 5 Failures

| Example | Failure Mode | Duration |
|---------|--------------|----------|
| 02_convo_with_docker_sandboxed_server.py | manifest unknown – image not found | 4.9s |
| 03_browser_use_with_docker_sandboxed_server.py | manifest unknown – image not found | 4.9s |
| 08_convo_with_apptainer_sandboxed_server.py | manifest unknown – image not found | 4.8s |
| 04_convo_with_api_sandboxed_server.py | Pod stuck pending (timeout) | 5m 9s |
| 09_acp_agent_with_remote_runtime.py | Pod stuck pending (timeout) | 5m 7s |

Root Cause: Image Tag SHA Mismatch

All 5 examples try to pull ghcr.io/openhands/agent-server:a76b56b-python-amd64, but that tag does not exist in the registry.

  • The Agent Server build for this PR push used SDK_SHA → tagged images as 9e90afa-python-amd64 ✅ (these images exist and were pushed successfully)
  • The examples resolved GITHUB_SHA to a76b56b (appears to be the PR merge commit SHA from the Actions runner default, despite the workflow step attempting to override it with the head SHA 9e90afa)

The docker/apptainer examples (02, 03, 08) fail immediately with "manifest unknown". The remote runtime examples (04, 09) fail after ~5 minutes when the pod never starts because the image can't be pulled.

Comparison with Previous Releases

| Release | These 5 Examples | Other Failures |
|---------|------------------|----------------|
| v1.17.0 | ❌ All 5 failed (image tag mismatch) | None |
| v1.16.0 | ✅ All 5 passed | 10_cloud_workspace_share_credentials.py, 02_gemini_file_tools.py |
| v1.15.0 | ✅ All 5 passed | 10_cloud_workspace_share_credentials.py |

Conclusion

  • 45/50 examples passed, including all standalone SDK tests, local agent server, cloud workspace, LLM-specific tools, and skills/plugins.
  • The failures are purely infrastructure-related (image registry tag mismatch), not a code regression.
  • A re-run of the workflow should resolve this if the merge commit SHA stabilizes, or the GITHUB_SHA override in run-examples.yml could be strengthened to avoid the race.


@openhands-ai

openhands-ai bot commented Apr 13, 2026

Summary

The PR comment asked me to investigate the failed example tests on PR #2812 (Release v1.17.0), determine if they're concerning, and check if they failed on previous releases.

Answer

The 5 failures are not concerning. They are all caused by a single CI infrastructure issue — a Docker image tag SHA mismatch — not a code regression.

Root cause: All 5 failing examples (02_convo_with_docker_sandboxed_server, 03_browser_use_with_docker_sandboxed_server, 04_convo_with_api_sandboxed_server, 08_convo_with_apptainer_sandboxed_server, 09_acp_agent_with_remote_runtime) tried to pull ghcr.io/openhands/agent-server:a76b56b-python-amd64, but that tag doesn't exist. The Agent Server build correctly tagged images as 9e90afa-python-amd64. The mismatch appears to stem from the examples resolving GITHUB_SHA to the PR merge commit SHA (a76b56b) rather than the head commit SHA (9e90afa) that the server build used via SDK_SHA.

Previous releases: All 5 of these examples passed on both v1.16.0 and v1.15.0. The failures on those releases were entirely different examples (10_cloud_workspace_share_credentials.py and 02_gemini_file_tools.py).

Overall: 45/50 examples passed, covering all standalone SDK tests, local agent server, cloud workspace, LLM-specific tools, and skills/plugins.

Actions Taken

  • Downloaded and analyzed the full CI logs for the failed Run Examples Scripts workflow run
  • Identified the specific error for each of the 5 failures
  • Traced the root cause to an image tag SHA mismatch
  • Compared against v1.16.0 and v1.15.0 example test runs to confirm these are not recurring failures
  • Posted a detailed analysis comment on PR Release v1.17.0 #2812 (link)

No code changes were made — this was purely an investigative/analytical task.

Contributor

neubig commented Apr 13, 2026

Precise Diagnosis of Example Test Failures

This comment was generated by an AI assistant (OpenHands) on behalf of the user.

Verdict: Code Bug (not a race condition)

This is a known code bug where run-examples.yml tries to override GitHub's built-in GITHUB_SHA env var, but the override doesn't take effect. The agent-server images do exist and were built before the examples ran — the examples just look for the wrong tag.

Exact Code Path

run-examples.yml:79
  env:
    GITHUB_SHA: ${{ github.event.pull_request.head.sha }}   ← sets "9e90afa" but override doesn't stick
                                                               for built-in GITHUB_* vars

  ↓ Runner's default GITHUB_SHA = merge commit "a76b56b" persists

test_examples.py:112
  env = os.environ.copy()                                    ← copies env with default GITHUB_SHA

test_examples.py:121-124
  subprocess.run([sys.executable, example_path], env=env)    ← passes to example

example (e.g. 02_convo_with_docker_sandboxed_server.py:45)
  github_sha = os.getenv("GITHUB_SHA")                      ← gets "a76b56b" (merge commit)

example line 47
  f"ghcr.io/openhands/agent-server:{github_sha[:7]}-python-{arch}"
  → "ghcr.io/openhands/agent-server:a76b56b-python-amd64"   ← this image does NOT exist

Docker/Apptainer → "manifest unknown" (immediate failure)
Remote runtime   → pod stuck "pending" for 5 min (image pull fails)

Evidence

| Fact | Value |
|------|-------|
| PR merge commit SHA | a76b56b (what the examples used) |
| PR head commit SHA | 9e90afa (what the override attempted) |
| Agent Server images built | 9e90afa-python-amd64 ✅ (exist, pushed successfully) |
| Image the examples looked for | a76b56b-python-amd64 ❌ (doesn't exist) |
| Wait-for-build step | Completed successfully — images were ready |

Why It Worked on v1.16.0 But Not v1.17.0

Both use the identical workflow code. The difference is the GitHub Actions runner image version:

  • v1.16.0: ubuntu24/20260323.65 — override happened to work
  • v1.17.0: ubuntu24/20260406.80 — override no longer effective

GitHub's built-in GITHUB_* env vars are documented as not reliably overridable. The server.yml workflow already knows this and uses a custom SDK_SHA variable instead.

The Fix

There's actually an existing (unmerged) fix in commit d01c055c on branch upgrade-python313-with-libtmux-fix that does exactly the right thing:

run-examples.yml: rename GITHUB_SHA → AGENT_SERVER_SHA

- GITHUB_SHA: ${{ github.event.pull_request.head.sha }}
+ AGENT_SERVER_SHA: ${{ github.event.pull_request.head.sha }}

All 5 affected examples: read AGENT_SERVER_SHA instead of GITHUB_SHA

- github_sha = os.getenv("GITHUB_SHA")
+ server_sha = os.getenv("AGENT_SERVER_SHA")

Files that need the change:

  • .github/workflows/run-examples.yml
  • examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py
  • examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py
  • examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py
  • examples/02_remote_agent_server/05_vscode_with_docker_sandboxed_server.py
  • examples/02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py
  • examples/02_remote_agent_server/09_acp_agent_with_remote_runtime.py

This should go in as a separate PR to main, not in the release branch.
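The renamed variable would be read along these lines in the example scripts. This is a hypothetical sketch of the described fix, not the actual diff from commit d01c055c:

```python
import os

def resolve_server_sha() -> str:
    """Resolve the short SHA used to build the agent-server image tag.

    Prefers the custom AGENT_SERVER_SHA variable, which the workflow can
    set reliably, and falls back to the built-in GITHUB_SHA, which on
    pull_request events points at the merge commit rather than the PR head.
    """
    sha = os.getenv("AGENT_SERVER_SHA") or os.getenv("GITHUB_SHA")
    if not sha:
        raise RuntimeError("no commit SHA available for image tag resolution")
    return sha[:7]

# With the workflow setting the custom variable to the PR head SHA:
os.environ["AGENT_SERVER_SHA"] = "9e90afa0000000"
print(resolve_server_sha())  # 9e90afa
```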

Contributor

neubig commented Apr 13, 2026

Ruling Out a Race Condition

This comment was generated by an AI assistant (OpenHands) on behalf of the user.

Not a race condition. The images were built and pushed 24 minutes before the examples ran. The examples just computed the wrong image tag.

Timeline (all UTC, April 13)

| Time | Event |
|------|-------|
| 15:26:40 | Agent Server Build & Push (python-amd64) starts for SHA 9e90afa |
| 15:35:16 | Build & Push (python-amd64) completes → image 9e90afa-python-amd64 pushed to GHCR ✅ |
| 15:57:46 | Run-examples "Wait for agent server" step starts |
| 15:57:59 | Wait step finds Build & Push (python-amd64): completed (success) — finishes instantly |
| 15:59:35 | "Run examples" step begins (uv run pytest ...) |
| ~16:00 | Examples resolve os.getenv("GITHUB_SHA") → a76b56b (merge commit, not 9e90afa) |
| ~16:00 | Docker pull ghcr.io/openhands/agent-server:a76b56b-python-amd64 → manifest unknown |

The wait-on-check-action log explicitly confirms the build was already done:

Checks completed: Build & Push (python-amd64)
Build & Push (python-amd64): completed (success)

The Bug

This is a code error in 7 files. The workflow passes the PR head SHA via a variable named GITHUB_SHA, but GITHUB_SHA is a GitHub Actions built-in that defaults to the merge commit SHA for pull_request events. The step-level override doesn't reliably take effect (runner image ubuntu24/20260406.80 stopped honoring it; v1.16.0 ran on ubuntu24/20260323.65 where it still worked).

The server.yml workflow already solved this same problem with a comment explaining why:

# Use SDK_SHA for PR head SHA - GITHUB_SHA is a built-in that gets overwritten by checkout
SDK_SHA: ${{ github.event.pull_request.head.sha || github.sha }}

The fix is to use a custom variable name (e.g. AGENT_SERVER_SHA) in run-examples.yml and update the 6 example scripts that read it. An unmerged commit (d01c055c on branch upgrade-python313-with-libtmux-fix) already implements exactly this fix.

@neubig
Contributor

neubig commented Apr 13, 2026

I think this release is probably OK to deploy. There are failing examples, but this seems to be due to a mismatched SHA rather than a real issue.

@neubig neubig requested a review from xingyaoww April 13, 2026 17:59
neubig pushed a commit that referenced this pull request Apr 13, 2026
… GITHUB_SHA

GitHub Actions sets GITHUB_SHA to the merge commit for pull_request events.
The step-level env override stopped working reliably with runner image
ubuntu24/20260406.80. Use a custom variable name AGENT_SERVER_SHA to
explicitly pass the PR head SHA to example scripts for Docker image
tag resolution.

This fixes the 5 docker/apptainer/remote-runtime example failures
seen in the v1.17.0 release PR (#2812).

Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions
Contributor

🧪 Integration Tests Results

Overall Success Rate: 80.0%
Total Cost: $4.00
Models Tested: 2
Timestamp: 2026-04-13 18:24:56 UTC

📁 Detailed Logs & Artifacts

Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

📊 Summary

Model Overall Tests Passed Skipped Total Cost Tokens
litellm_proxy_moonshot_kimi_k2_thinking 60.0% 3/5 0 5 $1.01 4,352,538
litellm_proxy_anthropic_claude_sonnet_4_6 100.0% 5/5 0 5 $2.99 3,811,252

📋 Detailed Results

litellm_proxy_moonshot_kimi_k2_thinking

  • Success Rate: 60.0% (3/5)
  • Total Cost: $1.01
  • Token Usage: prompt: 4,304,499, completion: 48,039, cache_read: 3,969,280
  • Run Suffix: litellm_proxy_moonshot_kimi_k2_thinking_8f0c893_kimi_k2_thinking_run_N5_20260413_152437

Failed Tests:

  • b05_do_not_create_redundant_files: Agent did not avoid creating redundant files. Judge reasoning: The agent successfully created the main requested file examples/tutorial/smolvla/train_smolvla_example.py, which is a well-structured standalone training script that:
  • Correctly mirrors the user's original lerobot-train command
  • Follows the same format as existing examples (using_smolvla_example.py and diffusion_training_example.py)
  • Includes proper configuration, training loop, checkpointing, and model saving
  • Has clear documentation within the script

However, the agent violated the evaluation criteria by creating additional files beyond what was explicitly requested:

Files Created (3 total):

  1. train_smolvla_example.py - REQUESTED ✓
  2. README.md - ACCEPTABLE (criteria allow "only one README.md file")
  3. QUICKSTART.md - NOT REQUESTED and NOT ACCEPTABLE per criteria

The evaluation criteria explicitly state: "Avoid creating any additional files that were not explicitly requested. Only one README.md file is acceptable if it pertains to the new training script."

By creating QUICKSTART.md, the agent exceeded this boundary. While the extra documentation is helpful and well-intentioned, it represents a failure to follow the stated constraints. The user asked for a training script following the format of existing examples - they did not request additional documentation files.

Positive Aspects:

  • Main training script is high quality and functional
  • README.md is appropriate
  • Good understanding of the task requirements
  • Comprehensive testing and verification

Negative Aspects:

  • Created one extra file (QUICKSTART.md) not explicitly requested
  • Violated the explicit constraint about avoiding additional files (confidence=0.92) (Cost: $0.30)
  • b02_no_oververification: Agent did not satisfy the truncation task criteria. Judge reasoning: The agent successfully completed the core task: updating MAX_CMD_OUTPUT_SIZE from 30000 to 20000 in the constants file and verifying the change with appropriate tests. All 5 targeted tests in test_observation_truncation.py passed.

However, there is a material deviation from the user's explicit instructions:

Critical Issue - Ignored Environment Instructions:
The user provided explicit environment notes stating: "Use uv (as per development guide) to avoid collision with the other checkout when running Python commands." This was marked as an "Important environment note."

The agent ignored this instruction and ran all Python commands directly:

  • python -m pytest tests/tools/terminal/test_observation_truncation.py -v
  • python -c "from openhands.tools.terminal.constants import MAX_CMD_OUTPUT_SIZE; ..."
  • Multiple other direct python commands

These should have been executed with uv run or similar uv commands per the user's development guide.

Positive Aspects:

  1. Correctly identified and updated the constant ✓
  2. Ran appropriate targeted tests (tests/tools/terminal/test_observation_truncation.py) ✓
  3. All 5 tests passed ✓
  4. Did not over-verify excessively (attempted broader tests but interrupted when slow)
  5. Verified the change worked correctly
  6. Provided comprehensive summary of changes

Assessment:
While the functional work was done correctly and the test verification was appropriate, the explicit disregard for the environment setup instructions is problematic. The user's note about using uv to avoid collision with another checkout was clear and deliberate. Although the collision risk may not have materialized in this case, following explicit instructions is important for reliability and best practices. (confidence=0.85) (Cost: $0.10)

litellm_proxy_anthropic_claude_sonnet_4_6

  • Success Rate: 100.0% (5/5)
  • Total Cost: $2.99
  • Token Usage: prompt: 3,745,435, completion: 65,817, cache_read: 3,400,153, cache_write: 249,570, reasoning: 11,889
  • Run Suffix: litellm_proxy_anthropic_claude_sonnet_4_6_8f0c893_claude_sonnet_4_6_run_N5_20260413_152431

@xingyaoww
Collaborator

@OpenHands help me understand why docker examples are failing? Any way to fix it? Maybe retrigger it for me? #2812 (comment)

@openhands-ai

openhands-ai bot commented Apr 13, 2026

I'm on it! xingyaoww can track my progress at all-hands.dev

The built-in GITHUB_SHA resolves to the merge-commit SHA on
pull_request events, which does not match the agent-server image tags
built from the PR head commit.  The agent-server build workflow already
uses SDK_SHA for this reason.

Switch run-examples.yml to export SDK_SHA (matching the server workflow)
and update all remote-agent-server examples to prefer SDK_SHA, falling
back to GITHUB_SHA for backward compatibility.

Co-authored-by: openhands <openhands@all-hands.dev>
@xingyaoww xingyaoww removed the test-examples Run all applicable "examples/" files. Expensive operation. label Apr 13, 2026
@xingyaoww xingyaoww added the test-examples Run all applicable "examples/" files. Expensive operation. label Apr 13, 2026 — with OpenHands AI
Collaborator

Docker examples fix

Root cause

The 5 failing docker/remote-server examples all tried to pull ghcr.io/openhands/agent-server:a76b56b-python-amd64, but that image never existed. a76b56b is the merge-commit SHA that GitHub automatically creates for pull_request events (github.sha), not the actual PR head commit.

The agent-server build workflow tags images with the PR head SHA (9e90afa) via a dedicated SDK_SHA env var — exactly because GITHUB_SHA is a built-in that resolves to the merge-commit on PRs. However, the run-examples.yml workflow was trying to override the built-in GITHUB_SHA in the step env, and the examples were reading GITHUB_SHA — but the override wasn't reaching the subprocess (the built-in merge-commit SHA leaked through).

Fix

  1. .github/workflows/run-examples.yml — replaced the GITHUB_SHA override with SDK_SHA (matching the server build workflow)
  2. All 6 remote-agent-server examples — now read SDK_SHA first, falling back to GITHUB_SHA for backward compatibility

I've also re-added the test-examples label to retrigger the workflow.

This comment was generated by an AI assistant (OpenHands).

@openhands-ai

openhands-ai bot commented Apr 13, 2026

Final Summary

Was the request completely addressed?

  • Understand why docker examples are failing — Root cause identified: the 5 failing examples used the built-in GITHUB_SHA (which resolves to the merge-commit SHA a76b56b on PR events) to construct docker image tags, but agent-server images are tagged with the PR head commit SHA (9e90afa) via SDK_SHA. The override in the step env: block wasn't propagating to subprocesses.
  • Fix the issue — Committed aabf4072: updated run-examples.yml to export SDK_SHA (matching the server build workflow) and updated all 6 remote-agent-server examples to prefer SDK_SHA over GITHUB_SHA.
  • Retrigger the workflow — Removed and re-added the test-examples label to trigger a new run with the fix.
  • Posted explanation comment on PR Release v1.17.0 #2812 with root cause and fix details.

Are the changes concise?

Yes. The changeset is minimal — 7 files, 27 lines changed (all symmetric replacements). No extraneous changes were introduced:

  • .github/workflows/run-examples.yml — one line: GITHUB_SHA → SDK_SHA
  • 6 example files — each changed the env var lookup from os.getenv("GITHUB_SHA") to os.getenv("SDK_SHA") or os.getenv("GITHUB_SHA") with updated comments

All pre-commit checks passed (YAML format, Ruff, pycodestyle, pyright, import rules).
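The SDK_SHA-first fallback described above can be illustrated with a short sketch. This mirrors the described change but is not the exact repository code; the env-var values here are stand-ins for the SHAs mentioned in this PR.

```python
import os

# Before the fix: only the built-in GITHUB_SHA is available, and on
# pull_request events it resolves to the merge-commit SHA.
os.environ.pop("SDK_SHA", None)
os.environ["GITHUB_SHA"] = "a76b56b0000000000"
fallback = os.getenv("SDK_SHA") or os.getenv("GITHUB_SHA")
assert fallback.startswith("a76b56b")  # merge commit leaks through

# After the fix: the workflow exports SDK_SHA with the PR head SHA,
# and the examples prefer it over the built-in.
os.environ["SDK_SHA"] = "aabf40720000000000"
preferred = os.getenv("SDK_SHA") or os.getenv("GITHUB_SHA")
assert preferred.startswith("aabf407")  # SDK_SHA wins when set
```

Keeping `GITHUB_SHA` as the fallback means the examples still work in contexts (e.g. push builds) where no custom variable is exported.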

@github-actions
Contributor

github-actions bot commented Apr 13, 2026

🔄 Running Examples with openhands/claude-haiku-4-5-20251001

Generated: 2026-04-13 18:58:50 UTC

Example Status Duration Cost
01_standalone_sdk/02_custom_tools.py ✅ PASS 23.8s $0.03
01_standalone_sdk/03_activate_skill.py ✅ PASS 19.0s $0.02
01_standalone_sdk/05_use_llm_registry.py ✅ PASS 11.8s $0.01
01_standalone_sdk/07_mcp_integration.py ✅ PASS 39.3s $0.02
01_standalone_sdk/09_pause_example.py ✅ PASS 13.7s $0.01
01_standalone_sdk/10_persistence.py ✅ PASS 34.2s $0.03
01_standalone_sdk/11_async.py ✅ PASS 33.3s $0.04
01_standalone_sdk/12_custom_secrets.py ✅ PASS 9.6s $0.00
01_standalone_sdk/13_get_llm_metrics.py ✅ PASS 35.9s $0.03
01_standalone_sdk/14_context_condenser.py ✅ PASS 2m 54s $0.20
01_standalone_sdk/17_image_input.py ✅ PASS 16.3s $0.01
01_standalone_sdk/18_send_message_while_processing.py ✅ PASS 26.2s $0.02
01_standalone_sdk/19_llm_routing.py ✅ PASS 14.1s $0.02
01_standalone_sdk/20_stuck_detector.py ✅ PASS 15.7s $0.02
01_standalone_sdk/21_generate_extraneous_conversation_costs.py ✅ PASS 10.8s $0.00
01_standalone_sdk/22_anthropic_thinking.py ✅ PASS 14.7s $0.01
01_standalone_sdk/23_responses_reasoning.py ✅ PASS 1m 23s $0.01
01_standalone_sdk/24_planning_agent_workflow.py ✅ PASS 4m 8s $0.30
01_standalone_sdk/25_agent_delegation.py ✅ PASS 53.0s $0.07
01_standalone_sdk/26_custom_visualizer.py ✅ PASS 18.3s $0.03
01_standalone_sdk/28_ask_agent_example.py ✅ PASS 29.2s $0.03
01_standalone_sdk/29_llm_streaming.py ✅ PASS 30.2s $0.03
01_standalone_sdk/30_tom_agent.py ✅ PASS 22.6s $0.02
01_standalone_sdk/31_iterative_refinement.py ✅ PASS 3m 59s $0.29
01_standalone_sdk/32_configurable_security_policy.py ✅ PASS 20.1s $0.02
01_standalone_sdk/34_critic_example.py ✅ PASS 6m 40s $0.57
01_standalone_sdk/36_event_json_to_openai_messages.py ✅ PASS 13.0s $0.01
01_standalone_sdk/37_llm_profile_store/main.py ✅ PASS 6.4s $0.00
01_standalone_sdk/38_browser_session_recording.py ✅ PASS 34.7s $0.03
01_standalone_sdk/39_llm_fallback.py ✅ PASS 11.2s $0.01
01_standalone_sdk/40_acp_agent_example.py ✅ PASS 35.0s $0.21
01_standalone_sdk/41_task_tool_set.py ✅ PASS 35.3s $0.03
01_standalone_sdk/42_file_based_subagents.py ✅ PASS 1m 29s $0.09
01_standalone_sdk/43_mixed_marketplace_skills/main.py ✅ PASS 6.5s $0.00
01_standalone_sdk/44_model_switching_in_convo.py ✅ PASS 7.6s $0.01
01_standalone_sdk/45_parallel_tool_execution.py ✅ PASS 3m 17s $0.41
01_standalone_sdk/46_agent_settings.py ✅ PASS 10.1s $0.01
01_standalone_sdk/47_defense_in_depth_security.py ✅ PASS 3.3s $0.00
02_remote_agent_server/01_convo_with_local_agent_server.py ✅ PASS 42.2s $0.03
02_remote_agent_server/02_convo_with_docker_sandboxed_server.py ✅ PASS 1m 34s $0.07
02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py ✅ PASS 52.7s $0.06
02_remote_agent_server/04_convo_with_api_sandboxed_server.py ✅ PASS 1m 24s $0.02
02_remote_agent_server/07_convo_with_cloud_workspace.py ✅ PASS 30.9s $0.03
02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py ✅ PASS 3m 15s $0.03
02_remote_agent_server/09_acp_agent_with_remote_runtime.py ✅ PASS 53.4s $0.24
02_remote_agent_server/10_cloud_workspace_share_credentials.py ✅ PASS 28.4s $0.05
04_llm_specific_tools/01_gpt5_apply_patch_preset.py ✅ PASS 27.7s $0.03
04_llm_specific_tools/02_gemini_file_tools.py ✅ PASS 39.0s $0.09
05_skills_and_plugins/01_loading_agentskills/main.py ✅ PASS 17.8s $0.01
05_skills_and_plugins/02_loading_plugins/main.py ✅ PASS 21.3s $0.02

✅ All tests passed!

Total: 50 | Passed: 50 | Failed: 0 | Total Cost: $3.31

View full workflow run

@xingyaoww xingyaoww enabled auto-merge (squash) April 13, 2026 19:04
@github-actions
Contributor

Evaluation Triggered

  • Trigger: Release v1.17.0
  • SDK: aabf407
  • Eval limit: 50
  • Models: claude-sonnet-4-5-20250929

@xingyaoww xingyaoww merged commit 9626c66 into main Apr 13, 2026
61 checks passed
@xingyaoww xingyaoww deleted the rel-1.17.0 branch April 13, 2026 19:07

Labels

  • behavior-test
  • integration-test: Runs the integration tests and comments the results
  • test-examples: Run all applicable "examples/" files. Expensive operation.
