Conversation
Co-authored-by: openhands <openhands@all-hands.dev>
|
Hi! I started running the behavior tests on your PR. You will receive a comment with the results shortly. |
|
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly. |
Python API breakage checks — ✅ PASSED |
REST API breakage checks (OpenAPI) — ✅ PASSED |
all-hands-bot
left a comment
🟢 Good taste - Clean release prep.
This is a mechanical version bump from 1.16.1 → 1.17.0 across all packages. The changes are exactly what you'd expect for a release:
- Version updates in all `pyproject.toml` files
- Matching version updates in `uv.lock`
- Eval workflow default updated to match
- `exclude-newer` timestamp refresh (normal `uv lock` behavior)
No complexity, no behavior changes, no breaking changes. Standard release process.
VERDICT: ✅ Worth merging - This is the correct way to prepare a release. Once the checklist items are completed, this is ready to go.
This review was created by an AI assistant (OpenHands) on behalf of the user.
|
@OpenHands fix the CI failing on deprecation deadlines. Understand why it fails and commit directly here the clean up it needs. |
The upstream browser_use 0.11.9 still has module-level basicConfig(force=True) and _ensure_all_loggers_use_stderr() that interfere with our logging. The upstream PR #3717 was closed without merge. Extend the cleanup deadline from 1.17.0 to 1.19.0. Co-authored-by: openhands <openhands@all-hands.dev>
|
I'm on it! enyst can track my progress at all-hands.dev |
🧪 Integration Tests Results

Overall Success Rate: 96.7%

📁 Detailed Logs & Artifacts
Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.
📊 Summary
📋 Detailed Results

litellm_proxy_moonshot_kimi_k2_thinking
Skipped Tests:
litellm_proxy_deepseek_deepseek_reasoner
Skipped Tests:
litellm_proxy_gemini_3.1_pro_preview
litellm_proxy_anthropic_claude_sonnet_4_6
Failed Tests:
|
all-hands-bot
left a comment
✅ QA Report: PASS
Release version bump verified successfully. All packages consistently versioned at 1.17.0, build succeeds, and deprecation deadlines resolved.
| Phase | Result |
|---|---|
| Environment Setup | ✅ Build successful, lockfile up to date |
| CI & Tests | ✅ Pre-commit, API breakage, version checks pass; build/test jobs in progress |
| Functional Verification | ✅ Version consistency, imports, deprecation check all pass |
Functional Verification
Version Consistency ✅
All four packages correctly versioned at 1.17.0:
$ grep -r "version = " openhands-*/pyproject.toml
openhands-agent-server/pyproject.toml:version = "1.17.0"
openhands-sdk/pyproject.toml:version = "1.17.0"
openhands-tools/pyproject.toml:version = "1.17.0"
openhands-workspace/pyproject.toml:version = "1.17.0"

Lockfile Consistency ✅
$ uv lock --locked --check
Resolved 401 packages in 1ms

Package Imports ✅
$ uv run python -c "import openhands.sdk; print(f'SDK version: {openhands.sdk.__version__}')"
✓ All packages import successfully
SDK version: 1.17.0

Deprecation Deadlines ✅
$ uv run --with packaging python .github/scripts/check_deprecations.py
openhands-sdk: checked 7 deprecation metadata entries against version 1.17.0.
openhands-tools: checked 2 deprecation metadata entries against version 1.17.0.
openhands-workspace: checked 1 deprecation metadata entries against version 1.17.0.
openhands-agent-server: checked 6 deprecation metadata entries against version 1.17.0.
Checked 16 deprecation metadata entries across 4 package(s).

The initial failure of the deprecation deadline check was resolved by commit 9e90afa, which extended the browser_use logging workaround deadline from 1.17.0 to 1.19.0 with updated context noting that browser_use 0.11.9 still has the problematic logging code.
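The actual `check_deprecations.py` script isn't shown here; the sketch below (with assumed metadata names) illustrates the general idea using `packaging.version`: flag any workaround whose removal deadline is at or before the current package version. This is why a 1.17.0 deadline fails a 1.17.0 release, and extending it to 1.19.0 makes the check pass.

```python
from packaging.version import Version

# Hypothetical deprecation metadata: each entry names a workaround
# and the release by which it must be removed.
DEPRECATIONS = [
    {"name": "browser_use logging workaround", "remove_by": "1.19.0"},
    {"name": "legacy tool schema", "remove_by": "2.0.0"},
]

def check_deprecations(package_version: str) -> list[str]:
    """Return the entries whose removal deadline has been reached."""
    current = Version(package_version)
    return [
        entry["name"]
        for entry in DEPRECATIONS
        if current >= Version(entry["remove_by"])
    ]

print(check_deprecations("1.17.0"))  # [] -- no deadline reached yet
```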
Build Success ✅
$ make build
Dependencies installed successfully.
Pre-commit hooks installed successfully.
Build complete! Development environment is ready.

Issues Found
None. The deprecation deadline issue was already addressed in commit 9e90afa.
CI Status
Passing (relevant to release):
- ✅ Pre-commit checks
- ✅ Python API breakage check
- ✅ REST API breakage check
- ✅ Version bump guard
- ✅ Deprecation deadlines
- ✅ Check docstrings
- ✅ PR artifacts check
In Progress (expected to pass):
- ⏳ SDK/tools/agent-server unit tests
- ⏳ Build & Push jobs (multi-platform)
- ⏳ Binary build tests
The critical release-blocking checks (deprecation deadlines, version consistency, API breakage) all pass. The in-progress jobs are standard test/build jobs that typically pass and don't require manual verification.
|
Fixed in 9e90afa. The failure came from the deprecation deadline check: the browser_use logging workaround had a cleanup deadline of 1.17.0. I verified the upstream browser_use 0.11.9 still contains the problematic logging code, so the deadline was extended to 1.19.0. This comment was created by an AI assistant (OpenHands) on behalf of the user. |
🔄 Running Examples with
|
| Example | Status | Duration | Cost |
|---|---|---|---|
| 01_standalone_sdk/02_custom_tools.py | ✅ PASS | 22.9s | $0.03 |
| 01_standalone_sdk/03_activate_skill.py | ✅ PASS | 18.2s | $0.03 |
| 01_standalone_sdk/05_use_llm_registry.py | ✅ PASS | 12.4s | $0.01 |
| 01_standalone_sdk/07_mcp_integration.py | ✅ PASS | 29.2s | $0.03 |
| 01_standalone_sdk/09_pause_example.py | ✅ PASS | 11.7s | $0.01 |
| 01_standalone_sdk/10_persistence.py | ✅ PASS | 30.0s | $0.02 |
| 01_standalone_sdk/11_async.py | ✅ PASS | 29.3s | $0.04 |
| 01_standalone_sdk/12_custom_secrets.py | ✅ PASS | 14.1s | $0.01 |
| 01_standalone_sdk/13_get_llm_metrics.py | ✅ PASS | 37.3s | $0.03 |
| 01_standalone_sdk/14_context_condenser.py | ✅ PASS | 3m 13s | $0.23 |
| 01_standalone_sdk/17_image_input.py | ✅ PASS | 14.0s | $0.01 |
| 01_standalone_sdk/18_send_message_while_processing.py | ✅ PASS | 16.9s | $0.02 |
| 01_standalone_sdk/19_llm_routing.py | ✅ PASS | 13.1s | $0.02 |
| 01_standalone_sdk/20_stuck_detector.py | ✅ PASS | 15.1s | $0.03 |
| 01_standalone_sdk/21_generate_extraneous_conversation_costs.py | ✅ PASS | 8.8s | $0.00 |
| 01_standalone_sdk/22_anthropic_thinking.py | ✅ PASS | 17.2s | $0.02 |
| 01_standalone_sdk/23_responses_reasoning.py | ✅ PASS | 1m 3s | $0.01 |
| 01_standalone_sdk/24_planning_agent_workflow.py | ✅ PASS | 2m 52s | $0.20 |
| 01_standalone_sdk/25_agent_delegation.py | ✅ PASS | 49.1s | $0.06 |
| 01_standalone_sdk/26_custom_visualizer.py | ✅ PASS | 21.6s | $0.03 |
| 01_standalone_sdk/28_ask_agent_example.py | ✅ PASS | 28.3s | $0.04 |
| 01_standalone_sdk/29_llm_streaming.py | ✅ PASS | 29.1s | $0.03 |
| 01_standalone_sdk/30_tom_agent.py | ✅ PASS | 15.4s | $0.02 |
| 01_standalone_sdk/31_iterative_refinement.py | ✅ PASS | 3m 8s | $0.24 |
| 01_standalone_sdk/32_configurable_security_policy.py | ✅ PASS | 19.5s | $0.02 |
| 01_standalone_sdk/34_critic_example.py | ✅ PASS | 1m 58s | $0.16 |
| 01_standalone_sdk/36_event_json_to_openai_messages.py | ✅ PASS | 12.4s | $0.01 |
| 01_standalone_sdk/37_llm_profile_store/main.py | ✅ PASS | 3.3s | $0.00 |
| 01_standalone_sdk/38_browser_session_recording.py | ✅ PASS | 38.5s | $0.03 |
| 01_standalone_sdk/39_llm_fallback.py | ✅ PASS | 9.6s | $0.01 |
| 01_standalone_sdk/40_acp_agent_example.py | ✅ PASS | 34.2s | $0.05 |
| 01_standalone_sdk/41_task_tool_set.py | ✅ PASS | 26.5s | $0.03 |
| 01_standalone_sdk/42_file_based_subagents.py | ✅ PASS | 1m 5s | $0.06 |
| 01_standalone_sdk/43_mixed_marketplace_skills/main.py | ✅ PASS | 2.8s | $0.00 |
| 01_standalone_sdk/44_model_switching_in_convo.py | ✅ PASS | 6.5s | $0.01 |
| 01_standalone_sdk/45_parallel_tool_execution.py | ✅ PASS | 2m 19s | $0.32 |
| 01_standalone_sdk/46_agent_settings.py | ✅ PASS | 10.4s | $0.01 |
| 01_standalone_sdk/47_defense_in_depth_security.py | ✅ PASS | 2.8s | $0.00 |
| 02_remote_agent_server/01_convo_with_local_agent_server.py | ✅ PASS | 31.7s | $0.03 |
| 02_remote_agent_server/02_convo_with_docker_sandboxed_server.py | ❌ FAIL (Exit code 1) | 4.1s | -- |
| 02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py | ❌ FAIL (Exit code 1) | 4.0s | -- |
| 02_remote_agent_server/04_convo_with_api_sandboxed_server.py | ❌ FAIL (Exit code 1) | 5m 8s | -- |
| 02_remote_agent_server/07_convo_with_cloud_workspace.py | ✅ PASS | 29.5s | $0.03 |
| 02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py | ❌ FAIL (Exit code 1) | 4.1s | -- |
| 02_remote_agent_server/09_acp_agent_with_remote_runtime.py | ❌ FAIL (Exit code 1) | 5m 6s | -- |
| 02_remote_agent_server/10_cloud_workspace_share_credentials.py | ✅ PASS | 26.8s | $0.05 |
| 04_llm_specific_tools/01_gpt5_apply_patch_preset.py | ✅ PASS | 16.6s | $0.02 |
| 04_llm_specific_tools/02_gemini_file_tools.py | ✅ PASS | 44.5s | $0.06 |
| 05_skills_and_plugins/01_loading_agentskills/main.py | ✅ PASS | 11.6s | $0.01 |
| 05_skills_and_plugins/02_loading_plugins/main.py | ✅ PASS | 19.9s | $0.04 |
❌ Some tests failed
Total: 50 | Passed: 45 | Failed: 5 | Total Cost: $2.06
Failed examples:
- examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py: Exit code 1
- examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py: Exit code 1
- examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py: Exit code 1
- examples/02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py: Exit code 1
- examples/02_remote_agent_server/09_acp_agent_with_remote_runtime.py: Exit code 1
🔄 Running Examples with
|
| Example | Status | Duration | Cost |
|---|---|---|---|
| 01_standalone_sdk/02_custom_tools.py | ✅ PASS | 25.1s | $0.03 |
| 01_standalone_sdk/03_activate_skill.py | ✅ PASS | 21.4s | $0.02 |
| 01_standalone_sdk/05_use_llm_registry.py | ✅ PASS | 11.5s | $0.01 |
| 01_standalone_sdk/07_mcp_integration.py | ✅ PASS | 36.4s | $0.02 |
| 01_standalone_sdk/09_pause_example.py | ✅ PASS | 11.3s | $0.01 |
| 01_standalone_sdk/10_persistence.py | ✅ PASS | 42.7s | $0.03 |
| 01_standalone_sdk/11_async.py | ✅ PASS | 37.4s | $0.04 |
| 01_standalone_sdk/12_custom_secrets.py | ✅ PASS | 9.5s | $0.01 |
| 01_standalone_sdk/13_get_llm_metrics.py | ✅ PASS | 32.0s | $0.03 |
| 01_standalone_sdk/14_context_condenser.py | ✅ PASS | 3m 0s | $0.20 |
| 01_standalone_sdk/17_image_input.py | ✅ PASS | 14.7s | $0.01 |
| 01_standalone_sdk/18_send_message_while_processing.py | ✅ PASS | 28.9s | $0.03 |
| 01_standalone_sdk/19_llm_routing.py | ✅ PASS | 15.3s | $0.02 |
| 01_standalone_sdk/20_stuck_detector.py | ✅ PASS | 14.7s | $0.02 |
| 01_standalone_sdk/21_generate_extraneous_conversation_costs.py | ✅ PASS | 10.3s | $0.00 |
| 01_standalone_sdk/22_anthropic_thinking.py | ✅ PASS | 14.8s | $0.01 |
| 01_standalone_sdk/23_responses_reasoning.py | ✅ PASS | 1m 17s | $0.01 |
| 01_standalone_sdk/24_planning_agent_workflow.py | ✅ PASS | 3m 50s | $0.25 |
| 01_standalone_sdk/25_agent_delegation.py | ✅ PASS | 52.3s | $0.07 |
| 01_standalone_sdk/26_custom_visualizer.py | ✅ PASS | 18.0s | $0.02 |
| 01_standalone_sdk/28_ask_agent_example.py | ✅ PASS | 36.4s | $0.04 |
| 01_standalone_sdk/29_llm_streaming.py | ✅ PASS | 42.0s | $0.03 |
| 01_standalone_sdk/30_tom_agent.py | ✅ PASS | 9.1s | $0.01 |
| 01_standalone_sdk/31_iterative_refinement.py | ✅ PASS | 5m 26s | $0.41 |
| 01_standalone_sdk/32_configurable_security_policy.py | ✅ PASS | 17.7s | $0.02 |
| 01_standalone_sdk/34_critic_example.py | ✅ PASS | 6m 35s | $0.63 |
| 01_standalone_sdk/36_event_json_to_openai_messages.py | ✅ PASS | 10.6s | $0.00 |
| 01_standalone_sdk/37_llm_profile_store/main.py | ✅ PASS | 3.6s | $0.00 |
| 01_standalone_sdk/38_browser_session_recording.py | ✅ PASS | 32.7s | $0.04 |
| 01_standalone_sdk/39_llm_fallback.py | ✅ PASS | 10.4s | $0.01 |
| 01_standalone_sdk/40_acp_agent_example.py | ✅ PASS | 42.7s | $0.05 |
| 01_standalone_sdk/41_task_tool_set.py | ✅ PASS | 32.3s | $0.03 |
| 01_standalone_sdk/42_file_based_subagents.py | ✅ PASS | 1m 43s | $0.10 |
| 01_standalone_sdk/43_mixed_marketplace_skills/main.py | ✅ PASS | 3.3s | $0.00 |
| 01_standalone_sdk/44_model_switching_in_convo.py | ✅ PASS | 7.1s | $0.01 |
| 01_standalone_sdk/45_parallel_tool_execution.py | ✅ PASS | 2m 36s | $0.45 |
| 01_standalone_sdk/46_agent_settings.py | ✅ PASS | 12.0s | $0.01 |
| 01_standalone_sdk/47_defense_in_depth_security.py | ✅ PASS | 3.2s | $0.00 |
| 02_remote_agent_server/01_convo_with_local_agent_server.py | ✅ PASS | 33.7s | $0.03 |
| 02_remote_agent_server/02_convo_with_docker_sandboxed_server.py | ❌ FAIL (Exit code 1) | 4.9s | -- |
| 02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py | ❌ FAIL (Exit code 1) | 4.9s | -- |
| 02_remote_agent_server/04_convo_with_api_sandboxed_server.py | ❌ FAIL (Exit code 1) | 5m 9s | -- |
| 02_remote_agent_server/07_convo_with_cloud_workspace.py | ✅ PASS | 34.4s | $0.05 |
| 02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py | ❌ FAIL (Exit code 1) | 4.8s | -- |
| 02_remote_agent_server/09_acp_agent_with_remote_runtime.py | ❌ FAIL (Exit code 1) | 5m 7s | -- |
| 02_remote_agent_server/10_cloud_workspace_share_credentials.py | ✅ PASS | 27.8s | $0.05 |
| 04_llm_specific_tools/01_gpt5_apply_patch_preset.py | ✅ PASS | 17.2s | $0.01 |
| 04_llm_specific_tools/02_gemini_file_tools.py | ✅ PASS | 29.5s | $0.06 |
| 05_skills_and_plugins/01_loading_agentskills/main.py | ✅ PASS | 11.9s | $0.01 |
| 05_skills_and_plugins/02_loading_plugins/main.py | ✅ PASS | 21.5s | $0.04 |
❌ Some tests failed
Total: 50 | Passed: 45 | Failed: 5 | Total Cost: $2.93
Failed examples:
- examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py: Exit code 1
- examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py: Exit code 1
- examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py: Exit code 1
- examples/02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py: Exit code 1
- examples/02_remote_agent_server/09_acp_agent_with_remote_runtime.py: Exit code 1
|
@OpenHands dig in to the logs from the failed examples, are these concerning, and did they fail on previous releases? |
|
I'm on it! neubig can track my progress at all-hands.dev |
Analysis of Failed Example Tests
This analysis was generated by an AI assistant (OpenHands).

Summary
Not concerning. All 5 failures are caused by a single infrastructure issue (a Docker image tag mismatch), not by any code regression. These same examples all passed on both v1.16.0 and v1.15.0.

The 5 Failures
Root Cause: Image Tag SHA Mismatch
All 5 examples try to pull an agent-server image whose tag does not exist in the registry.
The docker/apptainer examples (02, 03, 08) fail immediately with "manifest unknown". The remote runtime examples (04, 09) fail after ~5 minutes when the pod never starts because the image can't be pulled.

Comparison with Previous Releases
Conclusion
|
Summary
The PR comment asked me to investigate the failed example tests on PR #2812 (Release v1.17.0), determine if they're concerning, and check if they failed on previous releases.

Answer
The 5 failures are not concerning. They are all caused by a single CI infrastructure issue — a Docker image tag SHA mismatch — not a code regression.

Root cause: All 5 failing examples (the docker/apptainer/remote-runtime examples under examples/02_remote_agent_server/) computed an image tag that was never pushed.

Previous releases: All 5 of these examples passed on both v1.16.0 and v1.15.0. The failures on those releases were entirely different examples.

Overall: 45/50 examples passed, covering all standalone SDK tests, local agent server, cloud workspace, LLM-specific tools, and skills/plugins.

Actions Taken
No code changes were made — this was purely an investigative/analytical task. |
Precise Diagnosis of Example Test Failures
This comment was generated by an AI assistant (OpenHands) on behalf of the user.

Verdict: Code Bug (not a race condition)
This is a known code bug: the example scripts read the built-in GITHUB_SHA instead of the PR head SHA.

Exact Code Path

Evidence
Why It Worked on v1.16.0 But Not v1.17.0
Both use the identical workflow code. The difference is the GitHub Actions runner image version:
GitHub's built-in GITHUB_SHA won out: the step-level env override stopped working reliably on the newer runner image.

The Fix
There's actually an existing (unmerged) fix in a commit:
```diff
- GITHUB_SHA: ${{ github.event.pull_request.head.sha }}
+ AGENT_SERVER_SHA: ${{ github.event.pull_request.head.sha }}
```

All 5 affected examples read:

```diff
- github_sha = os.getenv("GITHUB_SHA")
+ server_sha = os.getenv("AGENT_SERVER_SHA")
```

Files that need the change:
This should go in as a separate PR to |
Ruling Out a Race Condition
This comment was generated by an AI assistant (OpenHands) on behalf of the user.

Not a race condition. The images were built and pushed 24 minutes before the examples ran. The examples just computed the wrong image tag.

Timeline (all UTC, April 13)
The Bug
This is a code error in 7 files. The workflow passes the PR head SHA via the built-in GITHUB_SHA, but GITHUB_SHA gets overwritten. The agent-server build workflow already works around this:

```yaml
# Use SDK_SHA for PR head SHA - GITHUB_SHA is a built-in that gets overwritten by checkout
SDK_SHA: ${{ github.event.pull_request.head.sha || github.sha }}
```

The fix is to use a custom variable name (e.g. SDK_SHA) in the examples workflow as well. |
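A minimal sketch of the intended resolution logic on the example-script side (the function name and the exact tag format are assumptions based on the discussion, using the `<short-sha>-python` tag shape seen in this PR's image listing): prefer the custom SDK_SHA over the built-in GITHUB_SHA, then derive the short-SHA image tag.

```python
import os

def resolve_image_tag(variant: str = "python") -> str:
    """Resolve the agent-server image tag from environment variables.

    SDK_SHA is set by the workflow to the PR head SHA; the built-in
    GITHUB_SHA is only a fallback, since on pull_request events it
    points at the merge commit and may not match any pushed image.
    """
    sha = os.getenv("SDK_SHA") or os.getenv("GITHUB_SHA", "")
    return f"ghcr.io/openhands/agent-server:{sha[:7]}-{variant}"

# With both variables set, the PR head SHA wins:
os.environ["GITHUB_SHA"] = "1234567890abcdef"   # merge commit (wrong tag)
os.environ["SDK_SHA"] = "aabf407cafe0123"       # PR head commit
print(resolve_image_tag())  # ghcr.io/openhands/agent-server:aabf407-python
```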
|
I think this release is probably OK to deploy, there are failing examples but this seems to be due to a mismatched SHA, as opposed to a real issue. |
… GITHUB_SHA GitHub Actions sets GITHUB_SHA to the merge commit for pull_request events. The step-level env override stopped working reliably with runner image ubuntu24/20260406.80. Use a custom variable name AGENT_SERVER_SHA to explicitly pass the PR head SHA to example scripts for Docker image tag resolution. This fixes the 5 docker/apptainer/remote-runtime example failures seen in the v1.17.0 release PR (#2812). Co-authored-by: openhands <openhands@all-hands.dev>
🧪 Integration Tests Results

Overall Success Rate: 80.0%

📁 Detailed Logs & Artifacts
Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.
📊 Summary
📋 Detailed Results

litellm_proxy_moonshot_kimi_k2_thinking
Failed Tests:
However, the agent violated the evaluation criteria by creating additional files beyond what was explicitly requested: Files Created (3 total):
The evaluation criteria explicitly state: "Avoid creating any additional files that were not explicitly requested. Only one README.md file is acceptable if it pertains to the new training script." By creating the extra files, the agent violated this criterion.

Positive Aspects:
Negative Aspects:
However, there is a material deviation from the user's explicit instructions: Critical Issue - Ignored Environment Instructions: The agent ignored this instruction and ran all Python commands directly:
These should have been executed with the environment tooling specified in the instructions.

Positive Aspects:
Assessment:

litellm_proxy_anthropic_claude_sonnet_4_6
|
|
@OpenHands help me understand why docker examples are failing? Any way to fix it? Maybe retrigger it for me? #2812 (comment) |
|
I'm on it! xingyaoww can track my progress at all-hands.dev |
The built-in GITHUB_SHA resolves to the merge-commit SHA on pull_request events, which does not match the agent-server image tags built from the PR head commit. The agent-server build workflow already uses SDK_SHA for this reason. Switch run-examples.yml to export SDK_SHA (matching the server workflow) and update all remote-agent-server examples to prefer SDK_SHA, falling back to GITHUB_SHA for backward compatibility. Co-authored-by: openhands <openhands@all-hands.dev>
Docker examples fix

Root cause
The 5 failing docker/remote-server examples all tried to pull an agent-server image tag that was never built. The agent-server build workflow tags images with the PR head SHA, while the examples resolved the tag from the built-in GITHUB_SHA (the merge commit on pull_request events).

Fix
I've also re-added the fallback to GITHUB_SHA for backward compatibility. This comment was generated by an AI assistant (OpenHands). |
Final Summary

Was the request completely addressed?
Are the changes concise?
Yes. The changeset is minimal — 7 files, 27 lines changed (all symmetric replacements). No extraneous changes were introduced:
All pre-commit checks passed (YAML format, Ruff, pycodestyle, pyright, import rules). |
🔄 Running Examples with
|
| Example | Status | Duration | Cost |
|---|---|---|---|
| 01_standalone_sdk/02_custom_tools.py | ✅ PASS | 23.8s | $0.03 |
| 01_standalone_sdk/03_activate_skill.py | ✅ PASS | 19.0s | $0.02 |
| 01_standalone_sdk/05_use_llm_registry.py | ✅ PASS | 11.8s | $0.01 |
| 01_standalone_sdk/07_mcp_integration.py | ✅ PASS | 39.3s | $0.02 |
| 01_standalone_sdk/09_pause_example.py | ✅ PASS | 13.7s | $0.01 |
| 01_standalone_sdk/10_persistence.py | ✅ PASS | 34.2s | $0.03 |
| 01_standalone_sdk/11_async.py | ✅ PASS | 33.3s | $0.04 |
| 01_standalone_sdk/12_custom_secrets.py | ✅ PASS | 9.6s | $0.00 |
| 01_standalone_sdk/13_get_llm_metrics.py | ✅ PASS | 35.9s | $0.03 |
| 01_standalone_sdk/14_context_condenser.py | ✅ PASS | 2m 54s | $0.20 |
| 01_standalone_sdk/17_image_input.py | ✅ PASS | 16.3s | $0.01 |
| 01_standalone_sdk/18_send_message_while_processing.py | ✅ PASS | 26.2s | $0.02 |
| 01_standalone_sdk/19_llm_routing.py | ✅ PASS | 14.1s | $0.02 |
| 01_standalone_sdk/20_stuck_detector.py | ✅ PASS | 15.7s | $0.02 |
| 01_standalone_sdk/21_generate_extraneous_conversation_costs.py | ✅ PASS | 10.8s | $0.00 |
| 01_standalone_sdk/22_anthropic_thinking.py | ✅ PASS | 14.7s | $0.01 |
| 01_standalone_sdk/23_responses_reasoning.py | ✅ PASS | 1m 23s | $0.01 |
| 01_standalone_sdk/24_planning_agent_workflow.py | ✅ PASS | 4m 8s | $0.30 |
| 01_standalone_sdk/25_agent_delegation.py | ✅ PASS | 53.0s | $0.07 |
| 01_standalone_sdk/26_custom_visualizer.py | ✅ PASS | 18.3s | $0.03 |
| 01_standalone_sdk/28_ask_agent_example.py | ✅ PASS | 29.2s | $0.03 |
| 01_standalone_sdk/29_llm_streaming.py | ✅ PASS | 30.2s | $0.03 |
| 01_standalone_sdk/30_tom_agent.py | ✅ PASS | 22.6s | $0.02 |
| 01_standalone_sdk/31_iterative_refinement.py | ✅ PASS | 3m 59s | $0.29 |
| 01_standalone_sdk/32_configurable_security_policy.py | ✅ PASS | 20.1s | $0.02 |
| 01_standalone_sdk/34_critic_example.py | ✅ PASS | 6m 40s | $0.57 |
| 01_standalone_sdk/36_event_json_to_openai_messages.py | ✅ PASS | 13.0s | $0.01 |
| 01_standalone_sdk/37_llm_profile_store/main.py | ✅ PASS | 6.4s | $0.00 |
| 01_standalone_sdk/38_browser_session_recording.py | ✅ PASS | 34.7s | $0.03 |
| 01_standalone_sdk/39_llm_fallback.py | ✅ PASS | 11.2s | $0.01 |
| 01_standalone_sdk/40_acp_agent_example.py | ✅ PASS | 35.0s | $0.21 |
| 01_standalone_sdk/41_task_tool_set.py | ✅ PASS | 35.3s | $0.03 |
| 01_standalone_sdk/42_file_based_subagents.py | ✅ PASS | 1m 29s | $0.09 |
| 01_standalone_sdk/43_mixed_marketplace_skills/main.py | ✅ PASS | 6.5s | $0.00 |
| 01_standalone_sdk/44_model_switching_in_convo.py | ✅ PASS | 7.6s | $0.01 |
| 01_standalone_sdk/45_parallel_tool_execution.py | ✅ PASS | 3m 17s | $0.41 |
| 01_standalone_sdk/46_agent_settings.py | ✅ PASS | 10.1s | $0.01 |
| 01_standalone_sdk/47_defense_in_depth_security.py | ✅ PASS | 3.3s | $0.00 |
| 02_remote_agent_server/01_convo_with_local_agent_server.py | ✅ PASS | 42.2s | $0.03 |
| 02_remote_agent_server/02_convo_with_docker_sandboxed_server.py | ✅ PASS | 1m 34s | $0.07 |
| 02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py | ✅ PASS | 52.7s | $0.06 |
| 02_remote_agent_server/04_convo_with_api_sandboxed_server.py | ✅ PASS | 1m 24s | $0.02 |
| 02_remote_agent_server/07_convo_with_cloud_workspace.py | ✅ PASS | 30.9s | $0.03 |
| 02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py | ✅ PASS | 3m 15s | $0.03 |
| 02_remote_agent_server/09_acp_agent_with_remote_runtime.py | ✅ PASS | 53.4s | $0.24 |
| 02_remote_agent_server/10_cloud_workspace_share_credentials.py | ✅ PASS | 28.4s | $0.05 |
| 04_llm_specific_tools/01_gpt5_apply_patch_preset.py | ✅ PASS | 27.7s | $0.03 |
| 04_llm_specific_tools/02_gemini_file_tools.py | ✅ PASS | 39.0s | $0.09 |
| 05_skills_and_plugins/01_loading_agentskills/main.py | ✅ PASS | 17.8s | $0.01 |
| 05_skills_and_plugins/02_loading_plugins/main.py | ✅ PASS | 21.3s | $0.02 |
✅ All tests passed!
Total: 50 | Passed: 50 | Failed: 0 | Total Cost: $3.31
|
Evaluation Triggered
|
Release v1.17.0
This PR prepares the release for version 1.17.0.
Release Checklist
- Integration tests (`integration-test`)
- Behavior tests (`behavior-test`)
- Example tests (`test-examples`)
- Tag `v1.17.0`
- Release branch `rel-1.17.0`

Next Steps
Once the release is published on GitHub, the PyPI packages will be automatically published via the `pypi-release.yml` workflow.

Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
- `eclipse-temurin:17-jdk`
- `nikolaik/python-nodejs:python3.13-nodejs22-slim`
- `golang:1.21-bookworm`

Pull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:aabf407-python

Run
All tags pushed for this build
About Multi-Architecture Support
- The default tag (`aabf407-python`) is a multi-arch manifest supporting both amd64 and arm64
- Architecture-specific tags (e.g. `aabf407-python-amd64`) are also available if needed