Commit 487da93

Merge branch 'OpenHands:main' into fix/2467-image-downscale
2 parents 4d047ed + 1366a0b commit 487da93

75 files changed: +6181 -248 lines changed

Lines changed: 208 additions & 0 deletions
@@ -0,0 +1,208 @@
---
name: cross-repo-testing
description: This skill should be used when the user asks to "test a saas cross-repo feature", "deploy a feature branch to staging", "test SDK against OH Cloud branch", "e2e test a cloud workspace feature", "test secrets saas inheritance", or when changes span the SDK and OpenHands enterprise and need end-to-end validation against a staging deployment.
---

# Cross-Repo Testing: SDK ↔ OpenHands Cloud

How to end-to-end test features that span `OpenHands/software-agent-sdk` and `OpenHands/OpenHands` (the Cloud backend).

## Repository Map

| Repo | Role | What lives here |
|------|------|-----------------|
| [`software-agent-sdk`](https://github.com/OpenHands/software-agent-sdk) | Agent core | `openhands-sdk`, `openhands-workspace`, `openhands-tools` packages. `OpenHandsCloudWorkspace` lives here. |
| [`OpenHands`](https://github.com/OpenHands/OpenHands) | Cloud backend | FastAPI server (`openhands/app_server/`), sandbox management, auth, enterprise integrations. Deployed as OH Cloud. |
| [`deploy`](https://github.com/OpenHands/deploy) | Infrastructure | Helm charts + GitHub Actions that build the enterprise Docker image and deploy to staging/production. |

**Data flow:** SDK client → OH Cloud API (`/api/v1/...`) → sandbox agent-server (inside runtime container)

## When You Need This

There are **two flows** depending on which direction the dependency goes:

| Flow | When | Example |
|------|------|---------|
| **A: SDK client → new Cloud API** | The SDK calls an API that doesn't exist yet in production | `workspace.get_llm()` calling `GET /api/v1/users/me?expose_secrets=true` |
| **B: OH server → new SDK code** | The Cloud server needs unreleased SDK packages or a new agent-server image | Server consumes a new tool, agent behavior, or workspace method from the SDK |

Flow A only requires deploying the server PR. Flow B requires pinning the SDK to an unreleased commit in the server PR **and** using the SDK PR's agent-server image. Both flows may apply simultaneously.

---

## Flow A: SDK Client Tests Against New Cloud API

Use this when the SDK calls an endpoint that only exists on the server PR branch.

### A1. Write and test the server-side changes

In the `OpenHands` repo, implement the new API endpoint(s). Run unit tests:

```bash
cd OpenHands
poetry run pytest tests/unit/app_server/test_<relevant>.py -v
```

Push a PR. Wait for the **"Push Enterprise Image" (Docker) CI job** to succeed; this builds `ghcr.io/openhands/enterprise-server:sha-<COMMIT>`.

### A2. Write the SDK-side changes

In `software-agent-sdk`, implement the client code (e.g., new methods on `OpenHandsCloudWorkspace`). Run SDK unit tests:

```bash
cd software-agent-sdk
pip install -e openhands-sdk -e openhands-workspace
pytest tests/ -v
```

Push a PR. SDK CI is independent: its unit tests pass without the server changes.

### A3. Deploy the server PR to staging

See [Deploying to a Staging Feature Environment](#deploying-to-a-staging-feature-environment) below.

### A4. Run the SDK e2e test against staging

See [Running E2E Tests Against Staging](#running-e2e-tests-against-staging) below.

---

## Flow B: OH Server Needs Unreleased SDK Code

Use this when the Cloud server depends on SDK changes that haven't been released to PyPI yet. The server's runtime containers run the `agent-server` image built from the SDK repo, so the server PR must be configured to use the SDK PR's image and packages.

### B1. Get the SDK PR merged (or identify the commit)

The SDK PR's CI must pass so that its agent-server Docker image is built. The image is tagged with the **merge-commit SHA** from GitHub Actions, NOT the head-commit SHA shown in the PR.

Find the correct image tag:
- Check the SDK PR description for an `AGENT_SERVER_IMAGES` section
- Or check the "Consolidate Build Information" CI job for `"short_sha": "<tag>"`

82+
### B2. Pin SDK packages to the commit in the OpenHands PR
83+
84+
In the `OpenHands` repo PR, update 3 files + regenerate 3 lock files (see the `update-sdk` skill for full details):
85+
86+
**`pyproject.toml`** — pin all 3 SDK packages in **both** `dependencies` and `[tool.poetry.dependencies]`:
87+
```toml
88+
# dependencies array (PEP 508)
89+
"openhands-sdk @ git+https://github.com/OpenHands/software-agent-sdk.git@<COMMIT>#subdirectory=openhands-sdk",
90+
"openhands-agent-server @ git+https://github.com/OpenHands/software-agent-sdk.git@<COMMIT>#subdirectory=openhands-agent-server",
91+
"openhands-tools @ git+https://github.com/OpenHands/software-agent-sdk.git@<COMMIT>#subdirectory=openhands-tools",
92+
93+
# [tool.poetry.dependencies]
94+
openhands-sdk = { git = "https://github.com/OpenHands/software-agent-sdk.git", rev = "<COMMIT>", subdirectory = "openhands-sdk" }
95+
openhands-agent-server = { git = "https://github.com/OpenHands/software-agent-sdk.git", rev = "<COMMIT>", subdirectory = "openhands-agent-server" }
96+
openhands-tools = { git = "https://github.com/OpenHands/software-agent-sdk.git", rev = "<COMMIT>", subdirectory = "openhands-tools" }
97+
```
98+
99+
**`openhands/app_server/sandbox/sandbox_spec_service.py`** — use the SDK's merge-commit SHA:
100+
```python
101+
AGENT_SERVER_IMAGE = 'ghcr.io/openhands/agent-server:<merge-commit-sha>-python'
102+
```
103+
104+
**Regenerate lock files:**
105+
```bash
106+
poetry lock && uv lock && cd enterprise && poetry lock && cd ..
107+
```
108+
109+
### B3. Wait for the OpenHands enterprise image to build
110+
111+
Push the pinned changes. The OpenHands CI will build a new enterprise Docker image (`ghcr.io/openhands/enterprise-server:sha-<OH_COMMIT>`) that bundles the unreleased SDK. Wait for the "Push Enterprise Image" job to succeed.
112+
113+
### B4. Deploy and test
114+
115+
Follow [Deploying to a Staging Feature Environment](#deploying-to-a-staging-feature-environment) using the new OpenHands commit SHA.
116+
117+
### B5. Before merging: remove the pin
118+
119+
**CI guard:** `check-package-versions.yml` blocks merge to `main` if `[tool.poetry.dependencies]` contains `rev` fields. Before the OpenHands PR can merge, the SDK PR must be merged and released to PyPI, then the pin must be replaced with the released version number.
120+
121+
---
122+
123+
## Deploying to a Staging Feature Environment
124+
125+
The `deploy` repo creates preview environments from OpenHands PRs.
126+
127+
**Option A — GitHub Actions UI (preferred):**
128+
Go to `OpenHands/deploy` → Actions → "Create OpenHands preview PR" → enter the OpenHands PR number. This creates a branch `ohpr-<PR>-<random>` and opens a deploy PR.
129+
130+
**Option B — Update an existing feature branch:**
131+
```bash
132+
cd deploy
133+
git checkout ohpr-<PR>-<random>
134+
# In .github/workflows/deploy.yaml, update BOTH:
135+
# OPENHANDS_SHA: "<full-40-char-commit>"
136+
# OPENHANDS_RUNTIME_IMAGE_TAG: "<same-commit>-nikolaik"
137+
git commit -am "Update OPENHANDS_SHA to <commit>" && git push
138+
```
139+
140+
**Before updating the SHA**, verify the enterprise Docker image exists:
141+
```bash
142+
gh api repos/OpenHands/OpenHands/actions/runs \
143+
--jq '.workflow_runs[] | select(.head_sha=="<COMMIT>") | "\(.name): \(.conclusion)"' \
144+
| grep Docker
145+
# Must show: "Docker: success"
146+
```
147+
148+
The deploy CI auto-triggers and creates the environment at:
149+
```
150+
https://ohpr-<PR>-<random>.staging.all-hands.dev
151+
```
152+
153+
**Wait for it to be live:**
154+
```bash
155+
curl -s -o /dev/null -w "%{http_code}" https://ohpr-<PR>-<random>.staging.all-hands.dev/api/v1/health
156+
# 401 = server is up (auth required). DNS may take 1-2 min on first deploy.
157+
```
158+
159+
## Running E2E Tests Against Staging
160+
161+
**Critical: Feature deployments have their own Keycloak instance.** API keys from `app.all-hands.dev` or `$OPENHANDS_API_KEY` will NOT work. You need a test API key for the specific feature deployment. The user must provide one.
162+
163+
```python
164+
from openhands.workspace import OpenHandsCloudWorkspace
165+
166+
STAGING = "https://ohpr-<PR>-<random>.staging.all-hands.dev"
167+
168+
with OpenHandsCloudWorkspace(
169+
cloud_api_url=STAGING,
170+
cloud_api_key="<test-api-key-for-this-deployment>",
171+
) as workspace:
172+
# Test the new feature
173+
llm = workspace.get_llm()
174+
secrets = workspace.get_secrets()
175+
print(f"LLM: {llm.model}, secrets: {list(secrets.keys())}")
176+
```
177+
178+
Or run an example script:
179+
```bash
180+
OPENHANDS_CLOUD_API_KEY="<key>" \
181+
OPENHANDS_CLOUD_API_URL="https://ohpr-<PR>-<random>.staging.all-hands.dev" \
182+
python examples/02_remote_agent_server/10_cloud_workspace_saas_credentials.py
183+
```
184+
185+
### Recording results
186+
187+
Push test output to the SDK PR's `.pr/logs/` directory:
188+
```bash
189+
cd software-agent-sdk
190+
python test_script.py 2>&1 | tee .pr/logs/<test_name>.log
191+
git add -f .pr/logs/<test_name>.log .pr/README.md
192+
git commit -m "docs: add e2e test results" && git push
193+
```
194+
195+
Comment on **both PRs** with pass/fail summary and link to logs.
196+
197+
## Key Gotchas
198+
199+
| Gotcha | Details |
200+
|--------|---------|
201+
| **Feature env auth is isolated** | Each `ohpr-*` deployment has its own Keycloak. Production API keys don't work. |
202+
| **Two SHAs in deploy.yaml** | `OPENHANDS_SHA` and `OPENHANDS_RUNTIME_IMAGE_TAG` must both be updated. The runtime tag is `<sha>-nikolaik`. |
203+
| **Enterprise image must exist** | The Docker CI job on the OpenHands PR must succeed before you can deploy. If it hasn't run, push an empty commit to trigger it. |
204+
| **DNS propagation** | First deployment of a new branch takes 1-2 min for DNS. Subsequent deploys are instant. |
205+
| **Merge-commit SHA ≠ head SHA** | SDK CI tags Docker images with GitHub Actions' merge-commit SHA, not the PR head SHA. Check the SDK PR description or CI logs for the correct tag. |
206+
| **SDK pin blocks merge** | `check-package-versions.yml` prevents merging an OpenHands PR that has `rev` fields in `[tool.poetry.dependencies]`. The SDK must be released to PyPI first. |
207+
| **Flow A: stock agent-server is fine** | When only the Cloud API changes, `OpenHandsCloudWorkspace` talks to the Cloud server, not the agent-server. No custom image needed. |
208+
| **Flow B: agent-server image is required** | When the server needs new SDK code inside runtime containers, you must pin to the SDK PR's agent-server image. |

.github/run-eval/resolve_model_config.py

Lines changed: 8 additions & 0 deletions
@@ -88,6 +88,14 @@ def _sigterm_handler(signum: int, _frame: object) -> None:
             "temperature": 0.0,
         },
     },
+    "qwen3.6-plus": {
+        "id": "qwen3.6-plus",
+        "display_name": "Qwen3.6 Plus",
+        "llm_config": {
+            "model": "litellm_proxy/dashscope/qwen3.6-plus",
+            "temperature": 0.0,
+        },
+    },
     "claude-4.5-opus": {
         "id": "claude-4.5-opus",
         "display_name": "Claude 4.5 Opus",

.github/workflows/run-examples.yml

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@ jobs:
     steps:
       - name: Wait for agent server to finish build
         if: github.event_name == 'pull_request'
-        uses: lewagon/wait-on-check-action@v1.5.0
+        uses: lewagon/wait-on-check-action@v1.6.0
         with:
           ref: ${{ github.event.pull_request.head.ref }}
           check-name: Build & Push (python-amd64)

.github/workflows/version-bump-prs.yml

Lines changed: 1 addition & 1 deletion
@@ -337,7 +337,7 @@ jobs:
           echo "- [OpenHands-CLI](https://github.com/OpenHands/openhands-cli/pulls?q=is%3Apr+bump-sdk-$VERSION)" >> $GITHUB_STEP_SUMMARY

       - name: Notify Slack
-        uses: slackapi/slack-github-action@v2.1.1
+        uses: slackapi/slack-github-action@v3.0.1
         with:
           method: chat.postMessage
           token: ${{ secrets.SLACK_BOT_TOKEN }}

AGENTS.md

Lines changed: 2 additions & 0 deletions
@@ -106,9 +106,11 @@ When reviewing code, provide constructive feedback:
 - `SettingsFieldSchema` intentionally does not export a `required` flag. If a consumer needs nullability semantics, inspect the underlying Python typing rather than inferring from SDK defaults.
 - `AgentSettings.tools` is part of the exported settings schema so the schema stays aligned with the settings payload that round-trips through `AgentSettings` and drives `create_agent()`.
 - `AgentSettings.mcp_config` now uses FastMCP's typed `MCPConfig` at runtime. When serializing settings back to plain data (e.g. `model_dump()` or `create_agent()`), keep the output compact with `exclude_none=True, exclude_defaults=True` so callers still see the familiar `.mcp.json`-style dict shape.
+- Anthropic malformed tool-use/tool-result history errors (for example, missing or duplicated ``tool_result`` blocks) are intentionally mapped to a dedicated `LLMMalformedConversationHistoryError` and caught separately in `Agent.step()`, so recovery can still use condensation while logs preserve that this was malformed history rather than a true context-window overflow.
 - AgentSkills progressive disclosure goes through `AgentContext.get_system_message_suffix()` into `<available_skills>`, and `openhands.sdk.context.skills.to_prompt()` truncates each prompt description to 1024 characters because the AgentSkills specification caps `description` at 1-1024 characters.
 - Workspace-wide uv resolver guardrails belong in the repository root `[tool.uv]` table. When `exclude-newer` is configured there, `uv lock` persists it into the root `uv.lock` `[options]` section as both an absolute cutoff and `exclude-newer-span`, and `uv sync --frozen` continues to use that locked workspace state.
 - `pr-review-by-openhands` delegates to `OpenHands/extensions/plugins/pr-review@main`. Repo-specific reviewer instructions live in `.agents/skills/custom-codereview-guide.md`, and because task-trigger matching is substring-based, that `/codereview` skill is also auto-injected for the workflow's `/codereview-roasted` prompt.
+- Auto-title generation should not re-read `ConversationState.events` from a background task triggered by a freshly received `MessageEvent`; extract message text synchronously from the incoming event and then reuse shared title helpers (`extract_message_text`, `generate_title_from_message`) to avoid persistence-order races.


 ## Package-specific guidance

README.md

Lines changed: 8 additions & 1 deletion
@@ -132,11 +132,17 @@ For development setup, testing, and contribution guidelines, see [DEVELOPMENT.md
   url={https://arxiv.org/abs/2511.03690},
 }
 ```
+<hr>
+
+### Thank You to Our Contributors
+
+[![Contributors](https://assets.openhands.dev/readme/openhands-software-agent-sdk-contributors.svg)](https://github.com/OpenHands/software-agent-sdk/graphs/contributors)

 <hr>

+### Trusted by Engineers at
+
 <div align="center">
-  <strong>Trusted by engineers at</strong>
   <br/><br/>
   <picture>
     <source media="(prefers-color-scheme: dark)" srcset="https://assets.openhands.dev/logos/external/white/tiktok.svg">
@@ -187,3 +193,4 @@ For development setup, testing, and contribution guidelines, see [DEVELOPMENT.md
     <img src="https://assets.openhands.dev/logos/external/black/google.svg" alt="Google" height="17" hspace="5">
   </picture>
 </div>
+

examples/01_standalone_sdk/40_acp_agent_example.py

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@
 from openhands.sdk.conversation import Conversation


-agent = ACPAgent(acp_command=["npx", "-y", "@zed-industries/claude-agent-acp"])
+agent = ACPAgent(acp_command=["npx", "-y", "@agentclientprotocol/claude-agent-acp"])

 try:
     cwd = os.getcwd()
Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
"""Defense-in-Depth Security: composing local analyzers with ConfirmRisky.

This example demonstrates how to wire the defense-in-depth analyzer family
into a conversation. The analyzers classify agent actions at the action
boundary; the confirmation policy decides whether to prompt the user.

Analyzer selection does not automatically change confirmation policy --
you must configure both explicitly.
"""

from openhands.sdk.security import (
    ConfirmRisky,
    EnsembleSecurityAnalyzer,
    PatternSecurityAnalyzer,
    PolicyRailSecurityAnalyzer,
    SecurityRisk,
)


# Create the analyzer ensemble
security_analyzer = EnsembleSecurityAnalyzer(
    analyzers=[
        PolicyRailSecurityAnalyzer(),
        PatternSecurityAnalyzer(),
    ]
)

# Confirmation policy: prompt the user for HIGH-risk actions
confirmation_policy = ConfirmRisky(threshold=SecurityRisk.HIGH)

# Wire into a conversation:
#
#   conversation = Conversation(agent=agent, workspace=".")
#   conversation.set_security_analyzer(security_analyzer)
#   conversation.set_confirmation_policy(confirmation_policy)
#
# Every agent action now passes through the analyzer.
# HIGH -> confirmation prompt. MEDIUM/LOW -> allowed.
# UNKNOWN -> confirmed by default (confirm_unknown=True).
#
# For stricter environments, lower the threshold:
#   confirmation_policy = ConfirmRisky(threshold=SecurityRisk.MEDIUM)

print("Defense-in-depth security analyzer configured.")
print(f"Analyzer: {security_analyzer}")
print(f"Confirmation policy: {confirmation_policy}")
print("EXAMPLE_COST: 0")
