fix(tools): return browser timeout as observation by neubig · Pull Request #2455 · OpenHands/software-agent-sdk

neubig · 2026-03-15T20:55:04Z

Summary

return browser action timeouts from BrowserToolExecutor.__call__ as BrowserObservation errors instead of bubbling up a fatal conversation TimeoutError
normalize empty browser exceptions into readable Browser operation failed: ... messages
keep regression coverage for timeout-to-observation behavior while removing the undocumented demo example that was breaking check-examples

Changes Made

added _format_browser_operation_error() so blank exceptions and timeout errors produce stable observation text
made BrowserToolExecutor use a configurable action_timeout_seconds, validate it, and catch builtins.TimeoutError in __call__
added focused regression coverage in tests/tools/browser_use/test_browser_executor.py, including a live slow-service timeout case
removed examples/01_standalone_sdk/45_browser_timeout_observation.py and its now-unneeded exclusion entry because undocumented examples fail the docs sync check
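
A minimal sketch of the first two bullets above, written without the SDK so it runs on its own. The names `format_browser_operation_error` and `validate_action_timeout` are hypothetical stand-ins, not the helpers in impl.py; the class-name fallback for blank exceptions and the "must be positive" rule are assumptions about what "stable observation text" and "validate it" mean here.

```python
from __future__ import annotations


def format_browser_operation_error(
    exc: BaseException, timeout_seconds: float | None = None
) -> str:
    """Build stable observation text for a failed browser operation.

    Hypothetical stand-in for the PR's _format_browser_operation_error();
    the real signature and wording may differ.
    """
    if isinstance(exc, TimeoutError) and timeout_seconds is not None:
        detail = f"Operation timed out after {timeout_seconds:g} seconds"
    else:
        # str(exc) is empty for exceptions raised without a message,
        # so fall back to the exception class name (assumed behavior).
        detail = str(exc).strip() or type(exc).__name__
    return f"Browser operation failed: {detail}"


def validate_action_timeout(value: float) -> float:
    """Reject non-positive timeouts early (assumed validation rule)."""
    if value <= 0:
        raise ValueError(f"action_timeout_seconds must be positive, got {value}")
    return float(value)


if __name__ == "__main__":
    print(format_browser_operation_error(TimeoutError(), 2.0))
    print(format_browser_operation_error(RuntimeError()))
```

With these assumptions, a timeout at 2.0 seconds yields the same text as the local evidence run ("Browser operation failed: Operation timed out after 2 seconds"), and an exception with no message no longer produces a blank suffix.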

Testing

uv run pre-commit run --files tests/examples/test_examples.py
uv run pytest tests/tools/browser_use/test_browser_executor.py -q
uv run pytest tests/tools/browser_use/test_browser_executor.py -k live_service -vv
DOCS_PATH=/workspace/project/software-agent-sdk/.agent_tmp/docs-check uv run python .github/scripts/check_documented_examples.py

Evidence

Live local SDK run with an actual LLM (exact `BrowserToolExecutor.__call__` timeout path)

$ OPENHANDS_SUPPRESS_BANNER=1 .venv/bin/python .agent_tmp/browser-timeout-live/local_live_timeout_demo.py
live_run_elapsed_seconds=13.51
=== relevant_events ===
ACTION browser_navigate
{
  "url": "https://work-2-gdhkbsxlnrlyosok.prod-runtime.all-hands.dev/",
  "new_tab": false,
  "kind": "BrowserNavigateAction"
}
OBSERVATION browser_navigate
{
  "content": [
    {
      "cache_prompt": false,
      "type": "text",
      "text": "Browser operation failed: Operation timed out after 2 seconds"
    }
  ],
  "is_error": true,
  "kind": "BrowserObservation"
}
ACTION terminal
{
  "command": "pwd",
  "kind": "TerminalAction"
}
OBSERVATION terminal
{
  "content": [
    {
      "cache_prompt": false,
      "type": "text",
      "text": "/workspace/project/software-agent-sdk"
    }
  ],
  "is_error": false,
  "kind": "TerminalObservation"
}
ACTION finish
{
  "message": "RECOVERED AFTER TIMEOUT",
  "kind": "FinishAction"
}
OBSERVATION finish
{
  "content": [
    {
      "cache_prompt": false,
      "type": "text",
      "text": "RECOVERED AFTER TIMEOUT"
    }
  ],
  "is_error": false,
  "kind": "FinishObservation"
}
EXAMPLE_COST: 0.01867935

Hosted OpenHands Cloud conversation (real hosted agent recovery)

Conversation: https://app.all-hands.dev/conversations/f4b7a646ad6241e1bfe90ac206019391

$ .venv/bin/python - <<'PY'
# summarized the hosted conversation via the OpenHands Cloud API
PY
conversation_link=https://app.all-hands.dev/conversations/f4b7a646ad6241e1bfe90ac206019391
execution_status=finished
browser_navigate_elapsed=16.6s
browser_get_state_elapsed=30.0s
browser_get_state_observation=
Browser operation failed:
terminal_pwd_observation=
/workspace/project/software-agent-sdk
finish_observation=
Browser observation received exactly as:
"[An error occurred during execution.]"
"Browser operation failed: "

Then I ran `pwd` successfully and got `/workspace/project/software-agent-sdk`.

RECOVERED AFTER TIMEOUT

The local SDK run above uses a real LLM plus BrowserToolSet(action_timeout_seconds=2.0) to exercise the exact timeout-to-BrowserObservation path on this branch. The hosted OpenHands Cloud conversation shows a real hosted agent also recovered from a browser hang observation and continued with pwd/finish instead of crashing the conversation.

Checklist

If the PR is changing/adding functionality, are there tests to reflect this?
If there is an example, have you run the example to make sure that it works? (Ran the self-contained timeout demo locally before removing it from the PR because it broke check-examples.)
If there are instructions on how to run the code, have you followed the instructions and made sure that it works?
If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name? (Not applicable: this PR no longer adds a shipped example or docs surface.)
Is the github CI passing?

Agent Server images for this PR

• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant	Architectures	Base Image	Docs / Tags
java	amd64, arm64	`eclipse-temurin:17-jdk`	Link
python	amd64, arm64	`nikolaik/python-nodejs:python3.13-nodejs22`	Link
golang	amd64, arm64	`golang:1.21-bookworm`	Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:1d59870-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-1d59870-python \
  ghcr.io/openhands/agent-server:1d59870-python

All tags pushed for this build

ghcr.io/openhands/agent-server:1d59870-golang-amd64
ghcr.io/openhands/agent-server:1d59870-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:1d59870-golang-arm64
ghcr.io/openhands/agent-server:1d59870-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:1d59870-java-amd64
ghcr.io/openhands/agent-server:1d59870-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:1d59870-java-arm64
ghcr.io/openhands/agent-server:1d59870-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:1d59870-python-amd64
ghcr.io/openhands/agent-server:1d59870-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-amd64
ghcr.io/openhands/agent-server:1d59870-python-arm64
ghcr.io/openhands/agent-server:1d59870-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-arm64
ghcr.io/openhands/agent-server:1d59870-golang
ghcr.io/openhands/agent-server:1d59870-java
ghcr.io/openhands/agent-server:1d59870-python

About Multi-Architecture Support

Each variant tag (e.g., 1d59870-python) is a multi-arch manifest supporting both amd64 and arm64
Docker automatically pulls the correct architecture for your platform
Individual architecture tags (e.g., 1d59870-python-amd64) are also available if needed

Catch browser action timeouts at the browser tool boundary so a hung browser call returns a BrowserObservation error instead of bubbling up as a fatal conversation TimeoutError. Also format empty browser exceptions more clearly to avoid blank 'Browser operation failed:' messages. Co-authored-by: openhands <openhands@all-hands.dev>

github-actions · 2026-03-15T20:55:33Z

Python API breakage checks — ✅ PASSED

Result: ✅ PASSED

Action log

github-actions · 2026-03-15T20:55:37Z

REST API breakage checks (OpenAPI) — ✅ PASSED

Result: ✅ PASSED

Action log

Bump openhands-agent-server from 1.14.0 to 1.15.0 so the REST API breakage workflow reflects the current breaking API surface already present on main. This is intentionally separate from PR #2455, whose failing REST API check is unrelated to the browser timeout fix. Co-authored-by: openhands <openhands@all-hands.dev>

neubig · 2026-03-16T11:51:28Z

@OpenHands test this live by creating a web service that does not respond in the allotted time and creating an example, following the examples folder, that asks the agent to query this web service. Run the example before and after this fix and demonstrate that it's better. Fix any failing tests as well.

openhands-ai · 2026-03-16T11:51:45Z

I'm on it! neubig can track my progress at all-hands.dev

Co-authored-by: openhands <openhands@all-hands.dev>

neubig · 2026-03-16T13:57:51Z

Addressed in 4010efb.

What changed:

kept browser action timeouts on the tool path as normal BrowserObservation errors
added a live slow-service regression test in tests/tools/browser_use/test_browser_executor.py
added examples/01_standalone_sdk/45_browser_timeout_observation.py, a self-contained example that starts a slow web service and has an agent call browser_navigate against it
excluded the new example from the generic example test sweep because it intentionally runs its own live timeout scenario
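
For readers who want to reproduce the "slow service" setup without the SDK, here is a rough standard-library sketch. It is not the removed example or the regression test: `make_slow_handler` and `probe_slow_service` are hypothetical names, and a plain `http.client` connection stands in for the browser tool.

```python
import http.client
import socket
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def make_slow_handler(delay_seconds: float) -> type:
    """Return a handler class that waits before answering any GET."""

    class SlowHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            time.sleep(delay_seconds)  # simulate a hung backend
            try:
                self.send_response(200)
                self.end_headers()
                self.wfile.write(b"finally responded")
            except OSError:
                pass  # the client already gave up and closed the socket

        def log_message(self, format: str, *args: object) -> None:
            pass  # keep the demo output quiet

    return SlowHandler


def probe_slow_service(delay_seconds: float, client_timeout_seconds: float) -> str:
    """Start a slow local server and report whether a client call timed out."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), make_slow_handler(delay_seconds))
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    host, port = server.server_address[:2]
    conn = http.client.HTTPConnection(host, port, timeout=client_timeout_seconds)
    try:
        conn.request("GET", "/")
        conn.getresponse().read()
        return "responded"
    except socket.timeout:  # alias of TimeoutError on Python 3.10+
        return "timed out"
    finally:
        conn.close()
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(probe_slow_service(delay_seconds=1.0, client_timeout_seconds=0.2))
```

Binding to port 0 lets the operating system pick a free port, which keeps a test like this from colliding with other local services.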

Validation:

uv run pre-commit run --files openhands-tools/openhands/tools/browser_use/impl.py tests/tools/browser_use/test_browser_executor.py tests/examples/test_examples.py examples/01_standalone_sdk/45_browser_timeout_observation.py
uv run pytest tests/tools/browser_use/test_browser_executor.py -q
uv run pytest tests/examples/test_examples.py -q
uv run python examples/01_standalone_sdk/45_browser_timeout_observation.py

Before/after demo using the new example:

legacy behavior (monkeypatched old BrowserToolExecutor.__call__): the example aborts with a fatal ConversationRunError caused by TimeoutError
current behavior: the agent receives "Browser operation failed: Operation timed out after 2 seconds", then cleanly finishes with the message "The slow web service timed out, but the browser tool returned a normal error observation instead of crashing the conversation."

I also checked local targeted tests after the rebase; the new regression coverage is green.
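
The before/after contrast above can be sketched in a few lines. This is an illustration under assumptions, not the code in impl.py: `SketchObservation` is a hypothetical stand-in for BrowserObservation, `hung_navigate` simulates a stuck browser call, and the use of asyncio for the timeout is a guess at the mechanism.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class SketchObservation:
    """Hypothetical stand-in for BrowserObservation."""

    text: str
    is_error: bool = False


async def hung_navigate() -> str:
    """Simulates a browser call that does not return in time."""
    await asyncio.sleep(5)
    return "page loaded"


def run_with_timeout(coro, timeout_seconds: float) -> str:
    """Run a coroutine, raising builtins.TimeoutError if it takes too long."""

    async def bounded() -> str:
        return await asyncio.wait_for(coro, timeout=timeout_seconds)

    try:
        return asyncio.run(bounded())
    except asyncio.TimeoutError as exc:  # same class as TimeoutError on 3.11+
        raise TimeoutError(
            f"Operation timed out after {timeout_seconds:g} seconds"
        ) from exc


def legacy_call(timeout_seconds: float) -> SketchObservation:
    """Old shape: the timeout escapes and would abort the whole run."""
    return SketchObservation(run_with_timeout(hung_navigate(), timeout_seconds))


def current_call(timeout_seconds: float) -> SketchObservation:
    """New shape: the timeout is caught at the tool boundary."""
    try:
        return SketchObservation(run_with_timeout(hung_navigate(), timeout_seconds))
    except TimeoutError as exc:
        return SketchObservation(f"Browser operation failed: {exc}", is_error=True)


if __name__ == "__main__":
    print(current_call(0.05))
```

In this sketch `legacy_call` raises, which is the shape that surfaced as a fatal ConversationRunError, while `current_call` hands the caller an error observation it can reason about and recover from.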

openhands-ai · 2026-03-16T13:58:51Z

Done — the new changes since the last summary are complete and pushed.

Checklist:

Added a live slow-service regression test in tests/tools/browser_use/test_browser_executor.py that verifies a real timeout is returned as a normal BrowserObservation error.
Added a standalone, self-contained example at examples/01_standalone_sdk/45_browser_timeout_observation.py.
Demonstrated before/after behavior with that example:
- before/legacy behavior: fatal ConversationRunError from TimeoutError
- after/current behavior: tool returns Browser operation failed: Operation timed out after 2 seconds, and the conversation finishes normally
Updated tests/examples/test_examples.py to exclude the new example from the generic example sweep since it intentionally runs a live timeout scenario.
Ran targeted validation and pre-commit successfully.
Committed and pushed to the PR branch.

Conciseness check:

Changes are focused to the reviewer request.
No extra files beyond the requested example, test coverage, and necessary test-list update.

Validation run:

uv run pre-commit run --files openhands-tools/openhands/tools/browser_use/impl.py tests/tools/browser_use/test_browser_executor.py tests/examples/test_examples.py examples/01_standalone_sdk/45_browser_timeout_observation.py
uv run pytest tests/tools/browser_use/test_browser_executor.py -q
uv run pytest tests/examples/test_examples.py -q
uv run python examples/01_standalone_sdk/45_browser_timeout_observation.py

Pushed commit:

4010efb7a04159930d0587a8505d6602a1a1fa8f

PR update comment:

fix(tools): return browser timeout as observation #2455 (comment)

github-actions · 2026-03-16T14:02:12Z

Coverage Report

File	Stmts	Miss	Cover	Missing
openhands-tools/openhands/tools/browser_use
impl.py	255	157	38%	57–61, 63–64, 66, 68, 70–73, 75–76, 78, 80, 87, 99–102, 108–109, 114, 116–118, 120–121, 129–131, 133–137, 142, 193, 198–201, 203, 225–227, 230–232, 234, 247, 284–285, 289, 299, 314–315, 320, 334–335, 340–341, 351, 369–370, 372–385, 388–401, 403–404, 410, 415–418, 426, 428, 431–432, 438–439, 444–445, 451–452, 456–457, 461–462, 466, 468–469, 471–474, 477–478, 484, 486, 488, 496–497, 501–502, 507–508, 512–513, 517–518, 523–524, 536–537, 548–549, 553–557, 571–573, 578, 583–584, 593–594
TOTAL	20591	10390	49%

Co-authored-by: openhands <openhands@all-hands.dev>

all-hands-bot

🟢 Good taste - Clean, minimal fix that converts fatal timeouts into recoverable observations.

⚠️ Eval Risk Flag: This changes tool execution behavior (timeouts → observations vs fatal errors). Needs lightweight eval verification before merge to ensure agent behavior isn't negatively impacted.

Verdict: ✅ Code is solid and well-tested. Waiting on eval confirmation.

all-hands-bot

Taste Rating: 🟢 Good taste - Exactly the right approach.

Key Insight: This eliminates a special case (fatal timeout) by normalizing it into the standard error observation flow. The core change is 3 lines in __call__: catch TimeoutError, return BrowserObservation. Simple, direct, and well-tested with a real slow HTTP service.

Verdict: ✅ Code is solid and ready. Waiting on eval verification (as noted in previous review) since this changes tool execution behavior.

VascoSch92

Just one suggestion for a change.
Otherwise, LGTM.

neubig · 2026-03-19T22:01:08Z

@OpenHands fix the remaining review comment

openhands-ai · 2026-03-19T22:01:30Z

I'm on it! neubig can track my progress at all-hands.dev

Co-authored-by: openhands <openhands@all-hands.dev>

openhands-ai · 2026-03-19T22:04:41Z

Done. New work since the last substantive update:

Addressed the remaining unresolved review comment in openhands-tools/openhands/tools/browser_use/impl.py
Kept the change minimal and behavior-preserving: _format_browser_operation_error() was only refactored for readability/simplicity
Ran validation for the touched area:
- uv run pre-commit run --files openhands-tools/openhands/tools/browser_use/impl.py
- uv run pytest tests/tools/browser_use/test_browser_executor.py -q
Committed and pushed the change:
- 34db45a4 — refactor(tools): simplify browser error formatting
Replied to the GitHub review thread via GraphQL
Marked that review thread as resolved via GraphQL
Re-ran the Review Thread Gate workflow
Confirmed the latest Review Thread Gate run passed

Conciseness check:

The update is concise.
No extraneous files or unrelated code were changed.
Only the reviewed helper function was refactored, with no scope creep.

In short: the remaining review feedback has been fully addressed, the thread was resolved, and the review-thread CI gate now passes.

neubig mentioned this pull request Mar 15, 2026

chore(agent-server): bump version for REST API breakage checks #2456

Closed

5 tasks

enyst and others added 2 commits March 16, 2026 14:54

Merge branch 'main' into fix/browser-timeout-observation

54381d6

fix(tools): add live browser timeout demo

4010efb

Co-authored-by: openhands <openhands@all-hands.dev>

fix(ci): remove undocumented browser timeout example

9fbd441

Co-authored-by: openhands <openhands@all-hands.dev>

neubig marked this pull request as ready for review March 19, 2026 02:43

all-hands-bot reviewed Mar 19, 2026

Comment thread openhands-tools/openhands/tools/browser_use/impl.py

Comment thread openhands-tools/openhands/tools/browser_use/impl.py

Comment thread tests/tools/browser_use/test_browser_executor.py

Comment thread openhands-tools/openhands/tools/browser_use/impl.py

neubig marked this pull request as draft March 19, 2026 04:37

neubig marked this pull request as ready for review March 19, 2026 11:42

all-hands-bot reviewed Mar 19, 2026

VascoSch92 approved these changes Mar 19, 2026

Comment thread openhands-tools/openhands/tools/browser_use/impl.py

Merge branch 'main' into fix/browser-timeout-observation

b6bf911

refactor(tools): simplify browser error formatting

34db45a

Co-authored-by: openhands <openhands@all-hands.dev>

neubig merged commit ba70d15 into main Mar 19, 2026
30 checks passed

neubig deleted the fix/browser-timeout-observation branch March 19, 2026 23:41

5 participants
