
fix(tools): return browser timeout as observation#2455

Merged
neubig merged 6 commits into main from fix/browser-timeout-observation
Mar 19, 2026

Conversation

@neubig
Contributor

@neubig neubig commented Mar 15, 2026

Summary

  • return browser action timeouts from BrowserToolExecutor.__call__ as BrowserObservation errors instead of bubbling up a fatal conversation TimeoutError
  • normalize empty browser exceptions into readable Browser operation failed: ... messages
  • keep regression coverage for timeout-to-observation behavior while removing the undocumented demo example that was breaking check-examples
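A minimal sketch of the timeout-to-observation idea, assuming a thread-based timeout. `Observation` and `SketchExecutor` are hypothetical stand-ins for illustration, not the SDK's `BrowserObservation` or `BrowserToolExecutor`, and the real implementation may enforce the timeout differently.

```python
import concurrent.futures
import time
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical stand-in for BrowserObservation."""

    text: str
    is_error: bool = False


class SketchExecutor:
    """Hypothetical executor; not the SDK's BrowserToolExecutor."""

    def __init__(self, action_timeout_seconds: float = 2.0) -> None:
        self.action_timeout_seconds = action_timeout_seconds
        self._pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)

    def _run(self, action):
        future = self._pool.submit(action)
        try:
            return future.result(timeout=self.action_timeout_seconds)
        except concurrent.futures.TimeoutError:
            # Re-raise as the builtin so callers handle one exception type
            # on every Python version.
            raise TimeoutError(
                f"Operation timed out after {self.action_timeout_seconds:g} seconds"
            ) from None

    def __call__(self, action) -> Observation:
        try:
            return Observation(text=str(self._run(action)))
        except TimeoutError as exc:
            # The timeout stops here and becomes an ordinary error observation.
            return Observation(text=f"Browser operation failed: {exc}", is_error=True)


if __name__ == "__main__":
    slow = SketchExecutor(action_timeout_seconds=0.05)
    print(slow(lambda: time.sleep(0.3)))
```

In this sketch a hung action keeps the single worker thread busy until it returns, so a production version would also need a way to cancel or replace the stuck worker.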

Changes Made

  • added _format_browser_operation_error() so blank exceptions and timeout errors produce stable observation text
  • made BrowserToolExecutor use a configurable action_timeout_seconds, validate it, and catch builtins.TimeoutError in __call__
  • added focused regression coverage in tests/tools/browser_use/test_browser_executor.py, including a live slow-service timeout case
  • removed examples/01_standalone_sdk/45_browser_timeout_observation.py and its now-unneeded exclusion entry because undocumented examples fail the docs sync check
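The helper and the timeout validation described above could look roughly like the sketch below. The function names are hypothetical, and falling back to the exception class name for blank exceptions is an assumption for illustration; the PR does not state what `_format_browser_operation_error()` substitutes.

```python
import math


def format_browser_error(exc: BaseException) -> str:
    """Hypothetical analogue of _format_browser_operation_error()."""
    detail = str(exc).strip()
    if not detail:
        # Assumption: a blank exception falls back to its class name so the
        # message never ends at the colon.
        detail = type(exc).__name__
    return f"Browser operation failed: {detail}"


def validate_action_timeout(seconds: float) -> float:
    """Reject timeouts that are not positive, finite numbers."""
    if isinstance(seconds, bool) or not isinstance(seconds, (int, float)):
        raise TypeError("action_timeout_seconds must be a number")
    if not math.isfinite(seconds) or seconds <= 0:
        raise ValueError("action_timeout_seconds must be a positive, finite number")
    return float(seconds)


if __name__ == "__main__":
    print(format_browser_error(Exception()))
    print(format_browser_error(TimeoutError("Operation timed out after 2 seconds")))
```

Validating the timeout once at construction time keeps a bad value from surfacing later as a confusing failure inside a browser call.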

Testing

  • uv run pre-commit run --files tests/examples/test_examples.py
  • uv run pytest tests/tools/browser_use/test_browser_executor.py -q
  • uv run pytest tests/tools/browser_use/test_browser_executor.py -k live_service -vv
  • DOCS_PATH=/workspace/project/software-agent-sdk/.agent_tmp/docs-check uv run python .github/scripts/check_documented_examples.py

Evidence

Live local SDK run with an actual LLM (exact BrowserToolExecutor.__call__ timeout path)

$ OPENHANDS_SUPPRESS_BANNER=1 .venv/bin/python .agent_tmp/browser-timeout-live/local_live_timeout_demo.py
live_run_elapsed_seconds=13.51
=== relevant_events ===
ACTION browser_navigate
{
  "url": "https://work-2-gdhkbsxlnrlyosok.prod-runtime.all-hands.dev/",
  "new_tab": false,
  "kind": "BrowserNavigateAction"
}
OBSERVATION browser_navigate
{
  "content": [
    {
      "cache_prompt": false,
      "type": "text",
      "text": "Browser operation failed: Operation timed out after 2 seconds"
    }
  ],
  "is_error": true,
  "kind": "BrowserObservation"
}
ACTION terminal
{
  "command": "pwd",
  "kind": "TerminalAction"
}
OBSERVATION terminal
{
  "content": [
    {
      "cache_prompt": false,
      "type": "text",
      "text": "/workspace/project/software-agent-sdk"
    }
  ],
  "is_error": false,
  "kind": "TerminalObservation"
}
ACTION finish
{
  "message": "RECOVERED AFTER TIMEOUT",
  "kind": "FinishAction"
}
OBSERVATION finish
{
  "content": [
    {
      "cache_prompt": false,
      "type": "text",
      "text": "RECOVERED AFTER TIMEOUT"
    }
  ],
  "is_error": false,
  "kind": "FinishObservation"
}
EXAMPLE_COST: 0.01867935

Hosted OpenHands Cloud conversation (real hosted agent recovery)

Conversation: https://app.all-hands.dev/conversations/f4b7a646ad6241e1bfe90ac206019391

$ .venv/bin/python - <<'PY'
# summarized the hosted conversation via the OpenHands Cloud API
PY
conversation_link=https://app.all-hands.dev/conversations/f4b7a646ad6241e1bfe90ac206019391
execution_status=finished
browser_navigate_elapsed=16.6s
browser_get_state_elapsed=30.0s
browser_get_state_observation=
Browser operation failed:
terminal_pwd_observation=
/workspace/project/software-agent-sdk
finish_observation=
Browser observation received exactly as:
"[An error occurred during execution.]"
"Browser operation failed: "

Then I ran `pwd` successfully and got `/workspace/project/software-agent-sdk`.

RECOVERED AFTER TIMEOUT

The local SDK run above uses a real LLM plus BrowserToolSet(action_timeout_seconds=2.0) to exercise the exact timeout-to-BrowserObservation path on this branch. The hosted OpenHands Cloud conversation shows a real hosted agent also recovered from a browser hang observation and continued with pwd/finish instead of crashing the conversation.

Checklist

  • If the PR is changing/adding functionality, are there tests to reflect this?
  • If there is an example, have you run the example to make sure that it works? (Ran the self-contained timeout demo locally before removing it from the PR because it broke check-examples.)
  • If there are instructions on how to run the code, have you followed the instructions and made sure that it works?
  • If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name? (Not applicable: this PR no longer adds a shipped example or docs surface.)
  • Is the github CI passing?

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant  Architectures  Base Image                                  Docs / Tags
java     amd64, arm64   eclipse-temurin:17-jdk                      Link
python   amd64, arm64   nikolaik/python-nodejs:python3.13-nodejs22  Link
golang   amd64, arm64   golang:1.21-bookworm                        Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:1d59870-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-1d59870-python \
  ghcr.io/openhands/agent-server:1d59870-python

All tags pushed for this build

ghcr.io/openhands/agent-server:1d59870-golang-amd64
ghcr.io/openhands/agent-server:1d59870-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:1d59870-golang-arm64
ghcr.io/openhands/agent-server:1d59870-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:1d59870-java-amd64
ghcr.io/openhands/agent-server:1d59870-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:1d59870-java-arm64
ghcr.io/openhands/agent-server:1d59870-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:1d59870-python-amd64
ghcr.io/openhands/agent-server:1d59870-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-amd64
ghcr.io/openhands/agent-server:1d59870-python-arm64
ghcr.io/openhands/agent-server:1d59870-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-arm64
ghcr.io/openhands/agent-server:1d59870-golang
ghcr.io/openhands/agent-server:1d59870-java
ghcr.io/openhands/agent-server:1d59870-python

About Multi-Architecture Support

  • Each variant tag (e.g., 1d59870-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 1d59870-python-amd64) are also available if needed

Catch browser action timeouts at the browser tool boundary so a hung browser call returns a BrowserObservation error instead of bubbling up as a fatal conversation TimeoutError.

Also format empty browser exceptions more clearly to avoid blank 'Browser operation failed:' messages.

Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions
Contributor

github-actions bot commented Mar 15, 2026

Python API breakage checks — ✅ PASSED

Result: PASSED

Action log

@github-actions
Contributor

github-actions bot commented Mar 15, 2026

REST API breakage checks (OpenAPI) — ✅ PASSED

Result: PASSED

Action log

neubig pushed a commit that referenced this pull request Mar 15, 2026
Bump openhands-agent-server from 1.14.0 to 1.15.0 so the REST API breakage workflow reflects the current breaking API surface already present on main.

This is intentionally separate from PR #2455, whose failing REST API check is unrelated to the browser timeout fix.

Co-authored-by: openhands <openhands@all-hands.dev>
@neubig
Contributor Author

neubig commented Mar 16, 2026

@OpenHands test this live by creating a web service that does not respond in the allotted time and creating an example, following the examples folder, that asks the agent to query this web service. Run the example before and after this fix and demonstrate that it's better. Fix any failing tests as well.

@openhands-ai

openhands-ai bot commented Mar 16, 2026

I'm on it! neubig can track my progress at all-hands.dev

Contributor Author

neubig commented Mar 16, 2026

Addressed in 4010efb.

What changed:

  • kept browser action timeouts on the tool path as normal BrowserObservation errors
  • added a live slow-service regression test in tests/tools/browser_use/test_browser_executor.py
  • added examples/01_standalone_sdk/45_browser_timeout_observation.py, a self-contained example that starts a slow web service and has an agent call browser_navigate against it
  • excluded the new example from the generic example test sweep because it intentionally runs its own live timeout scenario
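The slow-service setup can be sketched with the standard library alone. This is an illustrative stand-in, not the test in tests/tools/browser_use/test_browser_executor.py or the removed example: `SlowHandler`, `fetch_times_out`, and the delay value are hypothetical, and a plain HTTP client takes the place of the browser tool.

```python
import socket
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

DELAY_SECONDS = 0.5  # how long the fake service stalls before answering


class SlowHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        time.sleep(DELAY_SECONDS)
        try:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"late")
        except OSError:
            pass  # the client may already have given up

    def log_message(self, format, *args) -> None:
        pass  # keep test output quiet


def fetch_times_out(timeout_seconds: float) -> bool:
    """Return True when a request to the slow service exceeds the timeout."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), SlowHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_address[1]}/"
    # Bypass any proxy configured in the environment for this local request.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        opener.open(url, timeout=timeout_seconds).close()
        return False
    except OSError as exc:
        # urllib may wrap the socket error in URLError; unwrap it if so.
        reason = getattr(exc, "reason", exc)
        return isinstance(reason, (TimeoutError, socket.timeout))
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_times_out(0.1))
```

Binding to port 0 lets the operating system pick a free port, which avoids collisions when tests run in parallel.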

Validation:

  • uv run pre-commit run --files openhands-tools/openhands/tools/browser_use/impl.py tests/tools/browser_use/test_browser_executor.py tests/examples/test_examples.py examples/01_standalone_sdk/45_browser_timeout_observation.py
  • uv run pytest tests/tools/browser_use/test_browser_executor.py -q
  • uv run pytest tests/examples/test_examples.py -q
  • uv run python examples/01_standalone_sdk/45_browser_timeout_observation.py

Before/after demo using the new example:

  • legacy behavior (monkeypatched old BrowserToolExecutor.__call__): the example aborts with a fatal ConversationRunError caused by TimeoutError
  • current behavior: the agent receives "Browser operation failed: Operation timed out after 2 seconds", then finishes cleanly with the message "The slow web service timed out, but the browser tool returned a normal error observation instead of crashing the conversation."
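The before/after contrast can be illustrated at the conversation level with a toy runner. Everything here is hypothetical: `ConversationRunError` is a local class named after the error mentioned above, and `run_conversation` only mimics the idea that an exception escaping a tool aborts the run while an error observation does not.

```python
class ConversationRunError(RuntimeError):
    """Local stand-in named after the error mentioned above."""


def legacy_browser_call() -> dict:
    # Old behavior: the timeout escapes the tool boundary.
    raise TimeoutError("Operation timed out after 2 seconds")


def current_browser_call() -> dict:
    # New behavior: the timeout is reported as an error observation.
    return {
        "text": "Browser operation failed: Operation timed out after 2 seconds",
        "is_error": True,
    }


def run_conversation(browser_call) -> list:
    """Run one browser step, then a follow-up step if the run survives."""
    events = []
    try:
        events.append(browser_call())
    except Exception as exc:
        raise ConversationRunError("conversation aborted") from exc
    # The agent can still act after reading an error observation.
    events.append({"text": "RECOVERED AFTER TIMEOUT", "is_error": False})
    return events


if __name__ == "__main__":
    print(run_conversation(current_browser_call))
    try:
        run_conversation(legacy_browser_call)
    except ConversationRunError as err:
        print(f"legacy run failed: {err!r} caused by {err.__cause__!r}")
```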

I also checked local targeted tests after the rebase; the new regression coverage is green.

@openhands-ai

openhands-ai bot commented Mar 16, 2026

Done — the new changes since the last summary are complete and pushed.

Checklist:

  • Added a live slow-service regression test in tests/tools/browser_use/test_browser_executor.py that verifies a real timeout is returned as a normal BrowserObservation error.
  • Added a standalone, self-contained example at examples/01_standalone_sdk/45_browser_timeout_observation.py.
  • Demonstrated before/after behavior with that example:
    • before/legacy behavior: fatal ConversationRunError from TimeoutError
    • after/current behavior: tool returns Browser operation failed: Operation timed out after 2 seconds, and the conversation finishes normally
  • Updated tests/examples/test_examples.py to exclude the new example from the generic example sweep since it intentionally runs a live timeout scenario.
  • Ran targeted validation and pre-commit successfully.
  • Committed and pushed to the PR branch.

Conciseness check:

  • Changes are focused on the reviewer request.
  • No extra files beyond the requested example, test coverage, and necessary test-list update.

Validation run:

  • uv run pre-commit run --files openhands-tools/openhands/tools/browser_use/impl.py tests/tools/browser_use/test_browser_executor.py tests/examples/test_examples.py examples/01_standalone_sdk/45_browser_timeout_observation.py
  • uv run pytest tests/tools/browser_use/test_browser_executor.py -q
  • uv run pytest tests/examples/test_examples.py -q
  • uv run python examples/01_standalone_sdk/45_browser_timeout_observation.py

Pushed commit:

  • 4010efb7a04159930d0587a8505d6602a1a1fa8f

PR update comment:

@github-actions
Contributor

github-actions bot commented Mar 16, 2026

Coverage

Coverage Report
File      Stmts   Miss    Cover   Missing
openhands-tools/openhands/tools/browser_use
   impl.py   255   157   38%   57–61, 63–64, 66, 68, 70–73, 75–76, 78, 80, 87, 99–102, 108–109, 114, 116–118, 120–121, 129–131, 133–137, 142, 193, 198–201, 203, 225–227, 230–232, 234, 247, 284–285, 289, 299, 314–315, 320, 334–335, 340–341, 351, 369–370, 372–385, 388–401, 403–404, 410, 415–418, 426, 428, 431–432, 438–439, 444–445, 451–452, 456–457, 461–462, 466, 468–469, 471–474, 477–478, 484, 486, 488, 496–497, 501–502, 507–508, 512–513, 517–518, 523–524, 536–537, 548–549, 553–557, 571–573, 578, 583–584, 593–594
TOTAL     20591   10390   49%

Co-authored-by: openhands <openhands@all-hands.dev>
@neubig neubig marked this pull request as ready for review March 19, 2026 02:43
Collaborator

@all-hands-bot all-hands-bot left a comment


🟢 Good taste - Clean, minimal fix that converts fatal timeouts into recoverable observations.

⚠️ Eval Risk Flag: This changes tool execution behavior (timeouts → observations vs fatal errors). Needs lightweight eval verification before merge to ensure agent behavior isn't negatively impacted.

Verdict: ✅ Code is solid and well-tested. Waiting on eval confirmation.

Comment thread openhands-tools/openhands/tools/browser_use/impl.py
Comment thread openhands-tools/openhands/tools/browser_use/impl.py
Comment thread tests/tools/browser_use/test_browser_executor.py
Comment thread openhands-tools/openhands/tools/browser_use/impl.py
@neubig neubig marked this pull request as draft March 19, 2026 04:37
@neubig neubig marked this pull request as ready for review March 19, 2026 11:42
Collaborator

@all-hands-bot all-hands-bot left a comment


Taste Rating: 🟢 Good taste - Exactly the right approach.

Key Insight: This eliminates a special case (fatal timeout) by normalizing it into the standard error observation flow. The core change is 3 lines in __call__: catch TimeoutError, return BrowserObservation. Simple, direct, and well-tested with a real slow HTTP service.

Verdict: ✅ Code is solid and ready. Waiting on eval verification (as noted in previous review) since this changes tool execution behavior.

Contributor

@VascoSch92 VascoSch92 left a comment


Just one suggestion for a change.
Otherwise, LGTM.

Comment thread openhands-tools/openhands/tools/browser_use/impl.py
@neubig
Contributor Author

neubig commented Mar 19, 2026

@OpenHands fix the remaining review comment

@openhands-ai

openhands-ai bot commented Mar 19, 2026

I'm on it! neubig can track my progress at all-hands.dev

Co-authored-by: openhands <openhands@all-hands.dev>
@openhands-ai

openhands-ai bot commented Mar 19, 2026

Done. New work since the last substantive update:

  • Addressed the remaining unresolved review comment in openhands-tools/openhands/tools/browser_use/impl.py
  • Kept the change minimal and behavior-preserving: _format_browser_operation_error() was only refactored for readability/simplicity
  • Ran validation for the touched area:
    • uv run pre-commit run --files openhands-tools/openhands/tools/browser_use/impl.py
    • uv run pytest tests/tools/browser_use/test_browser_executor.py -q
  • Committed and pushed the change:
    • 34db45a4 refactor(tools): simplify browser error formatting
  • Replied to the GitHub review thread via GraphQL
  • Marked that review thread as resolved via GraphQL
  • Re-ran the Review Thread Gate workflow
  • Confirmed the latest Review Thread Gate run passed

Conciseness check:

  • The update is concise.
  • No extraneous files or unrelated code were changed.
  • Only the reviewed helper function was refactored, with no scope creep.

In short: the remaining review feedback has been fully addressed, the thread was resolved, and the review-thread CI gate now passes.

@neubig neubig merged commit ba70d15 into main Mar 19, 2026
30 checks passed
@neubig neubig deleted the fix/browser-timeout-observation branch March 19, 2026 23:41
5 participants