@filip-michalsky filip-michalsky commented Jan 15, 2026

Description

Adds BrowserEnv - a unified browser automation integration for the verifiers library supporting two operational modes:

DOM Mode (mode="dom")

  • Uses the Stagehand Python SDK for natural language browser control
  • Tools: navigate, observe, act, extract - Stagehand's AI-driven primitives
  • Ideal for tasks that benefit from semantic understanding of page elements

CUA Mode (mode="cua")

  • Vision-based primitives for Computer Use Agent workflows
  • Tools: click, double_click, type_text, keypress, scroll, goto, back, forward, wait, screenshot
  • Requires companion TypeScript server (included) for CDP connection via Stagehand internals
  • Automatic screenshot management with context trimming for VLM input

Both modes support local browser execution or Browserbase cloud infrastructure.

What's included:

  • verifiers/envs/integrations/browser_env/ - Core integration (BrowserEnv, DOMMode, CUAMode)
  • verifiers/envs/integrations/browser_env/cua-server/ - TypeScript server for CUA mode
  • environments/browser_dom_example/ - Minimal DOM mode example
  • environments/browser_cua_example/ - Minimal CUA mode example
  • New [browser] extra: uv add 'verifiers[browser]'

Benchmarks (GAIA, WebVoyager, Mind2Web) have been pushed to Prime Hub under the browserbase/ namespace.

Type of Change

  • New feature (non-breaking change which adds functionality)

Testing

# DOM mode
prime eval run browserbase/browser-dom-example -m openai/gpt-4.1-mini

# CUA mode (start server first: cd verifiers/envs/integrations/browser_env/cua-server && ./start.sh)
prime eval run browserbase/browser-cua-example -m qwen/qwen3-vl-30b-a3b-instruct
  • All existing tests pass when running uv run pytest locally.
  • New tests have been added to cover the changes

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Additional Notes

Future work:

  • Compile CUA TypeScript server to binary to remove Node.js dependency
  • Additional benchmark environments available on Prime Hub under browserbase/ org

Note

Adds a unified browser automation integration with two modes and supporting assets.

  • New BrowserEnv in verifiers/envs/integrations/browser_env with mode="dom" (Stagehand tools) and mode="cua" (vision primitives + screenshots); default system prompts; env var validation; custom tool call handling for multipart CUA responses; screenshot filtering
  • Exports BrowserEnv via verifiers/__init__.py and integration package __init__.py (lazy imports)
  • Examples: environments/browser_dom_example and environments/browser_cua_example with minimal datasets, judge rubric, README, and pyproject.toml
  • CUA server: TypeScript Fastify service under browser_env/cua-server/ (actions API, session management, README, scripts, env templates)
  • Docs: add BrowserEnv to docs/environments.md and integrations/README.md, including install extras and mode descriptions
  • Deps: new [project.optional-dependencies].browser extra (stagehand, aiohttp, python-dotenv)
  • Tests: tests/test_browser_env.py covering env var checks, prompts, CUA formatting/filtering, DOM LLM config, example datasets; update tests/test_envs.py to skip new browser examples and mcp_env

Written by Cursor Bugbot for commit 906a836. This will update automatically on new commits.

CLAassistant commented Jan 15, 2026

CLA assistant check
All committers have signed the CLA.

@filip-michalsky filip-michalsky changed the title from "ruff precommit" to "Add Browser Env Integration" Jan 15, 2026
@filip-michalsky filip-michalsky marked this pull request as ready for review January 16, 2026 16:02
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 3 potential issues.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

if self.mode == "cua":
    # Filter screenshots to manage context size
    messages = self._mode_impl.filter_screenshots_in_messages(list(messages))

Screenshot filtering ineffective due to wrong placement

High Severity

The filter_screenshots_in_messages call in env_response doesn't actually filter screenshots from the context sent to the model. In MultiTurnEnv.get_prompt_messages, the original unfiltered messages variable is used in the final concat_messages([messages, env_response]) call. The filtered version only affects what's passed to super().env_response() for tool call extraction, which only examines messages[-1] anyway. Screenshots will accumulate unbounded in the conversation, potentially causing context length issues despite the filtering code existing.
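
A minimal sketch of the kind of filtering described above, under stated assumptions: `keep_last_screenshots` is a hypothetical helper, not the PR's `filter_screenshots_in_messages`, and it assumes OpenAI-style multipart content where screenshots are parts with `"type": "image_url"`. It keeps the most recent `keep` screenshots and swaps older ones for a text placeholder. To bound context size, the returned list would have to be the one actually sent to the model, not only the copy used for tool-call extraction.

```python
from copy import deepcopy
from typing import Any

Message = dict[str, Any]


def keep_last_screenshots(messages: list[Message], keep: int = 2) -> list[Message]:
    """Return a copy of `messages` with all but the newest `keep` image parts
    replaced by a short text placeholder.

    Assumption: multipart content is a list of dicts with a "type" key,
    and screenshots use type "image_url".
    """
    filtered = deepcopy(messages)  # never mutate the caller's history
    remaining = keep
    # Walk newest-to-oldest so the most recent screenshots are the ones kept.
    for message in reversed(filtered):
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain string content carries no screenshots
        for index in range(len(content) - 1, -1, -1):
            part = content[index]
            if isinstance(part, dict) and part.get("type") == "image_url":
                if remaining > 0:
                    remaining -= 1
                else:
                    content[index] = {"type": "text", "text": "[screenshot removed]"}
    return filtered
```

Because the helper returns a new list, the caller decides where the trimmed history is used; in the flow described in this comment, that would be the prompt assembled for the next model call.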

browserbase_api_key=self.api_key,
browserbase_project_id=self.project_id,
model_api_key=api_key,
)

Missing synchronization causes resource leak in concurrent rollouts

Medium Severity

DOMMode._create_session lacks synchronization when creating the shared stagehand_client. If multiple rollouts call _create_session concurrently when stagehand_client is None, each creates a new AsyncStagehand instance. Only the last one is stored; the others are orphaned with unclosed connections. CUAMode correctly uses _thread_lock and _client_lock to protect its shared client creation, but DOMMode has no such protection.
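
One possible shape for the missing protection, sketched with stand-in names: `FakeClient` and `SharedClientHolder` are hypothetical and do not come from the PR, and the sketch assumes rollouts run as coroutines on a single event loop, so an `asyncio.Lock` is enough. The check is repeated inside the lock so that concurrent callers that arrive while the client is still `None` end up sharing one instance.

```python
import asyncio
from typing import Optional


class FakeClient:
    """Stand-in for an expensive shared client; counts how many get built."""

    instances = 0

    def __init__(self) -> None:
        FakeClient.instances += 1


class SharedClientHolder:
    """Lazily creates one shared client, guarded by an asyncio.Lock."""

    def __init__(self) -> None:
        self._client: Optional[FakeClient] = None
        self._lock = asyncio.Lock()

    async def get_client(self) -> FakeClient:
        if self._client is None:
            async with self._lock:
                # Re-check inside the lock: another coroutine may have
                # created the client while this one was waiting.
                if self._client is None:
                    await asyncio.sleep(0)  # simulate async setup that yields
                    self._client = FakeClient()
        return self._client


async def demo() -> int:
    """Request the client from ten concurrent tasks; return the build count."""
    holder = SharedClientHolder()
    clients = await asyncio.gather(*(holder.get_client() for _ in range(10)))
    assert all(client is clients[0] for client in clients)
    return FakeClient.instances
```

If sessions are created from several threads rather than one event loop, a `threading.Lock` (as the comment says CUAMode uses) would be the closer analogue.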

