Conversation

Collaborator

@hannesrudolph hannesrudolph commented Aug 18, 2025

Related GitHub Issue

No tracking issue provided for this PR

Description

This PR adds an interactive browser session implementation and aligns browser tool availability with model image capability.

Concise summary of changes:

  • New: BrowserSession
    • Local Chromium launch via puppeteer‑chromium‑resolver
    • Optional remote browser support (DevTools connect/disconnect) when enabled in state
    • Stable navigation (networkidle2 with timeout fallback to domcontentloaded)
    • Per‑action console log capture and PNG screenshot (base64 data URL)
    • Safeguards for test/mocked Page objects (setViewport, setExtraHTTPHeaders, mouse/keyboard, and evaluate are only called if present)
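  • Prompt gating updates:
    • Browser tool availability is gated on the model's supportsImages capability instead of supportsComputerUse
    • Applies to both the system prompt preview (src/core/webview/generateSystemPrompt.ts) and the runtime system prompt (src/core/task/Task.ts)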
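
As an illustration of the BrowserSession navigation and guard behaviour listed above, here is a minimal sketch, not the code from this PR. `PageLike`, `navigateWithFallback`, the timeout, and the viewport size are hypothetical names and placeholder values; the only Puppeteer behaviour assumed is that `page.goto` accepts `waitUntil` and `timeout` options and rejects on timeout. For brevity the sketch falls back on any rejection, not only a timeout.

```typescript
// Structural stand-in for a Puppeteer Page (or a mocked Page in tests).
type WaitUntil = "networkidle2" | "domcontentloaded"

interface PageLike {
  goto(url: string, options: { waitUntil: WaitUntil; timeout: number }): Promise<unknown>
  // Optional on purpose: mocked Page objects may not implement it.
  setViewport?: (viewport: { width: number; height: number }) => Promise<void>
}

// Hypothetical helper: try networkidle2 first, then retry with domcontentloaded.
// Returns the wait condition that succeeded.
async function navigateWithFallback(page: PageLike, url: string, timeoutMs = 10000): Promise<WaitUntil> {
  // Guarded call: only use setViewport when the page object provides it.
  if (typeof page.setViewport === "function") {
    await page.setViewport({ width: 900, height: 600 }) // placeholder size
  }
  try {
    await page.goto(url, { waitUntil: "networkidle2", timeout: timeoutMs })
    return "networkidle2"
  } catch {
    // Simplification: any rejection (not only a timeout) triggers the fallback.
    await page.goto(url, { waitUntil: "domcontentloaded", timeout: timeoutMs })
    return "domcontentloaded"
  }
}
```

If the second `goto` also rejects, the error propagates to the caller, which keeps the failure visible instead of returning a half-loaded page silently.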

What did not change:

  • Existing runtime tool dispatch (browser_action via presentAssistantMessage/browserActionTool) was not altered functionally

Changes Made

Test/CI

  • Unit tests now pass on all platforms. The BrowserSession changes align with expectations in BrowserSession.spec.ts, including remote connect/disconnect behavior and guarded page method calls.
  • CI status: all required checks green (analyze, compile, unit tests on ubuntu/windows, integration tests, knip).

Manual Verification

  • Use a model with supportsImages and a mode that includes the “browser” group
  • Invoke:
    • launch: <browser_action><action>launch</action><url>https://example.com</url></browser_action>
    • follow‑up actions (one per message): click/hover (with <coordinate>), type (<text>), scroll_up/scroll_down, resize (<size>)
    • close: <browser_action><action>close</action></browser_action>
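
For scripted manual verification, the messages above can be assembled with a small helper. This is a sketch only: `BrowserActionParams` and `buildBrowserAction` are hypothetical names, the tag names are taken from the steps above, and parameter values are passed through as plain strings without validation or escaping.

```typescript
// Hypothetical parameter bag; field names mirror the tags used in the manual steps.
interface BrowserActionParams {
  action: "launch" | "click" | "hover" | "type" | "scroll_up" | "scroll_down" | "resize" | "close"
  url?: string
  coordinate?: string
  text?: string
  size?: string
}

// Builds one <browser_action> message; only parameters that are set are emitted.
function buildBrowserAction(params: BrowserActionParams): string {
  const { action, ...rest } = params
  const tags = [`<action>${action}</action>`]
  for (const key of ["url", "coordinate", "text", "size"] as const) {
    const value = rest[key]
    if (value !== undefined) tags.push(`<${key}>${value}</${key}>`)
  }
  return `<browser_action>${tags.join("")}</browser_action>`
}

const launchMessage = buildBrowserAction({ action: "launch", url: "https://example.com" })
const closeMessage = buildBrowserAction({ action: "close" })
```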

Additional Notes

  • No .gitignore or settings/UX changes included in this PR
  • UrlContentFetcher now respects optional launch overrides from extension globalState:
    • chromiumArgs: string[] (appended to default args; duplicates removed)
    • chromiumExecutablePath: string (overrides puppeteer's resolved executablePath)
  • Remote browser support depends on existing state keys (remoteBrowserEnabled, remoteBrowserHost); if enabled and reachable, DevTools connect is used, otherwise local Chromium is launched
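
The override and fallback rules in these notes can be sketched as two pure functions. The function and interface names here (`resolveLaunchOptions`, `chooseBrowserSource`, `LaunchOverrides`, `RemoteBrowserState`) are hypothetical; only the key names chromiumArgs, chromiumExecutablePath, remoteBrowserEnabled, and remoteBrowserHost come from the description, and how reachability is probed is left out.

```typescript
// Optional overrides as they might be read from extension globalState.
interface LaunchOverrides {
  chromiumArgs?: string[]
  chromiumExecutablePath?: string
}

interface LaunchOptions {
  args: string[]
  executablePath: string
}

// Appends override args to the defaults, drops duplicates (first occurrence wins),
// and lets an explicit executable path replace the resolved one.
function resolveLaunchOptions(
  defaultArgs: string[],
  resolvedExecutablePath: string,
  overrides: LaunchOverrides = {},
): LaunchOptions {
  const merged = defaultArgs.concat(overrides.chromiumArgs ?? [])
  const args = merged.filter((arg, index) => merged.indexOf(arg) === index)
  return { args, executablePath: overrides.chromiumExecutablePath ?? resolvedExecutablePath }
}

interface RemoteBrowserState {
  remoteBrowserEnabled?: boolean
  remoteBrowserHost?: string
}

// Remote DevTools connect is used only when enabled and reachable; otherwise local Chromium.
function chooseBrowserSource(state: RemoteBrowserState, hostReachable: boolean): "remote" | "local" {
  return state.remoteBrowserEnabled === true && hostReachable ? "remote" : "local"
}
```

Keeping both decisions free of I/O makes them easy to cover in unit tests without launching a browser.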

Copilot AI review requested due to automatic review settings August 18, 2025 20:15
@dosubot dosubot bot added the size:XL and enhancement labels Aug 18, 2025
Contributor

Copilot AI left a comment

Pull Request Overview

This PR implements interactive browser session support via a new BrowserSession service and aligns browser tool gating to use the model's image capability (supportsImages) instead of computer use support.

  • Rewrites BrowserSession class for simplified interactive browsing with local Chromium via puppeteer-chromium-resolver
  • Changes browser tool gating from supportsComputerUse to supportsImages in both preview and runtime paths
  • Removes complex remote browser connection logic in favor of streamlined local-only implementation
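
The gating change described in this overview amounts to a small predicate. The sketch below is an assumption about its shape, not the code in generateSystemPrompt.ts or Task.ts: `ModelInfoLike` and `isBrowserToolAvailable` are hypothetical names, and the "browser" mode group comes from the manual verification steps in the PR description.

```typescript
// Hypothetical subset of the model capability flags relevant to gating.
interface ModelInfoLike {
  supportsImages?: boolean
  supportsComputerUse?: boolean
}

// The browser tool is offered when the model accepts images (screenshots)
// and the active mode includes the "browser" group.
// supportsComputerUse is deliberately no longer consulted.
function isBrowserToolAvailable(model: ModelInfoLike, modeGroups: string[]): boolean {
  return model.supportsImages === true && modeGroups.indexOf("browser") !== -1
}
```

Using one predicate in both the preview and the runtime path would keep the two from drifting apart.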

Reviewed Changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/services/browser/BrowserSession.ts | Complete rewrite implementing simplified interactive browser session with console logging, screenshot capture, and basic navigation/interaction actions |
| src/core/webview/generateSystemPrompt.ts | Updates browser tool gating to check supportsImages instead of supportsComputerUse for model compatibility |
| src/core/task/Task.ts | Aligns runtime system prompt generation to use supportsImages for browser tool availability |

@hannesrudolph hannesrudolph added the Issue/PR - Triage label Aug 18, 2025
Contributor

@roomote roomote bot left a comment

Thank you for this PR implementing interactive browser sessions! The refactoring from 560 to 280 lines is impressive and the switch from supportsComputerUse to supportsImages makes sense for screenshot-driven browsing. I've reviewed the changes and have some suggestions to improve type safety and code maintainability.

@daniel-lxs daniel-lxs moved this from Triage to PR [Needs Prelim Review] in Roo Code Roadmap Aug 19, 2025
@hannesrudolph hannesrudolph added the PR - Needs Preliminary Review label and removed the Issue/PR - Triage label Aug 19, 2025
@daniel-lxs daniel-lxs marked this pull request as draft August 19, 2025 21:30
@daniel-lxs daniel-lxs moved this from PR [Needs Prelim Review] to PR [Draft / In Progress] in Roo Code Roadmap Aug 19, 2025
@hannesrudolph hannesrudolph force-pushed the feat/interactive-browser-session-supports-images branch 2 times, most recently from 7fd56d6 to 896a2e2 Compare August 19, 2025 23:23
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Aug 19, 2025
@github-project-automation github-project-automation bot moved this from PR [Draft / In Progress] to Done in Roo Code Roadmap Aug 19, 2025
Labels

  • enhancement: New feature or request
  • PR - Draft / In Progress
  • size:XL: This PR changes 500-999 lines, ignoring generated files.

Projects

Archived in project

2 participants