feat: interactive browser session and supportsImages gating for browser_action #7198
Pull Request Overview
This PR implements interactive browser session support via a new BrowserSession service and gates the browser tool on the model's image capability (supportsImages) instead of its computer-use capability (supportsComputerUse).
- Rewrites BrowserSession class for simplified interactive browsing with local Chromium via puppeteer-chromium-resolver
- Changes browser tool gating from supportsComputerUse to supportsImages in both preview and runtime paths
- Removes complex remote browser connection logic in favor of streamlined local-only implementation
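The gating change boils down to swapping which capability flag the browser tool checks. A minimal sketch, assuming a capabilities record with the two optional flags named in this PR and a hypothetical user-level `browserToolEnabled` toggle; `ModelCapabilities`, `browserToolAllowedBefore` and `browserToolAllowed` are illustrative names, not the project's actual code.

```typescript
// Hypothetical subset of a model-capabilities record. The two flag names come
// from the PR text; the interface itself is illustrative.
interface ModelCapabilities {
  supportsImages?: boolean;
  supportsComputerUse?: boolean;
}

// Previous gating: browser_action offered only to computer-use models.
function browserToolAllowedBefore(
  model: ModelCapabilities,
  browserToolEnabled = true, // hypothetical user-level toggle
): boolean {
  return (model.supportsComputerUse ?? false) && browserToolEnabled;
}

// New gating: any model that can read images (and therefore the screenshots
// returned by browser_action) qualifies.
function browserToolAllowed(
  model: ModelCapabilities,
  browserToolEnabled = true, // hypothetical user-level toggle
): boolean {
  return (model.supportsImages ?? false) && browserToolEnabled;
}

// A vision-capable model without computer-use support gains access.
const visionModel: ModelCapabilities = { supportsImages: true, supportsComputerUse: false };
console.log(browserToolAllowedBefore(visionModel), browserToolAllowed(visionModel)); // false true
```

Routing both the preview path (generateSystemPrompt.ts) and the runtime path (Task.ts) through one such predicate is one way to keep the two in agreement.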
Reviewed Changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/services/browser/BrowserSession.ts | Complete rewrite implementing simplified interactive browser session with console logging, screenshot capture, and basic navigation/interaction actions |
| src/core/webview/generateSystemPrompt.ts | Updates browser tool gating to check supportsImages instead of supportsComputerUse for model compatibility |
| src/core/task/Task.ts | Aligns runtime system prompt generation to use supportsImages for browser tool availability |
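The BrowserSession row lists basic navigation and interaction actions, and the Manual Verification notes drive them with `<browser_action>` XML. A minimal sketch of building those calls from a typed action, assuming the parameter tags shown in the verification notes (`<url>`, `<text>`, `<size>`) and a hypothetical `width,height` encoding for `<size>`. `BrowserAction` and `toBrowserActionXml` are illustrative names, not the project's API, and actions taking `<coordinate>` are omitted.

```typescript
// Hypothetical action union limited to actions whose parameters appear in the
// PR's manual-verification steps.
type BrowserAction =
  | { action: "launch"; url: string }
  | { action: "type"; text: string }
  | { action: "scroll_up" }
  | { action: "scroll_down" }
  | { action: "resize"; width: number; height: number }
  | { action: "close" };

// Serialize an action into the <browser_action> XML form used to call the tool.
// Values are inserted verbatim; no XML escaping is attempted in this sketch.
function toBrowserActionXml(a: BrowserAction): string {
  let params = "";
  switch (a.action) {
    case "launch":
      params = `<url>${a.url}</url>`;
      break;
    case "type":
      params = `<text>${a.text}</text>`;
      break;
    case "resize":
      // Assumption: <size> carries "width,height".
      params = `<size>${a.width},${a.height}</size>`;
      break;
    default:
      // scroll_up, scroll_down and close carry no parameters.
      break;
  }
  return `<browser_action><action>${a.action}</action>${params}</browser_action>`;
}

// prints <browser_action><action>launch</action><url>https://example.com</url></browser_action>
console.log(toBrowserActionXml({ action: "launch", url: "https://example.com" }));
```

The discriminated union lets the compiler reject, for example, a `launch` without a URL before any XML is produced.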
Thank you for this PR implementing interactive browser sessions! The refactoring from 560 to 280 lines is impressive and the switch from supportsComputerUse to supportsImages makes sense for screenshot-driven browsing. I've reviewed the changes and have some suggestions to improve type safety and code maintainability.
Branch updated from 7fd56d6 to 896a2e2.
Related GitHub Issue
No tracking issue provided for this PR
Description
This PR adds an interactive browser session implementation and aligns browser tool availability with model image capability.
Concise summary of changes:
What did not change:
Changes Made
Test/CI
Manual Verification
- `<browser_action><action>launch</action><url>https://example.com</url></browser_action>`
- … (`<coordinate>`), type (`<text>`), scroll_up/scroll_down, resize (`<size>`)
- `<action>close</action>`

Additional Notes
- … (`remoteBrowserEnabled`, `remoteBrowserHost`); if enabled and reachable, a DevTools connection is used; otherwise, a local Chromium instance is launched.
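The remote-versus-local rule in this note reduces to a small decision. A minimal sketch, assuming the two settings are plain optional fields and that reachability is checked by an injected `isReachable` callback (a real probe would usually be an asynchronous network request); `BrowserSettings`, `LaunchPlan` and `chooseLaunchPlan` are hypothetical names.

```typescript
// Hypothetical settings shape; the field names come from the PR text.
interface BrowserSettings {
  remoteBrowserEnabled?: boolean;
  remoteBrowserHost?: string;
}

// What the session should do: attach to an existing browser over DevTools,
// or launch a local Chromium instance.
type LaunchPlan =
  | { kind: "remote"; host: string }
  | { kind: "local" };

// Decide how to start the browser. `isReachable` stands in for a real
// DevTools probe and is an assumption of this sketch.
function chooseLaunchPlan(
  settings: BrowserSettings,
  isReachable: (host: string) => boolean,
): LaunchPlan {
  const host = settings.remoteBrowserHost?.trim();
  if (settings.remoteBrowserEnabled && host && isReachable(host)) {
    return { kind: "remote", host };
  }
  // Remote disabled, no host configured, or host unreachable: fall back to local.
  return { kind: "local" };
}

const reachable = (h: string) => h === "http://localhost:9222";
console.log(chooseLaunchPlan({ remoteBrowserEnabled: true, remoteBrowserHost: "http://localhost:9222" }, reachable).kind); // remote
console.log(chooseLaunchPlan({ remoteBrowserEnabled: true, remoteBrowserHost: "http://10.0.0.5:9222" }, reachable).kind); // local
```

Returning a plan instead of launching directly keeps the decision testable without starting Chromium.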