Conversation

@daniel-lxs
Member

@daniel-lxs daniel-lxs commented Jun 22, 2025

Description

Fixes #5003

This PR enables browser use functionality for Gemini models that support images, aligning with Cline's approach where browser interaction works through screenshot analysis rather than direct computer control.

Changes Made

  • Refactored supportsComputerUse to supportsBrowserUse: Updated the entire codebase to use more accurate terminology
  • Updated browser capability logic: Changed from checking supportsComputerUse to checking image support
  • Added browser support to Gemini models: All Gemini models with image support now have supportsBrowserUse
  • Updated UI labels: Changed references from "computer use" to "browser use" in settings and model info
  • Added comprehensive tests: Created new test suite to verify browser capability logic
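
As a rough illustration of the capability logic described above, here is a minimal sketch. It is not the code from this PR: the interface name `ModelInfoSketch`, the function `canUseBrowser`, and the property name `supportsImages` are assumptions made for the example; only `supportsComputerUse` and `supportsBrowserUse` are named in the description.

```typescript
// Hypothetical, trimmed-down model description used only for this sketch.
interface ModelInfoSketch {
  supportsImages?: boolean; // assumed name for the image-support flag
  supportsBrowserUse?: boolean; // replaces the older supportsComputerUse flag
}

// Browser interaction works through screenshot analysis, so in this sketch
// the browser tool is offered whenever the model can accept images.
function canUseBrowser(model: ModelInfoSketch): boolean {
  return model.supportsImages === true;
}

// Example inputs (made up for illustration).
const visionModel: ModelInfoSketch = { supportsImages: true };
const textOnlyModel: ModelInfoSketch = { supportsImages: false };
const unknownModel: ModelInfoSketch = {};

console.log(canUseBrowser(visionModel));
console.log(canUseBrowser(textOnlyModel));
console.log(canUseBrowser(unknownModel));
```

Treating a missing flag as "no image support" keeps the check conservative for models whose capabilities are not declared.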

Testing

  • All existing tests pass
  • Added new test file
  • Verified that Gemini models with image support can now use browser tools
  • UI correctly displays browser use capability based on image support

Files Changed

  • Core type definition and all provider configurations
  • Browser capability check logic
  • UI components and translations
  • Test files updated to reflect new property names

Checklist

  • Code follows project style guidelines
  • Self-review completed
  • Tests added for new functionality
  • All tests passing
  • No breaking changes for existing functionality

Screenshots

The browser use capability is now enabled for Gemini models that support images.

Related Issues

- Replace supportsComputerUse with supportsBrowserUse throughout codebase
- Enable browser use for any model that supports images
- Update Gemini models configuration to include supportsBrowserUse
- Update UI labels from 'computer use' to 'browser use'
- Add comprehensive tests for browser capability logic
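
The configuration update listed above could be sketched roughly as follows. This is an assumption-laden illustration, not the PR's actual diff: the helper `withBrowserUse`, the type `ModelInfoSketch`, the property name `supportsImages`, and the model ids are all hypothetical placeholders.

```typescript
// Hypothetical, trimmed-down model description used only for this sketch.
interface ModelInfoSketch {
  supportsImages?: boolean; // assumed name for the image-support flag
  supportsBrowserUse?: boolean;
}

// Returns a copy of the model table in which every entry that supports
// images also carries supportsBrowserUse: true. Other entries are copied as-is.
function withBrowserUse(
  models: Record<string, ModelInfoSketch>,
): Record<string, ModelInfoSketch> {
  const result: Record<string, ModelInfoSketch> = {};
  for (const [id, info] of Object.entries(models)) {
    result[id] = info.supportsImages === true
      ? { ...info, supportsBrowserUse: true }
      : { ...info };
  }
  return result;
}

// Placeholder model ids, not real provider entries.
const exampleModels: Record<string, ModelInfoSketch> = {
  "example-vision-model": { supportsImages: true },
  "example-text-model": { supportsImages: false },
};

const updated = withBrowserUse(exampleModels);
console.log(updated);
```

Deriving the flag from image support in one place, rather than setting it by hand per model, is one way to avoid limiting browser use to a fixed list of providers.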

This change aligns with Cline's approach where browser interaction
works through screenshot analysis rather than direct computer control.
@daniel-lxs daniel-lxs moved this from Triage to PR [Draft / In Progress] in Roo Code Roadmap Jun 22, 2025
chrarnoldus added a commit to Kilo-Org/kilocode that referenced this pull request Jun 29, 2025
RooCodeInc/Roo-Code#5026 does this more thoroughly,
but limits browser use to Claude and Gemini for some reason.

From my testing it also works with GPT-4.1, Mistral Medium 3, and Qwen 2.5 VL
@chrarnoldus
Contributor

Any reason to only enable browser use for Gemini (in addition to Claude) and not for all models that support images (as is done in Cline)?

I tested it with other models like GPT-4.1 and Mistral Medium 3 and it seems to work fine.

@daniel-lxs
Member Author

@chrarnoldus
No reason; the original issue just mentioned Gemini models. This PR is just a POC, and we probably need to handle this differently.

@daniel-lxs daniel-lxs closed this Jul 1, 2025
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Jul 1, 2025
@github-project-automation github-project-automation bot moved this from PR [Draft / In Progress] to Done in Roo Code Roadmap Jul 1, 2025
Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Gemini model Browser Use