Commit 5439426

ENG-524 Remove supportsComputerUse restriction and support browser use through any model that supports images (RooCodeInc#3048)
* Enhance fixWithCline command execution by focusing chat input and adding a delay before processing the fixWithCline command.
* feat: add OpenRouter base URL and balance display component
* refactor: remove supportsComputerUse from modelInfo and related components, replacing with supportsImages where applicable
* feat: add OpenRouter base URL and balance display component
* feat: add OpenRouter base URL and balance display component
* feat: add OpenRouter base URL and balance display component
* feat: add OpenRouter base URL and balance display component
* feat: add OpenRouter base URL and balance display component
1 parent fffcc80 commit 5439426

7 files changed: 33 additions and 32 deletions

.changeset/weak-ligers-drum.md

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+---
+"claude-dev": patch
+---
+
+Remove supportsComputerUse restriction and support browser use through any model that supports images

src/core/controller/index.ts

Lines changed: 0 additions & 2 deletions
@@ -1493,7 +1493,6 @@ Here is the project's README to help you get started:\n\n${mcpDetails.readmeCont
 case "anthropic/claude-3.5-sonnet":
 case "anthropic/claude-3.5-sonnet:beta":
 // NOTE: this needs to be synced with api.ts/openrouter default model info
-modelInfo.supportsComputerUse = true
 modelInfo.supportsPromptCache = true
 modelInfo.cacheWritesPrice = 3.75
 modelInfo.cacheReadsPrice = 0.3
@@ -1576,7 +1575,6 @@ Here is the project's README to help you get started:\n\n${mcpDetails.readmeCont
 maxTokens: model.max_output_tokens || undefined,
 contextWindow: model.context_window,
 supportsImages: model.supports_vision || undefined,
-supportsComputerUse: model.supports_computer_use || undefined,
 supportsPromptCache: model.supports_caching || undefined,
 inputPrice: parsePrice(model.input_price),
 outputPrice: parsePrice(model.output_price),

src/core/prompts/system.ts

Lines changed: 6 additions & 6 deletions
@@ -6,7 +6,7 @@ import { BrowserSettings } from "@shared/BrowserSettings"
 
 export const SYSTEM_PROMPT = async (
 cwd: string,
-supportsComputerUse: boolean,
+supportsBrowserUse: boolean,
 mcpHub: McpHub,
 browserSettings: BrowserSettings,
 ) => `You are Cline, a highly skilled software engineer with extensive knowledge in many programming languages, frameworks, design patterns, and best practices.
@@ -138,7 +138,7 @@ Usage:
 <list_code_definition_names>
 <path>Directory path here</path>
 </list_code_definition_names>${
-supportsComputerUse
+supportsBrowserUse
 ? `
 
 ## browser_action
@@ -561,14 +561,14 @@ In each user message, the environment_details will specify the current mode. The
 CAPABILITIES
 
 - You have access to tools that let you execute CLI commands on the user's computer, list files, view source code definitions, regex search${
-supportsComputerUse ? ", use the browser" : ""
+supportsBrowserUse ? ", use the browser" : ""
 }, read and edit files, and ask follow-up questions. These tools help you effectively accomplish a wide range of tasks, such as writing code, making edits or improvements to existing files, understanding the current state of a project, performing system operations, and much more.
 - When the user initially gives you a task, a recursive list of all filepaths in the current working directory ('${cwd.toPosix()}') will be included in environment_details. This provides an overview of the project's file structure, offering key insights into the project from directory/file names (how developers conceptualize and organize their code) and file extensions (the language used). This can also guide decision-making on which files to explore further. If you need to further explore directories such as outside the current working directory, you can use the list_files tool. If you pass 'true' for the recursive parameter, it will list files recursively. Otherwise, it will list files at the top level, which is better suited for generic directories where you don't necessarily need the nested structure, like the Desktop.
 - You can use search_files to perform regex searches across files in a specified directory, outputting context-rich results that include surrounding lines. This is particularly useful for understanding code patterns, finding specific implementations, or identifying areas that need refactoring.
 - You can use the list_code_definition_names tool to get an overview of source code definitions for all files at the top level of a specified directory. This can be particularly useful when you need to understand the broader context and relationships between certain parts of the code. You may need to call this tool multiple times to understand various parts of the codebase related to the task.
 - For example, when asked to make edits or improvements you might analyze the file structure in the initial environment_details to get an overview of the project, then use list_code_definition_names to get further insight using source code definitions for files located in relevant directories, then read_file to examine the contents of relevant files, analyze the code and suggest improvements or make necessary edits, then use the replace_in_file tool to implement changes. If you refactored code that could affect other parts of the codebase, you could use search_files to ensure you update other files as needed.
 - You can use the execute_command tool to run commands on the user's computer whenever you feel it can help accomplish the user's task. When you need to execute a CLI command, you must provide a clear explanation of what the command does. Prefer to execute complex CLI commands over creating executable scripts, since they are more flexible and easier to run. Interactive and long-running commands are allowed, since the commands are run in the user's VSCode terminal. The user may keep commands running in the background and you will be kept updated on their status along the way. Each command you execute is run in a new terminal instance.${
-supportsComputerUse
+supportsBrowserUse
 ? "\n- You can use the browser_action tool to interact with websites (including html files and locally running development servers) through a Puppeteer-controlled browser when you feel it is necessary in accomplishing the user's task. This tool is particularly useful for web development tasks as it allows you to launch a browser, navigate to pages, interact with elements through clicks and keyboard input, and capture the results through screenshots and console logs. This tool may be useful at key stages of web development tasks-such as after implementing new features, making substantial changes, when troubleshooting issues, or to verify the result of your work. You can analyze the provided screenshots to ensure correct rendering or identify errors, and review console logs for runtime issues.\n - For example, if asked to add a component to a react website, you might create the necessary files, use execute_command to run the site locally, then use browser_action to launch the browser, navigate to the local server, and verify the component renders & functions correctly before closing the browser."
 : ""
 }
@@ -592,7 +592,7 @@ RULES
 - When executing commands, if you don't see the expected output, assume the terminal executed the command successfully and proceed with the task. The user's terminal may be unable to stream the output back properly. If you absolutely need to see the actual terminal output, use the ask_followup_question tool to request the user to copy and paste it back to you.
 - The user may provide a file's contents directly in their message, in which case you shouldn't use the read_file tool to get the file contents again since you already have it.
 - Your goal is to try to accomplish the user's task, NOT engage in a back and forth conversation.${
-supportsComputerUse
+supportsBrowserUse
 ? `\n- The user may ask generic non-development tasks, such as "what\'s the latest news" or "look up the weather in San Diego", in which case you might use the browser_action tool to complete the task if it makes sense to do so, rather than trying to create a website or using curl to answer the question. However, if an available MCP server tool or resource can be used instead, you should prefer to use it over browser_action.`
 : ""
 }
@@ -604,7 +604,7 @@ RULES
 - When using the replace_in_file tool, you must include complete lines in your SEARCH blocks, not partial lines. The system requires exact line matches and cannot match partial lines. For example, if you want to match a line containing "const x = 5;", your SEARCH block must include the entire line, not just "x = 5" or other fragments.
 - When using the replace_in_file tool, if you use multiple SEARCH/REPLACE blocks, list them in the order they appear in the file. For example if you need to make changes to both line 10 and line 50, first include the SEARCH/REPLACE block for line 10, followed by the SEARCH/REPLACE block for line 50.
 - It is critical you wait for the user's response after each tool use, in order to confirm the success of the tool use. For example, if asked to make a todo app, you would create a file, wait for the user's response it was created successfully, then create another file if needed, wait for the user's response it was created successfully, etc.${
-supportsComputerUse
+supportsBrowserUse
 ? " Then if you want to test your work, you might use browser_action to launch the site, wait for the user's response confirming the site was launched along with a screenshot, then perhaps e.g., click a button to test functionality if needed, wait for the user's response confirming the button was clicked along with a screenshot of the new state, before finally closing the browser."
 : ""
 }
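The hunks above all follow one pattern: a boolean flag decides whether a browser-related fragment is interpolated into the prompt template or replaced by an empty string. A minimal sketch of that pattern follows; `buildPrompt` and its shortened prompt text are hypothetical stand-ins for illustration, not the project's actual `SYSTEM_PROMPT`.

```typescript
// Hypothetical, simplified stand-in for SYSTEM_PROMPT; the prompt text is invented for this sketch.
function buildPrompt(cwd: string, supportsBrowserUse: boolean): string {
	// Tool section: only present when browser use is enabled.
	const browserTool = supportsBrowserUse ? "\n\n## browser_action\nDescription: (omitted in this sketch)" : ""
	// Capability sentence: the browser clause is spliced in mid-sentence, as in the diff.
	const capabilities = `- You can execute CLI commands, list files${
		supportsBrowserUse ? ", use the browser" : ""
	}, and read and edit files in ${cwd}.`
	return `TOOLS\n\n## list_code_definition_names${browserTool}\n\nCAPABILITIES\n\n${capabilities}`
}

const withBrowser = buildPrompt("/tmp/project", true)
const withoutBrowser = buildPrompt("/tmp/project", false)
console.log(withBrowser.includes("## browser_action")) // true
console.log(withoutBrowser.includes("browser")) // false
```

Because every fragment falls back to `""`, a model without browser support sees a prompt that never mentions the tool at all.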

src/core/task/index.ts

Lines changed: 4 additions & 3 deletions
@@ -1423,11 +1423,12 @@ export class Task {
 })
 
 const disableBrowserTool = vscode.workspace.getConfiguration("cline").get<boolean>("disableBrowserTool") ?? false
-const modelSupportsComputerUse = this.api.getModel().info.supportsComputerUse ?? false
+// cline browser tool uses image recognition for navigation (requires model image support).
+const modelSupportsBrowserUse = this.api.getModel().info.supportsImages ?? false
 
-const supportsComputerUse = modelSupportsComputerUse && !disableBrowserTool // only enable computer use if the model supports it and the user hasn't disabled it
+const supportsBrowserUse = modelSupportsBrowserUse && !disableBrowserTool // only enable browser use if the model supports it and the user hasn't disabled it
 
-let systemPrompt = await SYSTEM_PROMPT(cwd, supportsComputerUse, this.mcpHub, this.browserSettings)
+let systemPrompt = await SYSTEM_PROMPT(cwd, supportsBrowserUse, this.mcpHub, this.browserSettings)
 
 let settingsCustomInstructions = this.customInstructions?.trim()
 const preferredLanguage = getLanguageKey(
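The gating logic in this hunk can be read in isolation as a small pure function. The sketch below assumes a reduced `ModelInfoSketch` type and a hypothetical helper `resolveBrowserUse`; in the commit the same two expressions are written inline in `Task`, and `disableBrowserTool` is read from VS Code configuration.

```typescript
// Reduced, hypothetical subset of ModelInfo; the real interface has more fields.
interface ModelInfoSketch {
	supportsImages?: boolean
}

// Hypothetical helper: browser use is on only when the model accepts images
// and the user has not disabled the browser tool.
function resolveBrowserUse(info: ModelInfoSketch, disableBrowserTool: boolean): boolean {
	const modelSupportsBrowserUse = info.supportsImages ?? false
	return modelSupportsBrowserUse && !disableBrowserTool
}

console.log(resolveBrowserUse({ supportsImages: true }, false)) // true
console.log(resolveBrowserUse({ supportsImages: true }, true)) // false
console.log(resolveBrowserUse({}, false)) // false
```

An unset `supportsImages` is treated as `false`, so models with no declared image support keep the browser tool disabled.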

src/shared/api.ts

Lines changed: 11 additions & 12 deletions
@@ -98,7 +98,6 @@ export interface ModelInfo {
 maxTokens?: number
 contextWindow?: number
 supportsImages?: boolean
-supportsComputerUse?: boolean
 supportsPromptCache: boolean // this value is hardcoded for now
 inputPrice?: number // Keep for non-tiered input models
 inputPriceTiers?: PriceTier[] // Add for tiered input pricing
@@ -128,7 +127,7 @@ export const anthropicModels = {
 maxTokens: 8192,
 contextWindow: 200_000,
 supportsImages: true,
-supportsComputerUse: true,
+
 supportsPromptCache: true,
 inputPrice: 3.0,
 outputPrice: 15.0,
@@ -139,7 +138,7 @@ export const anthropicModels = {
 maxTokens: 8192,
 contextWindow: 200_000,
 supportsImages: true,
-supportsComputerUse: true,
+
 supportsPromptCache: true,
 inputPrice: 3.0, // $3 per million input tokens
 outputPrice: 15.0, // $15 per million output tokens
@@ -187,7 +186,7 @@ export const bedrockModels = {
 maxTokens: 5000,
 contextWindow: 300_000,
 supportsImages: true,
-supportsComputerUse: false,
+
 supportsPromptCache: false,
 inputPrice: 0.8,
 outputPrice: 3.2,
@@ -196,7 +195,7 @@ export const bedrockModels = {
 maxTokens: 5000,
 contextWindow: 300_000,
 supportsImages: true,
-supportsComputerUse: false,
+
 supportsPromptCache: false,
 inputPrice: 0.06,
 outputPrice: 0.24,
@@ -205,7 +204,7 @@ export const bedrockModels = {
 maxTokens: 5000,
 contextWindow: 128_000,
 supportsImages: false,
-supportsComputerUse: false,
+
 supportsPromptCache: false,
 inputPrice: 0.035,
 outputPrice: 0.14,
@@ -214,7 +213,7 @@ export const bedrockModels = {
 maxTokens: 8192,
 contextWindow: 200_000,
 supportsImages: true,
-supportsComputerUse: true,
+
 supportsPromptCache: true,
 inputPrice: 3.0,
 outputPrice: 15.0,
@@ -225,7 +224,7 @@ export const bedrockModels = {
 maxTokens: 8192,
 contextWindow: 200_000,
 supportsImages: true,
-supportsComputerUse: true,
+
 supportsPromptCache: true,
 inputPrice: 3.0,
 outputPrice: 15.0,
@@ -291,7 +290,7 @@ export const openRouterDefaultModelInfo: ModelInfo = {
 maxTokens: 8192,
 contextWindow: 200_000,
 supportsImages: true,
-supportsComputerUse: true,
+
 supportsPromptCache: true,
 inputPrice: 3.0,
 outputPrice: 15.0,
@@ -310,7 +309,7 @@ export const vertexModels = {
 maxTokens: 8192,
 contextWindow: 200_000,
 supportsImages: true,
-supportsComputerUse: true,
+
 supportsPromptCache: true,
 inputPrice: 3.0,
 outputPrice: 15.0,
@@ -319,7 +318,7 @@ export const vertexModels = {
 maxTokens: 8192,
 contextWindow: 200_000,
 supportsImages: true,
-supportsComputerUse: true,
+
 supportsPromptCache: true,
 inputPrice: 3.0,
 outputPrice: 15.0,
@@ -1642,7 +1641,7 @@ export const requestyDefaultModelInfo: ModelInfo = {
 maxTokens: 8192,
 contextWindow: 200_000,
 supportsImages: true,
-supportsComputerUse: false,
+
 supportsPromptCache: true,
 inputPrice: 3.0,
 outputPrice: 15.0,

src/utils/cost.test.ts

Lines changed: 0 additions & 2 deletions
@@ -34,7 +34,6 @@ describe("Cost Utilities", () => {
 maxTokens: 8192,
 contextWindow: 200_000,
 supportsImages: true,
-supportsComputerUse: true,
 supportsPromptCache: true,
 inputPrice: 3.0,
 outputPrice: 15.0,
@@ -95,7 +94,6 @@ describe("Cost Utilities", () => {
 maxTokens: 8192,
 contextWindow: 200_000,
 supportsImages: true,
-supportsComputerUse: true,
 supportsPromptCache: true,
 inputPrice: 3.0,
 outputPrice: 15.0,

webview-ui/src/components/settings/ApiOptions.tsx

Lines changed: 7 additions & 7 deletions
@@ -1013,19 +1013,19 @@ const ApiOptions = ({
 Supports Images
 </VSCodeCheckbox>
 <VSCodeCheckbox
-checked={!!apiConfiguration?.openAiModelInfo?.supportsComputerUse}
+checked={!!apiConfiguration?.openAiModelInfo?.supportsImages}
 onChange={(e: any) => {
 const isChecked = e.target.checked === true
 let modelInfo = apiConfiguration?.openAiModelInfo
 ? apiConfiguration.openAiModelInfo
 : { ...openAiModelInfoSaneDefaults }
-modelInfo = { ...modelInfo, supportsComputerUse: isChecked }
+modelInfo.supportsImages = isChecked
 setApiConfiguration({
 ...apiConfiguration,
 openAiModelInfo: modelInfo,
 })
 }}>
-Supports Computer Use
+Supports browser use
 </VSCodeCheckbox>
 <VSCodeCheckbox
 checked={!!apiConfiguration?.openAiModelInfo?.isR1FormatRequired}
@@ -1863,10 +1863,10 @@ export const ModelInfoView = ({
 doesNotSupportLabel="Does not support images"
 />,
 <ModelInfoSupportsItem
-key="supportsComputerUse"
-isSupported={modelInfo.supportsComputerUse ?? false}
-supportsLabel="Supports computer use"
-doesNotSupportLabel="Does not support computer use"
+key="supportsBrowserUse"
+isSupported={modelInfo.supportsImages ?? false} // cline browser tool uses image recognition for navigation (requires model image support).
+supportsLabel="Supports browser use"
+doesNotSupportLabel="Does not support browser use"
 />,
 !isGemini && (
 <ModelInfoSupportsItem
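In the first hunk of this file, the removed line built a fresh object with a spread, while the added line assigns to `modelInfo.supportsImages` directly. When `apiConfiguration.openAiModelInfo` already exists, `modelInfo` refers to that same object, so the assignment changes it in place. Whether that matters depends on how the surrounding state is consumed. For comparison, here is a minimal sketch of a copy-on-write variant; the types, `defaultsSketch`, and `withSupportsImages` are hypothetical and not part of the commit.

```typescript
// Hypothetical, reduced types for illustration only.
interface ModelInfoSketch {
	supportsImages?: boolean
}
interface ApiConfigurationSketch {
	openAiModelInfo?: ModelInfoSketch
}

// Hypothetical stand-in for openAiModelInfoSaneDefaults.
const defaultsSketch: ModelInfoSketch = { supportsImages: false }

// Returns a new configuration object and leaves the input untouched.
function withSupportsImages(config: ApiConfigurationSketch, isChecked: boolean): ApiConfigurationSketch {
	const base = config.openAiModelInfo ?? defaultsSketch
	return { ...config, openAiModelInfo: { ...base, supportsImages: isChecked } }
}

const before: ApiConfigurationSketch = { openAiModelInfo: { supportsImages: false } }
const after = withSupportsImages(before, true)
console.log(before.openAiModelInfo?.supportsImages) // false
console.log(after.openAiModelInfo?.supportsImages) // true
```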
