Merged
5 changes: 5 additions & 0 deletions .changeset/five-gorillas-exist.md
@@ -0,0 +1,5 @@
---
"roo-cline": patch
---

Allow selecting from multiple browser viewport sizes and adjusting screenshot quality
2 changes: 1 addition & 1 deletion README.md
@@ -7,7 +7,7 @@ A fork of Cline, an autonomous coding agent, with some additional experimental f
- Drag and drop images into chats
- "Enhance prompt" button (OpenRouter models only for now)
- Sound effects for feedback
- Option to use a larger 1280x800 browser
- Option to use browsers of different sizes and adjust screenshot quality
- Quick prompt copying from history
- OpenRouter compression support
- Includes current time in the system prompt
3 changes: 3 additions & 0 deletions jest.config.js
@@ -29,6 +29,9 @@ module.exports = {
transformIgnorePatterns: [
'node_modules/(?!(@modelcontextprotocol|delay|p-wait-for|globby|serialize-error|strip-ansi|default-shell|os-name)/)'
],
modulePathIgnorePatterns: [
'.vscode-test'
],
setupFiles: [],
globals: {
'ts-jest': {
4 changes: 2 additions & 2 deletions src/core/Cline.ts
@@ -786,8 +786,8 @@ export class Cline {
throw new Error("MCP hub not available")
}

const { browserLargeViewport, preferredLanguage } = await this.providerRef.deref()?.getState() ?? {}
const systemPrompt = await SYSTEM_PROMPT(cwd, this.api.getModel().info.supportsComputerUse ?? false, mcpHub, this.diffStrategy, browserLargeViewport) + await addCustomInstructions(this.customInstructions ?? '', cwd, preferredLanguage)
const { browserViewportSize, preferredLanguage } = await this.providerRef.deref()?.getState() ?? {}
const systemPrompt = await SYSTEM_PROMPT(cwd, this.api.getModel().info.supportsComputerUse ?? false, mcpHub, this.diffStrategy, browserViewportSize) + await addCustomInstructions(this.customInstructions ?? '', cwd, preferredLanguage)

// If the previous API request's total token usage is close to the context window, truncate the conversation history to free up space for the new request
if (previousApiReqIndex >= 0) {
6 changes: 3 additions & 3 deletions src/core/prompts/system.ts
@@ -11,7 +11,7 @@ export const SYSTEM_PROMPT = async (
supportsComputerUse: boolean,
mcpHub: McpHub,
diffStrategy?: DiffStrategy,
browserLargeViewport?: boolean
browserViewportSize?: string
) => `You are Cline, a highly skilled software engineer with extensive knowledge in many programming languages, frameworks, design patterns, and best practices.

====
@@ -114,7 +114,7 @@ Usage:
Description: Request to interact with a Puppeteer-controlled browser. Every action, except \`close\`, will be responded to with a screenshot of the browser's current state, along with any new console logs. You may only perform one browser action per message, and wait for the user's response including a screenshot and logs to determine the next action.
- The sequence of actions **must always start with** launching the browser at a URL, and **must always end with** closing the browser. If you need to visit a new URL that is not possible to navigate to from the current webpage, you must first close the browser, then launch again at the new URL.
- While the browser is active, only the \`browser_action\` tool can be used. No other tools should be called during this time. You may proceed to use other tools only after closing the browser. For example if you run into an error and need to fix a file, you must close the browser, then use other tools to make the necessary changes, then re-launch the browser to verify the result.
- The browser window has a resolution of **${browserLargeViewport ? "1280x800" : "900x600"}** pixels. When performing any click actions, ensure the coordinates are within this resolution range.
- The browser window has a resolution of **${browserViewportSize || "900x600"}** pixels. When performing any click actions, ensure the coordinates are within this resolution range.
- Before clicking on any elements such as icons, links, or buttons, you must consult the provided screenshot of the page to determine the coordinates of the element. The click should be targeted at the **center of the element**, not on its edges.
Parameters:
- action: (required) The action to perform. The available actions are:
@@ -132,7 +132,7 @@ Parameters:
- Example: \`<action>close</action>\`
- url: (optional) Use this for providing the URL for the \`launch\` action.
* Example: <url>https://example.com</url>
- coordinate: (optional) The X and Y coordinates for the \`click\` action. Coordinates should be within the **${browserLargeViewport ? "1280x800" : "900x600"}** resolution.
- coordinate: (optional) The X and Y coordinates for the \`click\` action. Coordinates should be within the **${browserViewportSize || "900x600"}** resolution.
* Example: <coordinate>450,300</coordinate>
- text: (optional) Use this for providing the text for the \`type\` action.
* Example: <text>Hello, world!</text>
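The prompt template above falls back to the default size with `||`, whereas the state handling elsewhere in this change uses `??`. A minimal sketch of how the two differ for an empty string; the helper names `resolveViewportSize`, `resolveWithNullish`, and `resolutionLine` are hypothetical and not part of this PR:

```typescript
// Hypothetical helpers for illustration only; none of these exist in the PR.
const DEFAULT_VIEWPORT_SIZE = "900x600"

// `||` falls back on undefined AND on the empty string.
function resolveViewportSize(browserViewportSize?: string): string {
	return browserViewportSize || DEFAULT_VIEWPORT_SIZE
}

// `??` falls back only on undefined or null, so "" passes through unchanged.
function resolveWithNullish(browserViewportSize?: string): string {
	return browserViewportSize ?? DEFAULT_VIEWPORT_SIZE
}

// Mirrors the shape of the resolution sentence interpolated into the prompt.
function resolutionLine(browserViewportSize?: string): string {
	return `The browser window has a resolution of **${resolveViewportSize(browserViewportSize)}** pixels.`
}
```

If an empty string ever reached the stored setting, the `??` paths would keep it while the `||` paths would substitute the default, so using one operator throughout may be worth considering.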
28 changes: 19 additions & 9 deletions src/core/webview/ClineProvider.ts
@@ -71,7 +71,8 @@ type GlobalStateKey =
| "soundVolume"
| "diffEnabled"
| "alwaysAllowMcp"
| "browserLargeViewport"
| "browserViewportSize"
| "screenshotQuality"
| "fuzzyMatchThreshold"
| "preferredLanguage" // Language setting for Cline's communication
| "writeDelayMs"
@@ -624,9 +625,9 @@ export class ClineProvider implements vscode.WebviewViewProvider {
await this.updateGlobalState("diffEnabled", diffEnabled)
await this.postStateToWebview()
break
case "browserLargeViewport":
const browserLargeViewport = message.bool ?? false
await this.updateGlobalState("browserLargeViewport", browserLargeViewport)
case "browserViewportSize":
const browserViewportSize = message.text ?? "900x600"
await this.updateGlobalState("browserViewportSize", browserViewportSize)
await this.postStateToWebview()
break
case "fuzzyMatchThreshold":
@@ -641,6 +642,10 @@ export class ClineProvider implements vscode.WebviewViewProvider {
await this.updateGlobalState("writeDelayMs", message.value)
await this.postStateToWebview()
break
case "screenshotQuality":
await this.updateGlobalState("screenshotQuality", message.value)
await this.postStateToWebview()
break
case "enhancePrompt":
if (message.text) {
try {
@@ -1015,7 +1020,8 @@ export class ClineProvider implements vscode.WebviewViewProvider {
diffEnabled,
taskHistory,
soundVolume,
browserLargeViewport,
browserViewportSize,
screenshotQuality,
preferredLanguage,
writeDelayMs,
} = await this.getState()
@@ -1043,7 +1049,8 @@ export class ClineProvider implements vscode.WebviewViewProvider {
shouldShowAnnouncement: lastShownAnnouncementId !== this.latestAnnouncementId,
allowedCommands,
soundVolume: soundVolume ?? 0.5,
browserLargeViewport: browserLargeViewport ?? false,
browserViewportSize: browserViewportSize ?? "900x600",
screenshotQuality: screenshotQuality ?? 75,
preferredLanguage: preferredLanguage ?? 'English',
writeDelayMs: writeDelayMs ?? 1000,
}
@@ -1140,10 +1147,11 @@ export class ClineProvider implements vscode.WebviewViewProvider {
soundEnabled,
diffEnabled,
soundVolume,
browserLargeViewport,
browserViewportSize,
fuzzyMatchThreshold,
preferredLanguage,
writeDelayMs,
screenshotQuality,
] = await Promise.all([
this.getGlobalState("apiProvider") as Promise<ApiProvider | undefined>,
this.getGlobalState("apiModelId") as Promise<string | undefined>,
@@ -1183,10 +1191,11 @@ export class ClineProvider implements vscode.WebviewViewProvider {
this.getGlobalState("soundEnabled") as Promise<boolean | undefined>,
this.getGlobalState("diffEnabled") as Promise<boolean | undefined>,
this.getGlobalState("soundVolume") as Promise<number | undefined>,
this.getGlobalState("browserLargeViewport") as Promise<boolean | undefined>,
this.getGlobalState("browserViewportSize") as Promise<string | undefined>,
this.getGlobalState("fuzzyMatchThreshold") as Promise<number | undefined>,
this.getGlobalState("preferredLanguage") as Promise<string | undefined>,
this.getGlobalState("writeDelayMs") as Promise<number | undefined>,
this.getGlobalState("screenshotQuality") as Promise<number | undefined>,
])

let apiProvider: ApiProvider
@@ -1244,7 +1253,8 @@ export class ClineProvider implements vscode.WebviewViewProvider {
soundEnabled: soundEnabled ?? false,
diffEnabled: diffEnabled ?? true,
soundVolume,
browserLargeViewport: browserLargeViewport ?? false,
browserViewportSize: browserViewportSize ?? "900x600",
screenshotQuality: screenshotQuality ?? 75,
fuzzyMatchThreshold: fuzzyMatchThreshold ?? 1.0,
writeDelayMs: writeDelayMs ?? 1000,
preferredLanguage: preferredLanguage ?? (() => {
2 changes: 1 addition & 1 deletion src/core/webview/__tests__/ClineProvider.test.ts
@@ -253,7 +253,7 @@ describe('ClineProvider', () => {
soundEnabled: false,
diffEnabled: false,
writeDelayMs: 1000,
browserLargeViewport: false,
browserViewportSize: "900x600",
fuzzyMatchThreshold: 1.0,
}

20 changes: 12 additions & 8 deletions src/services/browser/BrowserSession.ts
@@ -58,9 +58,11 @@ export class BrowserSession {
"--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36",
],
executablePath: stats.executablePath,
defaultViewport: await this.context.globalState.get("browserLargeViewport")
? { width: 1280, height: 800 }
: { width: 900, height: 600 },
defaultViewport: (() => {
const size = (this.context.globalState.get("browserViewportSize") as string | undefined) || "900x600"
const [width, height] = size.split("x").map(Number)
return { width, height }
})(),
// headless: false,
})
// (latest version of puppeteer does not add headless to user agent)
@@ -134,7 +136,7 @@ export class BrowserSession {
let screenshotBase64 = await this.page.screenshot({
...options,
type: "webp",
quality: 100, // Set maximum quality to prevent compression artifacts
quality: (await this.context.globalState.get("screenshotQuality") as number | undefined) ?? 75,
})
let screenshot = `data:image/webp;base64,${screenshotBase64}`

@@ -245,27 +247,29 @@ export class BrowserSession {
}

async scrollDown(): Promise<BrowserActionResult> {
const isLargeViewport = await this.context.globalState.get("browserLargeViewport")
const size = (await this.context.globalState.get("browserViewportSize") as string | undefined) || "900x600"
const height = parseInt(size.split("x")[1])
return this.doAction(async (page) => {
await page.evaluate((scrollHeight) => {
window.scrollBy({
top: scrollHeight,
behavior: "auto",
})
}, isLargeViewport ? 800 : 600)
}, height)
await delay(300)
})
}

async scrollUp(): Promise<BrowserActionResult> {
const isLargeViewport = await this.context.globalState.get("browserLargeViewport")
const size = (await this.context.globalState.get("browserViewportSize") as string | undefined) || "900x600"
const height = parseInt(size.split("x")[1])
return this.doAction(async (page) => {
await page.evaluate((scrollHeight) => {
window.scrollBy({
top: -scrollHeight,
behavior: "auto",
})
}, isLargeViewport ? 800 : 600)
}, height)
await delay(300)
})
}
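The `"WIDTHxHEIGHT"` string is split and converted in three places in this file (launch, `scrollDown`, `scrollUp`), and a malformed value would yield `NaN` dimensions. One way this could be centralized, sketched under the assumption that a shared helper is acceptable; `Viewport`, `DEFAULT_VIEWPORT`, and `parseViewportSize` are hypothetical names, not part of this PR:

```typescript
// Hypothetical helper for illustration only; not part of the PR.
interface Viewport {
	width: number
	height: number
}

const DEFAULT_VIEWPORT: Viewport = { width: 900, height: 600 }

// Parses a "WIDTHxHEIGHT" string such as "1280x800".
// Returns the default viewport when the value is missing, malformed, or non-positive.
function parseViewportSize(size: string | undefined): Viewport {
	const match = /^(\d+)x(\d+)$/.exec(size ?? "")
	if (!match) {
		return { ...DEFAULT_VIEWPORT }
	}
	const width = Number(match[1])
	const height = Number(match[2])
	if (width <= 0 || height <= 0) {
		return { ...DEFAULT_VIEWPORT }
	}
	return { width, height }
}
```

With such a helper, the scroll handlers could read `parseViewportSize(size).height` for the scroll distance instead of calling `parseInt` on the split string directly.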
3 changes: 2 additions & 1 deletion src/shared/ExtensionMessage.ts
@@ -56,7 +56,8 @@ export interface ExtensionState {
soundEnabled?: boolean
soundVolume?: number
diffEnabled?: boolean
browserLargeViewport?: boolean
browserViewportSize?: string
screenshotQuality?: number
fuzzyMatchThreshold?: number
preferredLanguage: string
writeDelayMs: number
3 changes: 2 additions & 1 deletion src/shared/WebviewMessage.ts
@@ -35,7 +35,8 @@ export interface WebviewMessage {
| "soundEnabled"
| "soundVolume"
| "diffEnabled"
| "browserLargeViewport"
| "browserViewportSize"
| "screenshotQuality"
| "openMcpSettings"
| "restartMcpServer"
| "toggleToolAlwaysAllow"
2 changes: 1 addition & 1 deletion webview-ui/package.json
@@ -61,7 +61,7 @@
},
"jest": {
"transformIgnorePatterns": [
"/node_modules/(?!(rehype-highlight|react-remark|unist-util-visit|vfile|unified|bail|is-plain-obj|trough|vfile-message|unist-util-stringify-position|mdast-util-from-markdown|mdast-util-to-string|micromark|decode-named-character-reference|character-entities|markdown-table|zwitch|longest-streak|escape-string-regexp|unist-util-is|hast-util-to-text|@vscode/webview-ui-toolkit|@microsoft/fast-react-wrapper|@microsoft/fast-element|@microsoft/fast-foundation|@microsoft/fast-web-utilities|exenv-es6)/)"
"/node_modules/(?!(rehype-highlight|react-remark|unist-util-visit|unist-util-find-after|vfile|unified|bail|is-plain-obj|trough|vfile-message|unist-util-stringify-position|mdast-util-from-markdown|mdast-util-to-string|micromark|decode-named-character-reference|character-entities|markdown-table|zwitch|longest-streak|escape-string-regexp|unist-util-is|hast-util-to-text|@vscode/webview-ui-toolkit|@microsoft/fast-react-wrapper|@microsoft/fast-element|@microsoft/fast-foundation|@microsoft/fast-web-utilities|exenv-es6)/)"
],
"moduleNameMapper": {
"\\.(css|less|scss|sass)$": "identity-obj-proxy"
30 changes: 18 additions & 12 deletions webview-ui/src/components/chat/BrowserSessionRow.tsx
@@ -28,6 +28,11 @@ const BrowserSessionRow = memo((props: BrowserSessionRowProps) => {
const [maxActionHeight, setMaxActionHeight] = useState(0)
const [consoleLogsExpanded, setConsoleLogsExpanded] = useState(false)

const { browserViewportSize = "900x600" } = useExtensionState()
const [viewportWidth, viewportHeight] = browserViewportSize.split("x").map(Number)
const aspectRatio = (viewportHeight / viewportWidth * 100).toFixed(2)
const defaultMousePosition = `${Math.round(viewportWidth/2)},${Math.round(viewportHeight/2)}`

const isLastApiReqInterrupted = useMemo(() => {
// Check if last api_req_started is cancelled
const lastApiReqStarted = [...messages].reverse().find((m) => m.say === "api_req_started")
@@ -165,13 +170,13 @@ const BrowserSessionRow = memo((props: BrowserSessionRowProps) => {
const displayState = isLastPage
? {
url: currentPage?.currentState.url || latestState.url || initialUrl,
mousePosition: currentPage?.currentState.mousePosition || latestState.mousePosition || "700,400",
mousePosition: currentPage?.currentState.mousePosition || latestState.mousePosition || defaultMousePosition,
consoleLogs: currentPage?.currentState.consoleLogs,
screenshot: currentPage?.currentState.screenshot || latestState.screenshot,
}
: {
url: currentPage?.currentState.url || initialUrl,
mousePosition: currentPage?.currentState.mousePosition || "700,400",
mousePosition: currentPage?.currentState.mousePosition || defaultMousePosition,
consoleLogs: currentPage?.currentState.consoleLogs,
screenshot: currentPage?.currentState.screenshot,
}
@@ -220,10 +225,9 @@ const BrowserSessionRow = memo((props: BrowserSessionRowProps) => {
}, [isBrowsing, currentPage?.nextAction?.messages])

// Use latest click position while browsing, otherwise use display state
const { browserLargeViewport } = useExtensionState()
const mousePosition = isBrowsing ? latestClickPosition || displayState.mousePosition : displayState.mousePosition
const mousePosition = isBrowsing ? latestClickPosition || displayState.mousePosition : displayState.mousePosition || defaultMousePosition

const [browserSessionRow, { height }] = useSize(
const [browserSessionRow, { height: rowHeight }] = useSize(
<div style={{ padding: "10px 6px 10px 15px", marginBottom: -10 }}>
<div style={{ display: "flex", alignItems: "center", gap: "10px", marginBottom: "10px" }}>
{isBrowsing ? (
@@ -277,9 +281,10 @@ const BrowserSessionRow = memo((props: BrowserSessionRowProps) => {

{/* Screenshot Area */}
<div
data-testid="screenshot-container"
style={{
width: "100%",
paddingBottom: browserLargeViewport ? "62.5%" : "66.67%", // 800/1280 = 0.625, 600/900 = 0.667
paddingBottom: `${aspectRatio}%`, // height/width ratio
position: "relative",
backgroundColor: "var(--vscode-input-background)",
}}>
@@ -321,8 +326,8 @@ const BrowserSessionRow = memo((props: BrowserSessionRowProps) => {
<BrowserCursor
style={{
position: "absolute",
top: `${(parseInt(mousePosition.split(",")[1]) / (browserLargeViewport ? 800 : 600)) * 100}%`,
left: `${(parseInt(mousePosition.split(",")[0]) / (browserLargeViewport ? 1280 : 900)) * 100}%`,
top: `${(parseInt(mousePosition.split(",")[1]) / viewportHeight) * 100}%`,
left: `${(parseInt(mousePosition.split(",")[0]) / viewportWidth) * 100}%`,
transition: "top 0.3s ease-out, left 0.3s ease-out",
}}
/>
@@ -389,13 +394,13 @@ const BrowserSessionRow = memo((props: BrowserSessionRowProps) => {
// Height change effect
useEffect(() => {
const isInitialRender = prevHeightRef.current === 0
if (isLast && height !== 0 && height !== Infinity && height !== prevHeightRef.current) {
if (isLast && rowHeight !== 0 && rowHeight !== Infinity && rowHeight !== prevHeightRef.current) {
if (!isInitialRender) {
onHeightChange(height > prevHeightRef.current)
onHeightChange(rowHeight > prevHeightRef.current)
}
prevHeightRef.current = height
prevHeightRef.current = rowHeight
}
}, [height, isLast, onHeightChange])
}, [rowHeight, isLast, onHeightChange])

return browserSessionRow
}, deepEqual)
@@ -552,6 +557,7 @@ const BrowserCursor: React.FC<{ style?: React.CSSProperties }> = ({ style }) =>
...style,
}}
alt="cursor"
aria-label="cursor"
/>
)
}
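The screenshot container and cursor in this component are sized from the viewport dimensions rather than from hard-coded constants. A minimal sketch of that arithmetic in isolation; `aspectRatioPadding` and `cursorOffset` are hypothetical helper names introduced here for illustration and do not appear in the PR:

```typescript
// Hypothetical helpers for illustration only; not part of the PR.

// CSS padding-bottom percentage that preserves the viewport's height/width ratio.
function aspectRatioPadding(width: number, height: number): string {
	return `${((height / width) * 100).toFixed(2)}%`
}

// Converts an "x,y" pixel position into percentage offsets within the viewport.
function cursorOffset(mousePosition: string, width: number, height: number): { top: string; left: string } {
	const [x = 0, y = 0] = mousePosition.split(",").map((n) => parseInt(n, 10))
	return {
		top: `${(y / height) * 100}%`,
		left: `${(x / width) * 100}%`,
	}
}

console.log(aspectRatioPadding(900, 600)) // prints "66.67%"
console.log(aspectRatioPadding(1280, 800)) // prints "62.50%"
console.log(cursorOffset("450,300", 900, 600)) // prints { top: '50%', left: '50%' }
```

The two padding values match the constants the previous implementation hard-coded for the 900x600 and 1280x800 viewports (66.67% and 62.5%).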