Commit 622f43a

Browser Use 2.0
Summary

This release upgrades the in-chat browsing experience with persistent sessions, clearer feedback, a dedicated browser panel, and more natural action descriptions, all fully localized.

What's new

- Persistent Browser Sessions
  - The browser stays open across steps so you can send follow-ups without relaunching.
  - You’ll see a "Browser Session" header and a "Session started" note when active.
- Dedicated Browser Session panel
  - Open a full-size view when you need more space, while keeping the chat context in view.
- Live, readable action feed
  - Actions are presented in plain language: Launch, Click, Type, Press, Hover, Scroll.
  - Keyboard events now appear as "Press Enter" or "Press Esc" for easier scanning.
  - Broader keyboard coverage: navigation keys and common shortcuts are supported for more natural control.
- Inline console logs
  - Console output is surfaced inline with a clear "No new logs" state.
  - Noise-reduced by default: only entries added since the previous step are shown, which cuts repeated output.
  - Filter by type (Errors, Warnings, Logs) so you can focus on what matters.
- Clear session controls
  - A prominent Disconnect/Close control makes it easy to end a session when you’re done.
- Interactive in-session controls
  - Follow-ups attach to the active session so you can guide the assistant mid-flow without restarting.
  - Suggested follow-ups appear inline to keep momentum.
- More accurate interactions
  - Improved click, scroll, and hover reliability across screen sizes with a consistent preview aspect ratio.
- Seamless follow-ups
  - Keep chatting while the session is open; the assistant continues from the same context.
- Fully localized
  - New labels and action text are translated across all supported languages.

What you'll notice in the UI

- "Browser Session" appears in chat when a session is active.
- A "Session started" status line confirms the start.
- Follow-up suggestions appear inside the Browser Session row when active.
- Keyboard actions are summarized clearly (e.g., "Press Tab", "Shift+Tab", "Arrow keys").
- New action wording like "Press Enter" or "Hover (x, y)".
- Console Logs are visible inline, with a "No new logs" indicator and a noise-reduced view that shows only new entries since the last step.
- Type filters (All, Errors, Warnings, Logs) above the log list to quickly narrow the feed.
- A quick Disconnect button to end the session.
1 parent 11f1c06 commit 622f43a
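
The commit message describes a console log view that shows only entries added since the previous step, supports type filters, and falls back to a "No new logs" state. The log UI code is not part of the hunks shown on this page, so the following is only a minimal sketch of that behavior under stated assumptions: `ConsoleEntry`, `newEntriesSince`, `filterByType`, and `renderLogs` are hypothetical names, and the `[type] text` output format is invented for illustration.

```typescript
// Hypothetical shapes; the commit's real log types are not shown in this diff.
type LogType = "error" | "warning" | "log"

interface ConsoleEntry {
	type: LogType
	text: string
}

// Keep only entries appended after the previous step.
// `previousCount` is how many entries had already been shown.
function newEntriesSince(all: ConsoleEntry[], previousCount: number): ConsoleEntry[] {
	return all.slice(Math.max(0, previousCount))
}

// Apply a type filter; "all" leaves the list unchanged.
function filterByType(entries: ConsoleEntry[], filter: LogType | "all"): ConsoleEntry[] {
	return filter === "all" ? entries : entries.filter((e) => e.type === filter)
}

// Render the visible entries, or the empty-state label from the commit message.
function renderLogs(entries: ConsoleEntry[]): string {
	if (entries.length === 0) {
		return "No new logs"
	}
	return entries.map((e) => `[${e.type}] ${e.text}`).join("\n")
}
```

Slicing by a remembered count is one simple way to get "new since last step"; an implementation that tracks timestamps or entry ids would behave the same from the user's point of view.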

54 files changed: +3228 -580 lines changed

packages/types/src/message.ts

Lines changed: 1 addition & 0 deletions
@@ -156,6 +156,7 @@ export const clineSays = [
 	"shell_integration_warning",
 	"browser_action",
 	"browser_action_result",
+	"browser_session_status",
 	"mcp_server_request_started",
 	"mcp_server_response",
 	"subtask_result",

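Adding a string to `clineSays` typically widens any union type derived from the array. The sketch below illustrates that pattern under assumptions: the hunk does not show how `clineSays` is declared or consumed, so the `as const` assertion, the `ClineSay` type, and the `isClineSay` guard are hypothetical, and the array is abbreviated to three entries.

```typescript
// Abbreviated copy of the list; the real array has many more entries.
// Assumption: the array is declared `as const` so its members form a union.
const clineSays = ["browser_action", "browser_action_result", "browser_session_status"] as const

// Union of the literal strings in the array.
type ClineSay = (typeof clineSays)[number]

// Hypothetical runtime guard that narrows an arbitrary string to ClineSay.
function isClineSay(value: string): value is ClineSay {
	return clineSays.some((s) => s === value)
}
```

With this shape, `"browser_session_status"` is accepted wherever a `ClineSay` is expected as soon as it is added to the array.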
src/core/assistant-message/presentAssistantMessage.ts

Lines changed: 26 additions & 2 deletions
@@ -355,8 +355,32 @@ export async function presentAssistantMessage(cline: Task) {
 				return text.replace(tagRegex, "")
 			}

-			if (block.name !== "browser_action") {
-				await cline.browserSession.closeBrowser()
+			// Keep browser open during an active session so other tools can run.
+			// Session is active if we've seen any browser_action_result and the last browser_action is not "close".
+			try {
+				const messages = cline.clineMessages || []
+				const hasStarted = messages.some((m: any) => m.say === "browser_action_result")
+				let isClosed = false
+				for (let i = messages.length - 1; i >= 0; i--) {
+					const m = messages[i]
+					if (m.say === "browser_action") {
+						try {
+							const act = JSON.parse(m.text || "{}")
+							isClosed = act.action === "close"
+						} catch {}
+						break
+					}
+				}
+				const sessionActive = hasStarted && !isClosed
+				// Only auto-close when no active browser session is present, and this isn't a browser_action
+				if (!sessionActive && block.name !== "browser_action") {
+					await cline.browserSession.closeBrowser()
+				}
+			} catch {
+				// On any unexpected error, fall back to conservative behavior
+				if (block.name !== "browser_action") {
+					await cline.browserSession.closeBrowser()
+				}
 			}

 			if (!block.partial) {
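
The session check in this hunk can be read as a pure function over the message list, which makes the rule easier to test in isolation. This is a minimal sketch and not code from the commit: `SayMessage` and `isBrowserSessionActive` are hypothetical names, and only the `say` and `text` fields used in the hunk are modeled.

```typescript
// Hypothetical minimal message shape; the real message type has more fields.
interface SayMessage {
	say?: string
	text?: string
}

// A session counts as active once any browser_action_result has been seen
// and the most recent browser_action is not "close".
function isBrowserSessionActive(messages: SayMessage[]): boolean {
	const hasStarted = messages.some((m) => m.say === "browser_action_result")
	if (!hasStarted) {
		return false
	}

	// Walk backwards to find the most recent browser_action.
	for (let i = messages.length - 1; i >= 0; i--) {
		const m = messages[i]
		if (m.say !== "browser_action") {
			continue
		}
		try {
			const act = JSON.parse(m.text || "{}")
			return act.action !== "close"
		} catch {
			// Unparseable payload: treated as "not closed", matching the hunk.
			return true
		}
	}

	// No browser_action found at all: nothing has closed the session.
	return true
}
```

Under this reading, the auto-close condition in the hunk is `!isBrowserSessionActive(messages) && block.name !== "browser_action"`.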

src/core/environment/__tests__/getEnvironmentDetails.spec.ts

Lines changed: 17 additions & 0 deletions
@@ -116,6 +116,9 @@ describe("getEnvironmentDetails", () => {
 				deref: vi.fn().mockReturnValue(mockProvider),
 				[Symbol.toStringTag]: "WeakRef",
 			} as unknown as WeakRef<ClineProvider>,
+			browserSession: {
+				isSessionActive: vi.fn().mockReturnValue(false),
+			} as any,
 		}

 		// Mock other dependencies.
@@ -390,4 +393,18 @@ describe("getEnvironmentDetails", () => {
 		const result = await getEnvironmentDetails(cline as Task)
 		expect(result).toContain("REMINDERS")
 	})
+	it("should include Browser Session Status when inactive", async () => {
+		const result = await getEnvironmentDetails(mockCline as Task)
+		expect(result).toContain("# Browser Session Status")
+		expect(result).toContain("Inactive - Browser is not launched")
+	})
+
+	it("should include Browser Session Status with current viewport when active", async () => {
+		;(mockCline.browserSession as any).isSessionActive = vi.fn().mockReturnValue(true)
+		;(mockCline.browserSession as any).getViewportSize = vi.fn().mockReturnValue({ width: 1280, height: 720 })
+
+		const result = await getEnvironmentDetails(mockCline as Task)
+		expect(result).toContain("Active - A browser session is currently open and ready for browser_action commands")
+		expect(result).toContain("Current viewport size: 1280x720 pixels.")
+	})
 })

src/core/environment/getEnvironmentDetails.ts

Lines changed: 32 additions & 0 deletions
@@ -244,6 +244,38 @@ export async function getEnvironmentDetails(cline: Task, includeFileDetails: boo
 		}
 	}

+	// Add browser session status - Always show to prevent LLM from trying browser actions when no session is active
+	const isBrowserActive = cline.browserSession.isSessionActive()
+
+	// Build viewport info for status (prefer actual viewport if available, else fallback to configured setting)
+	const configuredViewport = (state?.browserViewportSize as string | undefined) ?? "900x600"
+	let configuredWidth: number | undefined
+	let configuredHeight: number | undefined
+	if (configuredViewport.includes("x")) {
+		const parts = configuredViewport.split("x").map((v) => Number(v))
+		configuredWidth = parts[0]
+		configuredHeight = parts[1]
+	}
+
+	let actualWidth: number | undefined
+	let actualHeight: number | undefined
+	// Use optional chaining to avoid issues with tests that stub browserSession
+	const vp = isBrowserActive ? (cline.browserSession as any).getViewportSize?.() : undefined
+	if (vp) {
+		actualWidth = vp.width
+		actualHeight = vp.height
+	}
+
+	const width = actualWidth ?? configuredWidth
+	const height = actualHeight ?? configuredHeight
+	const viewportInfo = isBrowserActive && width && height ? `\nCurrent viewport size: ${width}x${height} pixels.` : ""
+
+	details += `\n# Browser Session Status\n${
+		isBrowserActive
+			? "Active - A browser session is currently open and ready for browser_action commands"
+			: "Inactive - Browser is not launched. Using any browser action except the browser_action with action='launch' to start a new session will result in an error."
+	}${viewportInfo}\n`
+
 	if (includeFileDetails) {
 		details += `\n\n# Current Workspace Directory (${cline.cwd.toPosix()}) Files\n`
 		const isDesktop = arePathsEqual(cline.cwd, path.join(os.homedir(), "Desktop"))
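
The status section built in this hunk prefers the live viewport and falls back to the configured `WIDTHxHEIGHT` setting. The sketch below restates that as two small functions so the fallback is easy to follow. It is an illustration under assumptions, not the commit's code: `Viewport`, `parseViewport`, and `browserStatusSection` are hypothetical names, and unlike the hunk, which falls back per dimension, it falls back on the whole viewport at once.

```typescript
interface Viewport {
	width: number
	height: number
}

// Parse a "WIDTHxHEIGHT" setting; returns undefined when either part is not a positive number.
function parseViewport(setting: string): Viewport | undefined {
	const parts = setting.split("x").map(Number)
	const width = parts[0] ?? NaN
	const height = parts[1] ?? NaN
	return width > 0 && height > 0 ? { width, height } : undefined
}

// Build the "# Browser Session Status" section.
// `actual` is the live viewport if the session can report one.
function browserStatusSection(active: boolean, actual?: Viewport, configured = "900x600"): string {
	// Viewport info is only reported while a session is active.
	const vp = active ? (actual ?? parseViewport(configured)) : undefined
	const viewportInfo = vp ? `\nCurrent viewport size: ${vp.width}x${vp.height} pixels.` : ""
	const status = active
		? "Active - A browser session is currently open and ready for browser_action commands"
		: "Inactive - Browser is not launched. Using any browser action except the browser_action with action='launch' to start a new session will result in an error."
	return `\n# Browser Session Status\n${status}${viewportInfo}\n`
}
```

For example, `browserStatusSection(true, { width: 1280, height: 720 })` yields the same two strings the new spec asserts on, while an inactive session omits the viewport line.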

src/core/prompts/__tests__/__snapshots__/system-prompt/with-computer-use-support.snap

Lines changed: 26 additions & 11 deletions
Some generated files are not rendered by default.

src/core/prompts/sections/rules.ts

Lines changed: 1 addition & 5 deletions
@@ -92,9 +92,5 @@ ${getEditingInstructions(diffStrategy)}
 - At the end of each user message, you will automatically receive environment_details. This information is not written by the user themselves, but is auto-generated to provide potentially relevant context about the project structure and environment. While this information can be valuable for understanding the project context, do not treat it as a direct part of the user's request or response. Use it to inform your actions and decisions, but don't assume the user is explicitly asking about or referring to this information unless they clearly do so in their message. When using environment_details, explain your actions clearly to ensure the user understands, as they may not be aware of these details.
 - Before executing commands, check the "Actively Running Terminals" section in environment_details. If present, consider how these active processes might impact your task. For example, if a local development server is already running, you wouldn't need to start it again. If no active terminals are listed, proceed with command execution as normal.
 - MCP operations should be used one at a time, similar to other tool usage. Wait for confirmation of success before proceeding with additional operations.
-- It is critical you wait for the user's response after each tool use, in order to confirm the success of the tool use. For example, if asked to make a todo app, you would create a file, wait for the user's response it was created successfully, then create another file if needed, wait for the user's response it was created successfully, etc.${
-	supportsComputerUse
-		? " Then if you want to test your work, you might use browser_action to launch the site, wait for the user's response confirming the site was launched along with a screenshot, then perhaps e.g., click a button to test functionality if needed, wait for the user's response confirming the button was clicked along with a screenshot of the new state, before finally closing the browser."
-		: ""
-}`
+- It is critical you wait for the user's response after each tool use, in order to confirm the success of the tool use. For example, if asked to make a todo app, you would create a file, wait for the user's response it was created successfully, then create another file if needed, wait for the user's response it was created successfully, etc.`
 }
