52 changes: 52 additions & 0 deletions src/core/task/Task.ts
@@ -214,6 +214,7 @@ export class Task extends EventEmitter<ClineEvents> {

// Computer User
browserSession: BrowserSession
private lastBrowserScreenshotMessageId?: string // Track the last message with browser screenshot

// Editing
diffViewProvider: DiffViewProvider
@@ -508,6 +509,57 @@ export class Task extends EventEmitter<ClineEvents> {
await this.saveApiConversationHistory()
}

/**
* Add a browser action result to conversation history, removing previous browser screenshots
* to prevent hitting provider image limits (e.g., AWS Bedrock's 20-image limit).
*/
async addBrowserActionToApiHistory(
Contributor Author

Missing test coverage for this critical functionality. Could we add unit tests to verify that:

  1. Previous images are properly removed from conversation history
  2. The tracking ID is correctly updated
  3. Edge cases like malformed content are handled gracefully

This is essential because the method exists to prevent API errors that halt workflows.
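
A minimal sketch of the first two kinds of check, assuming the pruning loop is extracted into a hypothetical pure helper, `pruneLastBrowserScreenshot`, so it can be exercised without constructing a full `Task`. The helper, the `check` function, and the simplified block types are illustrative assumptions and not part of this PR.

```typescript
// Simplified stand-ins for the Anthropic SDK block types used in the diff.
type TextBlock = { type: "text"; text: string }
type ImageBlock = { type: "image"; data: string }
type Block = TextBlock | ImageBlock
type Message = { role: "user" | "assistant"; content: string | Block[] }

// Hypothetical pure helper mirroring the loop in addBrowserActionToApiHistory:
// strips image blocks from the most recent browser-action result message.
// Returns true if a message was pruned.
function pruneLastBrowserScreenshot(history: Message[]): boolean {
	for (let i = history.length - 1; i >= 0; i--) {
		const message = history[i]
		if (!message || message.role !== "user" || !Array.isArray(message.content)) continue
		const isBrowserResult = message.content.some(
			(block) => block.type === "text" && block.text.includes("[browser_action Result]"),
		)
		if (isBrowserResult) {
			message.content = message.content.filter((block) => block.type === "text")
			return true
		}
	}
	return false
}

// Throws if a check fails; stands in for the test runner's assertions.
function check(condition: boolean, label: string): void {
	if (!condition) throw new Error(`check failed: ${label}`)
}

const sampleHistory: Message[] = [
	{
		role: "user",
		content: [
			{ type: "text", text: "[browser_action Result] clicked" },
			{ type: "image", data: "base64-placeholder" },
		],
	},
	{ role: "assistant", content: "Next step" },
]

check(pruneLastBrowserScreenshot(sampleHistory), "a browser result is found and pruned")
const pruned = sampleHistory[0]?.content
check(Array.isArray(pruned) && pruned.every((block) => block.type === "text"), "images removed")
check(!pruneLastBrowserScreenshot([{ role: "user", content: "plain string" }]), "string content is skipped")
```

In whichever test runner the project uses, the `check` calls would become that runner's own assertions.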

Contributor Author

Consider adding more detailed JSDoc documentation explaining the AWS Bedrock 20-image limitation, how this method differs from addToApiConversationHistory, and when this should be used vs the standard method.

toolResult: string | Array<Anthropic.TextBlockParam | Anthropic.ImageBlockParam>,
) {
// Remove previous browser screenshot from conversation history
if (this.lastBrowserScreenshotMessageId) {
// Find and remove images from the last browser action message
for (let i = this.apiConversationHistory.length - 1; i >= 0; i--) {
Contributor Author

Potential race condition: If multiple browser actions happen in rapid succession, this search could match the wrong message, or the tracked ID could fall out of sync with the history. Consider adding a more robust identification mechanism or ensuring browser actions are properly serialized.
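
One way to serialize the calls is to chain them on a single promise, so that each history update starts only after the previous one has settled. A minimal sketch, assuming a hypothetical `SerialQueue` class that is not part of this PR:

```typescript
// Hypothetical helper: runs async tasks one at a time, in submission order.
class SerialQueue {
	// Always-resolved tail of the chain; the next task waits on it.
	private tail: Promise<void> = Promise.resolve()

	enqueue<T>(task: () => Promise<T>): Promise<T> {
		// Start the task only after every earlier task has settled.
		const result = this.tail.then(task)
		// Swallow the outcome so a rejected task does not block later ones.
		this.tail = result.then(
			() => undefined,
			() => undefined,
		)
		return result
	}
}

// Usage sketch: the second task is submitted while the first is still pending,
// yet it does not start until the first has finished.
const queue = new SerialQueue()
const order: string[] = []

void queue.enqueue(async () => {
	await Promise.resolve() // simulate asynchronous work
	order.push("first")
})
void queue.enqueue(async () => {
	order.push("second")
})
```

Under this assumption, `addBrowserActionToApiHistory` could run its body through `enqueue`, so the prune-then-push sequence of one call never interleaves with another.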

Contributor Author

Performance concern: This searches backwards through the entire conversation history on every browser action. For long conversations, this could become slow. Could we optimize this by storing a direct reference to the message instead of searching, using a more efficient lookup mechanism, or limiting the search scope?
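
One option is to keep a reference to the last screenshot-bearing message object itself, which avoids scanning the history and removes the dependency on text matching. A minimal sketch with simplified stand-in types; `ScreenshotTracker` is a hypothetical name and not part of this PR:

```typescript
// Simplified stand-ins for the Anthropic SDK block types used in the diff.
type TextBlock = { type: "text"; text: string }
type ImageBlock = { type: "image"; data: string }
type Block = TextBlock | ImageBlock
type Message = { role: "user" | "assistant"; content: string | Block[]; ts: number }

class ScreenshotTracker {
	// Direct reference to the last message that carried a screenshot, if any.
	private last?: Message

	// Strip images from the previously tracked message, then track the new
	// message if it contains an image. No scan of the history is needed.
	track(message: Message): void {
		if (this.last && Array.isArray(this.last.content)) {
			this.last.content = this.last.content.filter((block) => block.type === "text")
		}
		const hasImage =
			Array.isArray(message.content) && message.content.some((block) => block.type === "image")
		this.last = hasImage ? message : undefined
	}

	// Forget the tracked message, e.g. when the browser is closed.
	reset(): void {
		this.last = undefined
	}
}
```

The trade-off is that the reference goes stale if the history array is replaced wholesale, for example through `overwriteApiConversationHistory`, so `reset()` would need to be called there as well.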

const message = this.apiConversationHistory[i]
if (message.role === "user" && Array.isArray(message.content)) {
// Check if this message contains the last browser screenshot
const hasToolResult = message.content.some(
(block) => block.type === "text" && block.text.includes("[browser_action Result]"),
Contributor Author

Consider extracting this magic string as a constant to improve maintainability and reduce the risk of typos.
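
A short sketch of the extraction. The constant name `BROWSER_ACTION_RESULT_MARKER` and the `isBrowserActionResult` helper are hypothetical; the literal is the one used in this diff, and the block types are simplified stand-ins for the Anthropic SDK types.

```typescript
// Hypothetical shared constant; the literal is copied from the diff.
const BROWSER_ACTION_RESULT_MARKER = "[browser_action Result]"

type TextBlock = { type: "text"; text: string }
type ImageBlock = { type: "image"; data: string }

// True when the block is a text block carrying the browser-action marker.
function isBrowserActionResult(block: TextBlock | ImageBlock): boolean {
	return block.type === "text" && block.text.includes(BROWSER_ACTION_RESULT_MARKER)
}
```

The predicate inside `message.content.some(...)` could then become `isBrowserActionResult`, leaving the literal in exactly one place.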

)
if (hasToolResult) {
// Remove image blocks from this message, keep only text blocks
message.content = message.content.filter((block) => block.type === "text")
Contributor Author

Missing error handling: What happens if message.content is malformed or doesn't contain the expected structure? Consider adding defensive checks to prevent runtime errors.
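
A minimal sketch of such defensive checks, using a runtime type guard so that unexpected entries (for example `null`, or a text block without a `text` field) are dropped rather than causing a runtime error. `isTextBlock` and `stripToTextBlocks` are hypothetical names, not part of this PR.

```typescript
type TextBlock = { type: "text"; text: string }

// Runtime guard: true only for well-formed text blocks.
function isTextBlock(value: unknown): value is TextBlock {
	if (typeof value !== "object" || value === null) return false
	const candidate = value as { type?: unknown; text?: unknown }
	return candidate.type === "text" && typeof candidate.text === "string"
}

// Keep only well-formed text blocks. Strings and other non-array shapes are
// returned untouched; images and malformed entries in an array are dropped.
function stripToTextBlocks(content: unknown): unknown {
	if (!Array.isArray(content)) return content
	return content.filter(isTextBlock)
}
```

Note that this version also discards malformed blocks, which is a behavior change compared with the `filter` call in the diff; whether that is desirable is a design decision for the PR author.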

Contributor Author

Consider adding debug logging here. A console.debug call, for example, would help troubleshoot cases where the feature doesn't work as expected.

break
}
}
}
}

// Add the new browser action result
const content = Array.isArray(toolResult) ? toolResult : [{ type: "text" as const, text: toolResult }]
const messageWithTs = {
role: "user" as const,
content,
ts: Date.now(),
}

// Track this message if it contains images
const hasImages = Array.isArray(toolResult) && toolResult.some((block) => block.type === "image")
if (hasImages) {
this.lastBrowserScreenshotMessageId = messageWithTs.ts.toString()
Contributor Author

Could this be optimized? Storing the timestamp directly instead of converting to string might be more efficient for comparisons and memory usage.
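
A small sketch of how a numeric timestamp could be stored and then used to locate the tracked message by strict equality. In the diff as shown, the stored ID is only tested for truthiness and is never compared against a message, so the lookup below is an assumption about the intended use; `findByTimestamp` and the simplified types are hypothetical.

```typescript
// Simplified stand-ins for the Anthropic SDK block types used in the diff.
type Block = { type: "text"; text: string } | { type: "image"; data: string }
type Message = { role: "user" | "assistant"; content: string | Block[]; ts?: number }

// Find the message whose numeric timestamp matches, scanning from the end
// because the tracked message is expected to be recent.
function findByTimestamp(history: Message[], ts: number): Message | undefined {
	for (let i = history.length - 1; i >= 0; i--) {
		const message = history[i]
		if (message && message.ts === ts) return message
	}
	return undefined
}
```

`Date.now()` has millisecond resolution, so two messages created within the same millisecond would share a timestamp; a lookup keyed on it alone cannot tell them apart.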

}

this.apiConversationHistory.push(messageWithTs)
await this.saveApiConversationHistory()
}

/**
* Reset browser screenshot tracking when browser is closed
*/
resetBrowserScreenshotTracking() {
this.lastBrowserScreenshotMessageId = undefined
}

async overwriteApiConversationHistory(newHistory: ApiMessage[]) {
this.apiConversationHistory = newHistory
await this.saveApiConversationHistory()
16 changes: 12 additions & 4 deletions src/core/tools/browserActionTool.ts
@@ -158,17 +158,25 @@ export async function browserActionTool(
 case "resize":
 	await cline.say("browser_action_result", JSON.stringify(browserActionResult))
 
-	pushToolResult(
-		formatResponse.toolResult(
+	{
+		const toolResult = formatResponse.toolResult(
 			`The browser action has been executed. The console logs and screenshot have been captured for your analysis.\n\nConsole logs:\n${
 				browserActionResult?.logs || "(No new logs)"
 			}\n\n(REMEMBER: if you need to proceed to using non-\`browser_action\` tools or launch a new browser, you MUST first close cline browser. For example, if after analyzing the logs and screenshot you need to edit a file, you must first close the browser before you can use the write_to_file tool.)`,
 			browserActionResult?.screenshot ? [browserActionResult.screenshot] : [],
-		),
-	)
+		)
+
+		// Use the new method to manage browser screenshot history
+		await cline.addBrowserActionToApiHistory(toolResult)
+
+		pushToolResult(toolResult)
+	}
 
 	break
 case "close":
+	// Reset browser screenshot tracking when browser is closed
+	cline.resetBrowserScreenshotTracking()
+
 	pushToolResult(
 		formatResponse.toolResult(
 			`The browser has been closed. You may now proceed to using other tools.`,