-
Notifications
You must be signed in to change notification settings - Fork 2.6k
fix: prevent browser tool image accumulation to avoid AWS Bedrock 20-image limit #6452
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…image limit - Add addBrowserActionToApiHistory method to Task class that removes previous browser screenshots before adding new ones - Modify browserActionTool to use new method instead of direct addToApiConversationHistory - Add resetBrowserScreenshotTracking method to clear tracking when browser is closed - Fixes issue #6348 where continuous browser tool usage would accumulate screenshots and hit AWS Bedrock's 20-image limit This implements @daniel-lxs's recommendation to only send the most recent screenshot instead of accumulating all screenshots in conversation history.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for implementing this solution to prevent AWS Bedrock's 20-image limit! The approach of keeping only the most recent screenshot is sound and addresses the core issue. However, I've identified several areas that need attention before this can be merged safely.
| * Add a browser action result to conversation history, removing previous browser screenshots | ||
| * to prevent hitting provider image limits (e.g., AWS Bedrock's 20-image limit). | ||
| */ | ||
| async addBrowserActionToApiHistory( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing test coverage for this critical functionality. Could we add unit tests to verify that:
- Previous images are properly removed from conversation history
- The tracking ID is correctly updated
- Edge cases like malformed content are handled gracefully
This is essential since this prevents API errors that halt workflows.
| // Remove previous browser screenshot from conversation history | ||
| if (this.lastBrowserScreenshotMessageId) { | ||
| // Find and remove images from the last browser action message | ||
| for (let i = this.apiConversationHistory.length - 1; i >= 0; i--) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential race condition: If multiple browser actions happen rapidly, this search could find the wrong message or become inconsistent. Consider adding a more robust identification mechanism or ensuring browser actions are properly serialized.
| // Remove previous browser screenshot from conversation history | ||
| if (this.lastBrowserScreenshotMessageId) { | ||
| // Find and remove images from the last browser action message | ||
| for (let i = this.apiConversationHistory.length - 1; i >= 0; i--) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Performance concern: This searches backwards through the entire conversation history on every browser action. For long conversations, this could become slow. Could we optimize this by storing a direct reference to the message instead of searching, using a more efficient lookup mechanism, or limiting the search scope?
| ) | ||
| if (hasToolResult) { | ||
| // Remove image blocks from this message, keep only text blocks | ||
| message.content = message.content.filter((block) => block.type === "text") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing error handling: What happens if message.content is malformed or doesn't contain the expected structure? Consider adding defensive checks to prevent runtime errors.
| ) | ||
| if (hasToolResult) { | ||
| // Remove image blocks from this message, keep only text blocks | ||
| message.content = message.content.filter((block) => block.type === "text") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding debug logging here to help troubleshoot issues. Something like console.debug would be helpful for debugging when the feature doesn't work as expected.
| // Track this message if it contains images | ||
| const hasImages = Array.isArray(toolResult) && toolResult.some((block) => block.type === "image") | ||
| if (hasImages) { | ||
| this.lastBrowserScreenshotMessageId = messageWithTs.ts.toString() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be optimized? Storing the timestamp directly instead of converting to string might be more efficient for comparisons and memory usage.
| * Add a browser action result to conversation history, removing previous browser screenshots | ||
| * to prevent hitting provider image limits (e.g., AWS Bedrock's 20-image limit). | ||
| */ | ||
| async addBrowserActionToApiHistory( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding more detailed JSDoc documentation explaining the AWS Bedrock 20-image limitation, how this method differs from addToApiConversationHistory, and when this should be used vs the standard method.
| if (message.role === "user" && Array.isArray(message.content)) { | ||
| // Check if this message contains the last browser screenshot | ||
| const hasToolResult = message.content.some( | ||
| (block) => block.type === "text" && block.text.includes("[browser_action Result]"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider extracting this magic string as a constant to improve maintainability and reduce the risk of typos.
|
Closing, see #6348 (comment) |
Fixes #6348
Problem
When using the browser tool continuously in a single chat session, screenshots accumulate in the conversation history. AWS Bedrock has a 20-image limit, causing "too many images" errors after 21 screenshots, which halts the workflow and requires starting a new chat.
Solution
This PR implements @daniel-lxs's recommendation to only send the most recent screenshot instead of accumulating all screenshots in conversation history.
Changes Made:
addBrowserActionToApiHistory()method to Task class that removes previous browser screenshots before adding new onesbrowserActionToolto use the new method instead of directaddToApiConversationHistoryresetBrowserScreenshotTracking()method to clear tracking when browser is closedHow it works:
Testing:
This change prevents hitting provider image limits while maintaining the browser tool's functionality and user experience.
Important
Introduces methods to manage browser screenshots in
Task.tsand updatesbrowserActionTool.tsto prevent exceeding AWS Bedrock's 20-image limit.addBrowserActionToApiHistory()inTask.tsremoves previous screenshots before adding new ones to avoid AWS Bedrock's 20-image limit.resetBrowserScreenshotTracking()inTask.tsclears screenshot tracking when the browser is closed.browserActionTool()inbrowserActionTool.tsusesaddBrowserActionToApiHistory()to manage screenshot history.This description was created by
for 008e833. You can customize this summary. It will automatically update as commits are pushed.