-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Description
Is your feature request related to a problem? Please describe.
Add a skipSnapshot (or similar) parameter to input tools (click, fill, hover, press_key, etc.) to allow disabling the automatic snapshot inclusion in responses.
Describe the solution you'd like
Problem
Currently, all input tools unconditionally call response.includeSnapshot() after completing their actions:
// src/tools/input.js - click handler (line 38)
handler: async (request, response, context) => {
// ... click logic ...
response.includeSnapshot(); // Always called, no way to disable
}This applies to all input tools:
click(line 38)hover(line 65)fill(line 139)fill_form(line 193)press_key(line 266)drag(line 163)upload_file(line 232)
There is no parameter to disable this behavior.
Impact
For complex pages with large DOM structures, the automatic snapshot causes severe context window overflow:
Real-world Example: Jupyter Notebook Testing
| Turn | Action | Context Before | Context After | Jump |
|---|---|---|---|---|
| 74 | click on notebook tab |
~31K tokens | ~242K tokens | +211K |
| 26 | click on "Run All Cells" |
~15K tokens | ~226K tokens | +211K |
The Jupyter notebook page contains hundreds of code cells with output, causing each snapshot to be ~200K+ tokens.
Result: Agent sessions crash with context overflow errors like:
[Context Usage]: 4502K / 200K tokens (2251.0%)
Session encountered an error
Log Evidence
[Tool: mcp__chrome-devtools__click] (after 3.5s thinking)
Input: {"uid":"12_122"}
[Done] (took 4.6s)
[Context] ~242K / 200K tokens (~121.1%) | Turn 74 <-- Explosion here
[Context Usage]: 4502K / 200K tokens (2251.0%)
[Session] Skipping session ID capture due to status: error
Use Cases for Disabling Snapshot
- Large pages: Jupyter notebooks, complex SPAs, data-heavy dashboards
- Rapid interactions: Multiple sequential clicks where intermediate snapshots are unnecessary
- Known state changes: When the agent already knows what will happen and doesn't need verification
- Token budget management: Preserving context window for more important operations
Describe alternatives you've considered
Workaround (Current)
Currently, the only workaround is to avoid browser-based interactions for large pages and use alternative methods (e.g., CLI commands, API calls), which defeats the purpose of browser automation.
Additional context
No response