@edenreich
Contributor

Adds computer use capabilities: screenshot capture, mouse movement, mouse clicks, and keyboard typing. Includes a complete Docker-based example with an Ubuntu GUI desktop, X11/Wayland support, and live screenshot streaming to the web UI. Closes #358.

Key features:

  • Screenshot tool with streaming support
  • Mouse control (move and click)
  • Keyboard input (text and key combos)
  • Rate limiting and approval system
  • Complete Docker example with web UI integration
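
To illustrate the rate limiting feature listed above, here is a minimal sliding-window sketch in Python. It is not the PR's implementation: the class name `SlidingWindowRateLimiter`, its parameters, and the injectable `clock` are all hypothetical, chosen only to show the idea of capping how many tool actions (clicks, keystrokes, screenshots) can run within a time window.

```python
import time
from collections import deque


class SlidingWindowRateLimiter:
    """Hypothetical helper: allow at most `max_actions` per `window_seconds`."""

    def __init__(self, max_actions, window_seconds, clock=time.monotonic):
        self.max_actions = max_actions
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._timestamps = deque()  # times of recently allowed actions, oldest first

    def allow(self):
        """Return True and record the action if it fits in the current window."""
        now = self.clock()
        # Drop recorded actions that have fallen out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_actions:
            return False
        self._timestamps.append(now)
        return True
```

A caller would check `limiter.allow()` before executing each mouse or keyboard action and reject, or queue, the action when it returns `False`.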

Technical details:

  • Supports both X11 and Wayland display servers
  • Screenshot streaming via WebSocket for live desktop viewing
  • Circular buffer for efficient screenshot storage
  • Rate limiting to prevent abuse
  • User approval system for sensitive operations
  • Docker example with headless Ubuntu desktop setup
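
The circular buffer mentioned above can be sketched as follows. This is an assumption about the general technique, not the code in this PR: `ScreenshotRingBuffer`, `push`, `latest`, and `frames` are hypothetical names. The point is that a fixed number of slots is reused, so memory stays bounded while the newest screenshots remain available for streaming.

```python
class ScreenshotRingBuffer:
    """Hypothetical fixed-capacity buffer that overwrites the oldest frame."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._slots = [None] * capacity
        self._next = 0   # index the next frame will be written to
        self._count = 0  # number of frames currently stored

    def push(self, frame):
        """Store a frame, overwriting the oldest one once the buffer is full."""
        self._slots[self._next] = frame
        self._next = (self._next + 1) % len(self._slots)
        self._count = min(self._count + 1, len(self._slots))

    def latest(self):
        """Return the most recent frame, or None if nothing was stored yet."""
        if self._count == 0:
            return None
        return self._slots[(self._next - 1) % len(self._slots)]

    def frames(self):
        """Return the stored frames ordered from oldest to newest."""
        start = (self._next - self._count) % len(self._slots)
        return [
            self._slots[(start + i) % len(self._slots)]
            for i in range(self._count)
        ]
```

With a capacity of 3, pushing four frames leaves only the last three in the buffer; a streaming endpoint would typically read `latest()` and a history view would read `frames()`.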

Rename screenshot streaming UI components to use "Preview" terminology:
- Rename screenshot-overlay.js to preview-overlay.js
- Remove emoji from button (📷 Screenshots → Preview)
- Update overlay title from "Live Screenshot" to "Live Preview"
- Update user-facing messages to use "Preview" instead of "Screenshot"

This improves clarity and consistency in the web UI while keeping
internal implementation details (CSS classes, API endpoints) unchanged.

Signed-off-by: Eden Reich <[email protected]>
@edenreich
Contributor Author

edenreich commented Jan 4, 2026

TODOs

  • Check whether the current switch from the terminal to the active window and back should be replaced with a GUI window that is always on top, similar to how vercept AI does it. The window would be shown only when computer use is enabled and the session is not a remote one over pty. It is only necessary for local computer use, because the window focus is constantly changing.
  • I should probably also add a visual indicator that the computer is currently being watched when computer_use.screenshot.streaming is enabled.

@edenreich changed the title from "feat: Add computer use tools for remote GUI automation" to "feat: Add computer use tools for remote and local GUI automation" on Jan 4, 2026

Development

Successfully merging this pull request may close these issues.

[FEATURE] Add tools for computer use
