@edenreich
Contributor

Adds computer use capabilities including screenshot capture, mouse movement, mouse clicks, and keyboard typing. Includes a complete Docker-based example with Ubuntu GUI desktop, X11/Wayland support, and live screenshot streaming to web UI. Closes #358.

Key features:

  • Screenshot tool with streaming support
  • Mouse control (move and click)
  • Keyboard input (text and key combos)
  • Rate limiting and approval system
  • Complete Docker example with web UI integration

Technical details:

  • Supports both X11 and Wayland display servers
  • Screenshot streaming via WebSocket for live desktop viewing
  • Circular buffer for efficient screenshot storage
  • Rate limiting to prevent abuse
  • User approval system for sensitive operations
  • Docker example with headless Ubuntu desktop setup
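
To illustrate the circular buffer idea mentioned above, here is a minimal sketch of a fixed-capacity ring that overwrites the oldest screenshot once it is full. `Frame` and `FrameRing` are hypothetical names chosen for this example. The PR's actual types and locking strategy are not shown in this description.

```go
package main

import "fmt"

// Frame (hypothetical) is one captured screenshot.
type Frame struct {
	Seq  int
	Data []byte
}

// FrameRing (hypothetical) keeps the most recent frames in a fixed-size
// slice, so memory use stays bounded however long streaming runs.
type FrameRing struct {
	frames []Frame
	next   int // index where the next frame will be written
	count  int // number of valid frames, at most len(frames)
}

func NewFrameRing(capacity int) *FrameRing {
	if capacity < 1 {
		capacity = 1 // avoid a zero-length ring
	}
	return &FrameRing{frames: make([]Frame, capacity)}
}

// Push stores f, overwriting the oldest frame when the ring is full.
func (r *FrameRing) Push(f Frame) {
	r.frames[r.next] = f
	r.next = (r.next + 1) % len(r.frames)
	if r.count < len(r.frames) {
		r.count++
	}
}

// Latest returns the most recently pushed frame, if any.
func (r *FrameRing) Latest() (Frame, bool) {
	if r.count == 0 {
		return Frame{}, false
	}
	idx := (r.next - 1 + len(r.frames)) % len(r.frames)
	return r.frames[idx], true
}

// Len returns how many frames are currently stored.
func (r *FrameRing) Len() int { return r.count }

func main() {
	ring := NewFrameRing(3)
	for i := 1; i <= 5; i++ {
		ring.Push(Frame{Seq: i})
	}
	latest, _ := ring.Latest()
	fmt.Println(latest.Seq, ring.Len()) // 5 3
}
```

This sketch is not safe for concurrent use. A capture goroutine and a WebSocket handler sharing it would need a mutex around `Push` and `Latest`.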

Rename screenshot streaming UI components to use "Preview" terminology:
- Rename screenshot-overlay.js to preview-overlay.js
- Remove emoji from button (📷 Screenshots → Preview)
- Update overlay title from "Live Screenshot" to "Live Preview"
- Update user-facing messages to use "Preview" instead of "Screenshot"

This improves clarity and consistency in the web UI while keeping
internal implementation details (CSS classes, API endpoints) unchanged.

Signed-off-by: Eden Reich <[email protected]>
@edenreich
Contributor Author

edenreich commented Jan 4, 2026

TODOs

  • Check whether the switch from the terminal to the active window and back should be replaced with a GUI window that is always on top, similar to how vercept AI does it. The window would be shown only when computer use is enabled and the session is not a remote one over a PTY. It is only needed for local computer use, where the window focus changes constantly.
  • I should probably also add a visual indicator that the computer is currently being watched when computer_use.screenshot.streaming is enabled

@edenreich changed the title from "feat: Add computer use tools for remote GUI automation" to "feat: Add computer use tools for remote and local GUI automation" on Jan 4, 2026
**Thread Safety & Race Conditions:**
- Add thread-safe WindowCoordinator with serial DispatchQueue
- Protect window arrays (borders, click indicators, move trails) from concurrent access
- Fix segmentation faults during computer use tool execution
- Add process safety checks in writeEvent to prevent writes to dead processes

**Coordinate System Fixes:**
- Fix double coordinate conversion in click/move indicators
- Remove redundant Y-axis flip in Swift (Go already converts to macOS coords)
- Click indicators and move trails now appear at correct screen positions
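
The double-conversion bug can be shown with a small sketch. It assumes the conversion is a plain Y flip between a top-left origin and a bottom-left origin, which is what "Y-axis flip" suggests. `Point` and `FlipY` are hypothetical names. Because the flip is its own inverse, applying it on both the Go side and the Swift side lands the indicator back in the unconverted position.

```go
package main

import "fmt"

// Point (hypothetical) is a screen position in points.
type Point struct{ X, Y float64 }

// FlipY converts between top-left-origin and bottom-left-origin
// coordinates for a screen of the given height.
func FlipY(p Point, screenHeight float64) Point {
	return Point{X: p.X, Y: screenHeight - p.Y}
}

func main() {
	const height = 1080.0
	click := Point{X: 100, Y: 200} // top-left origin

	once := FlipY(click, height)  // correct: converted exactly once
	twice := FlipY(once, height)  // bug: a second flip undoes the first

	fmt.Println(once)  // {100 880}
	fmt.Println(twice) // {100 200}
}
```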

**Control Event Architecture:**
- Add control event forwarder for GUI → TUI communication
- Implement dedicated always-open channel for pause/resume events
- Revert EventBridge.Tap() to simple unidirectional design
- Add GetEventBridge() to StateManager interface
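
As a rough sketch of the dedicated channel idea, a forwarder could own one buffered channel that stays open for the lifetime of the session. The GUI side sends pause/resume events and the TUI side receives them. `ControlEvent` and `ControlForwarder` are hypothetical names, and the non-blocking send is an assumption about the desired behavior, not something this commit message states.

```go
package main

import "fmt"

// ControlEvent (hypothetical) is a control message from the GUI.
type ControlEvent string

const (
	Pause  ControlEvent = "pause"
	Resume ControlEvent = "resume"
)

// ControlForwarder (hypothetical) owns a channel that is never closed
// while the session runs, so control events always have a place to go.
type ControlForwarder struct {
	events chan ControlEvent
}

func NewControlForwarder(buffer int) *ControlForwarder {
	return &ControlForwarder{events: make(chan ControlEvent, buffer)}
}

// Send queues an event without blocking the GUI side.
// It reports false if the buffer is full and the event was dropped.
func (f *ControlForwarder) Send(ev ControlEvent) bool {
	select {
	case f.events <- ev:
		return true
	default:
		return false
	}
}

// Events exposes the receive-only side for the TUI.
func (f *ControlForwarder) Events() <-chan ControlEvent { return f.events }

func main() {
	f := NewControlForwarder(4)
	f.Send(Pause)
	f.Send(Resume)
	fmt.Println(<-f.Events(), <-f.Events()) // pause resume
}
```

Keeping control events on their own channel means they do not depend on the ordinary event stream, which fits the unidirectional EventBridge.Tap() design mentioned above.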

**Code Organization:**
- Extract view classes into separate files:
  - ClickIndicator.swift: Circular ring indicator
  - MoveTrail.swift: Arrow showing mouse movement
  - ControlBar.swift: Pause/resume button bar
  - ImageThumbnail.swift: Full-screen image viewer
- Refactor monitorProcess into respawnWindow and restoreBorderOverlay
- Update build.sh to include all view files

**Manual Tool Execution Fixes:**
- Fix completion event status: "complete" → "completed"
- Add image attachments to completion events
- GetLatestScreenshot now properly shows completion and displays images
- Manual tools (!! syntax) now broadcast events to floating window

**Other Improvements:**
- Remove dead code in screenshot_server.go
- Simplify chat handler resume logic
- Clean up verbose comments in manager.go

Fixes segfaults during MouseClick/MouseMove operations and ensures
all visual indicators work correctly with proper event completion.

Signed-off-by: Eden Reich <[email protected]>
Development

Successfully merging this pull request may close these issues.

[FEATURE] Add tools for computer use