OmniGrip is a cross-platform computer control MCP (Model Context Protocol) server that enables LLM-driven GUI automation. It provides screen capture, mouse/keyboard simulation, window management, and OCR capabilities through a standardized MCP interface.
- Display Detection - Get metadata for all connected monitors (ID, resolution, scale factor, position)
- Screenshot Capture - Full-screen or region-based screenshots with automatic scaling and JPEG compression
- Coordinate Conversion - Automatic scale ratio calculation for accurate coordinate mapping
- Mouse Control - Move, click (left/right/middle), double-click, drag operations
- Keyboard Simulation - Text typing with Unicode/CJK support, keyboard shortcuts
- Coordinate Scaling - Seamless coordinate conversion between compressed screenshots and real screen
- Window Management - List windows, get active window, bring window to foreground
- Clipboard Operations - Read/write system clipboard
- OS Detection - Get current operating system type for platform-aware automation
- Full-screen OCR - Extract all text from screen with center-point coordinates
- Text Search - Find specific text on screen with fuzzy matching
- Action Verification - Assert that expected text appears in a region (for automation validation)
OmniGrip follows Domain-Driven Design (DDD) principles:
```
┌────────────────────────────────────┐
│       Adapter (MCP Protocol)       │ ← Protocol adaptation
├────────────────────────────────────┤
│       Application (Services)       │ ← Use case orchestration
├────────────────────────────────────┤
│       Domain (Traits & Types)      │ ← Core domain abstractions
├────────────────────────────────────┤
│       Infrastructure (Impls)       │ ← Technical implementations
└────────────────────────────────────┘
```
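
To make the layering concrete, here is a minimal Rust sketch of how the four layers might relate. Every name in it (`Display`, `ScreenCapturer`, `FakeCapturer`, `CaptureService`, `handle_tool_call`) is hypothetical and not taken from the OmniGrip codebase. It only illustrates the dependency direction: the application layer depends on a domain trait, and the infrastructure layer supplies the implementation.

```rust
// Domain layer: core abstractions with no platform details.
// All names in this sketch are hypothetical, not OmniGrip's actual API.
#[derive(Debug, Clone, PartialEq)]
pub struct Display {
    pub id: u32,
    pub width: u32,
    pub height: u32,
}

pub trait ScreenCapturer {
    fn displays(&self) -> Vec<Display>;
}

// Infrastructure layer: a technical implementation of the domain trait.
// A real one would call into a capture library; this one returns fixed data.
pub struct FakeCapturer;

impl ScreenCapturer for FakeCapturer {
    fn displays(&self) -> Vec<Display> {
        vec![Display { id: 0, width: 1920, height: 1080 }]
    }
}

// Application layer: orchestrates a use case against the trait only.
pub struct CaptureService<C: ScreenCapturer> {
    capturer: C,
}

impl<C: ScreenCapturer> CaptureService<C> {
    pub fn new(capturer: C) -> Self {
        Self { capturer }
    }

    pub fn first_display(&self) -> Option<Display> {
        self.capturer.displays().into_iter().next()
    }
}

// Adapter layer: translates a protocol-level request into a service call.
pub fn handle_tool_call<C: ScreenCapturer>(service: &CaptureService<C>, tool: &str) -> String {
    match tool {
        "get_displays" => match service.first_display() {
            Some(d) => format!("display {}: {}x{}", d.id, d.width, d.height),
            None => "no displays".to_string(),
        },
        other => format!("unknown tool: {other}"),
    }
}
```

Because `CaptureService` is generic over the trait, a test can swap in `FakeCapturer` without touching any platform code, which is the main practical benefit of this arrangement.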
- Rust 2024 Edition (1.85+)
- Platform-specific dependencies:
  - macOS: Accessibility permissions required
  - Linux: X11 or Wayland support
  - Windows: No special requirements
```bash
git clone https://github.com/yourusername/OmniGrip.git
cd OmniGrip
cargo build --release
```

The binary will be at `target/release/omni-grip`.

For OCR features, download the PP-OCRv5 model files and place them in one of these directories:

- `./res/chinese_model/`
- `./models/`
- `~/.omnigrip/models/`

Required files:

- `PP-OCRv5_mobile_det_fp16.mnn` (text detection model)
- `PP-OCRv5_mobile_rec_fp16.mnn` (text recognition model)
- `ppocr_keys_v5.txt` (character dictionary)

Note: OCR is optional. If model files are not found, OmniGrip will start without OCR capabilities.
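
The lookup described above could be sketched roughly as follows. This is an illustration under assumptions, not OmniGrip's actual code: the function names are hypothetical, and the search order (first listed directory wins) is a guess, since the text only says the files may live in "one of" the directories.

```rust
use std::path::{Path, PathBuf};

// File names taken from the "Required files" list.
pub const REQUIRED_FILES: [&str; 3] = [
    "PP-OCRv5_mobile_det_fp16.mnn",
    "PP-OCRv5_mobile_rec_fp16.mnn",
    "ppocr_keys_v5.txt",
];

/// Returns true if `dir` contains every required model file.
pub fn has_all_models(dir: &Path) -> bool {
    REQUIRED_FILES.iter().all(|name| dir.join(name).is_file())
}

/// Candidate directories in the order they are listed (assumed search order).
/// `home` stands in for `~`; pass `None` if no home directory is known.
pub fn candidate_dirs(home: Option<&Path>) -> Vec<PathBuf> {
    let mut dirs = vec![
        PathBuf::from("./res/chinese_model"),
        PathBuf::from("./models"),
    ];
    if let Some(home) = home {
        dirs.push(home.join(".omnigrip").join("models"));
    }
    dirs
}

/// Returns the first candidate holding a complete model set.
/// `None` corresponds to starting without OCR capabilities.
pub fn find_model_dir(candidates: &[PathBuf]) -> Option<PathBuf> {
    candidates.iter().find(|dir| has_all_models(dir)).cloned()
}
```

A caller might build the candidate list with `candidate_dirs(std::env::var_os("HOME").as_deref().map(Path::new))` on Unix-like systems and treat a `None` result from `find_model_dir` as "OCR disabled" rather than as an error.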
```bash
# Run with stdio transport (for MCP clients)
./target/release/omni-grip

# Enable debug logging
RUST_LOG=debug ./target/release/omni-grip
```

Add to your MCP client configuration:
```json
{
  "mcpServers": {
    "omni-grip": {
      "command": "/path/to/omni-grip",
      "args": []
    }
  }
}
```

| Tool | Description |
|---|---|
| `get_displays` | Get all monitor metadata (ID, resolution, scale) |
| `take_screenshot` | Capture full display as JPEG with scale ratio |
| `take_screenshot_region` | Capture specific screen region |

| Tool | Description |
|---|---|
| `mouse_move` | Move cursor to absolute coordinates |
| `mouse_move_relative` | Move cursor by relative offset |
| `mouse_click` | Click at coordinates (left/right/middle, single/double) |
| `mouse_drag` | Drag from point A to point B |
| `keyboard_type` | Type text string (Unicode supported) |
| `keyboard_press` | Press keyboard shortcut (e.g., `["cmd", "c"]`) |

| Tool | Description |
|---|---|
| `get_os_context` | Get OS type (windows/macos/linux) |
| `clipboard_read` | Read clipboard text content |
| `clipboard_write` | Write text to clipboard |
| `get_active_window` | Get focused window info |
| `list_windows` | List all visible windows |
| `focus_window` | Bring window to foreground by ID |

| Tool | Description |
|---|---|
| `get_ocr_data` | Run full-screen OCR, returns text with coordinates |
| `find_text_center` | Find text on screen, return center point |
| `action_assertion` | Verify expected text in screen region |
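
As a rough illustration of what a text search returning a center point involves, the sketch below matches a query against OCR results and returns the center of the best match. It rests on several assumptions: the `OcrItem` shape, the `max_distance` parameter, and the use of Levenshtein edit distance as the fuzzy-matching method are all hypothetical stand-ins, since the actual matching algorithm is not described here.

```rust
/// One recognized text fragment with its bounding box in screen pixels.
/// Hypothetical shape; OmniGrip's actual OCR output type may differ.
#[derive(Debug, Clone)]
pub struct OcrItem {
    pub text: String,
    pub x: i32,
    pub y: i32,
    pub width: i32,
    pub height: i32,
}

/// Levenshtein edit distance over Unicode scalar values (two-row DP).
pub fn edit_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut curr = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            // Minimum of substitution, deletion, and insertion.
            curr.push((prev[j] + cost).min(prev[j + 1] + 1).min(curr[j] + 1));
        }
        prev = curr;
    }
    prev[b.len()]
}

/// Returns the center of the closest case-insensitive match whose
/// edit distance to `query` is at most `max_distance`.
pub fn find_text_center(items: &[OcrItem], query: &str, max_distance: usize) -> Option<(i32, i32)> {
    let query = query.to_lowercase();
    items
        .iter()
        .map(|item| (edit_distance(&item.text.to_lowercase(), &query), item))
        .filter(|(dist, _)| *dist <= max_distance)
        .min_by_key(|(dist, _)| *dist)
        .map(|(_, item)| (item.x + item.width / 2, item.y + item.height / 2))
}
```

A small tolerance lets the search survive common OCR misreads, such as "rn" being recognized in place of "m", at the cost of occasionally matching a similar-looking label.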
1. get_displays → Get display_id for primary monitor
2. take_screenshot(display_id=0, max_width=1000) → Get screen image + scale_ratio
3. [LLM analyzes screenshot, finds button at (500, 300)]
4. mouse_click(x=500, y=300, scale_ratio=1.5) → Click converted coordinates
5. action_assertion(region, "Success") → Verify action result
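
The coordinate conversion in step 4 can be sketched as below. This assumes that `scale_ratio` is the real screen width divided by the screenshot width, so that a hypothetical 1500-pixel-wide display captured with `max_width=1000` yields 1.5. The function names are illustrative, and the server's actual rounding and handling of small displays may differ.

```rust
/// Ratio between the real screen width and the downscaled screenshot width.
/// Assumes `max_width` is non-zero and that screens no wider than
/// `max_width` are left unscaled.
pub fn scale_ratio(real_width: u32, max_width: u32) -> f64 {
    if real_width <= max_width {
        1.0
    } else {
        real_width as f64 / max_width as f64
    }
}

/// Maps a point in screenshot space back to real screen coordinates,
/// rounding to the nearest pixel.
pub fn to_screen(x: i32, y: i32, scale_ratio: f64) -> (i32, i32) {
    (
        (x as f64 * scale_ratio).round() as i32,
        (y as f64 * scale_ratio).round() as i32,
    )
}
```

Under these assumptions, the button the LLM located at (500, 300) in the screenshot corresponds to `to_screen(500, 300, 1.5)`, which is (750, 450) on the real screen.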
- rmcp - MCP protocol implementation
- xcap - Cross-platform screen capture
- enigo - Input simulation
- ocr-rs - PP-OCR implementation for Rust
- image - Image processing and encoding
- tokio - Async runtime
| Platform | Screen Capture | Input Simulation | OCR |
|---|---|---|---|
| macOS | ✅ | ✅ | ✅ |
| Windows | ✅ | ✅ | ✅ |
| Linux (X11) | ✅ | ✅ | ✅ |
| Linux (Wayland) | ✅ |
MIT License - see LICENSE for details.
Contributions are welcome! Please feel free to submit issues and pull requests.