An MCP server that gives AI agents their own virtual display on Windows. The agent clicks, types, and screenshots on an isolated screen while you keep working on yours.
┌──────────────────────────────────────────────────────────────┐
│ Your Physical Monitors │ Agent Virtual Screen │
│ │ │
│ You work here normally. │ Agent works here. │
│ Mouse, keyboard, apps — │ Isolated coordinates. │
│ all yours. │ Filtered shortcuts. │
│ │ Own window space. │
│ │ │
│ ← Agent can READ all screens │ ← Agent can only │
│ for context │ WRITE to this one │
└──────────────────────────────────────────────────────────────┘
When an AI agent controls your desktop directly:
- You can't work while the agent works — every mouse move derails it
- The agent can click your apps — a misplaced click hits your browser
- No safety boundary — Alt+Tab or Win+D disrupts the session
- Hard to recover — if the agent loses track, you restart from scratch
WindowsPC-MCP creates a virtual display using the Parsec Virtual Display Driver and confines the agent to it. The agent sees coordinates (0,0) to (1920,1080) on its own screen. Your monitors are untouched.
- Windows 10 or 11
- Python 3.12 or later
- Parsec VDD — auto-installed on first server run (triggers a one-time UAC prompt)
git clone https://github.com/ShikeChen01/WindowsPC-MCP.git
cd WindowsPC-MCP
pip install -e .Add to your project's .mcp.json:
{
"mcpServers": {
"windowspc-mcp": {
"command": "python",
"args": ["-m", "windowspc_mcp", "--transport", "stdio"]
}
}
}Claude Code picks this up automatically — no restart needed.
Add to your claude_desktop_config.json (Settings > Developer > Edit Config):
{
"mcpServers": {
"windowspc-mcp": {
"command": "windowspc-mcp",
"args": ["--transport", "stdio"]
}
}
}Restart Claude Desktop after saving.
WindowsPC-MCP supports two transports:
# stdio (for any MCP client that launches a subprocess)
windowspc-mcp --transport stdio
# SSE over HTTP (for network clients)
windowspc-mcp --transport sse --host localhost --port 8000Connect your client to http://localhost:8000/sse for the SSE transport.
Once connected, the agent workflow looks like this:
1. CreateScreen() → virtual display appears
2. Screenshot(screen="agent") → see what's on the agent screen
3. App(name="notepad") → launch an app (auto-moved to agent screen)
4. Snapshot() → screenshot + UI tree with labeled elements
5. Click(label=3) → click element #3 from the snapshot
6. Type(text="Hello from the agent") → type into the focused element
7. DestroyScreen() → clean up when done
The agent always calls CreateScreen first. After that, Snapshot is the primary tool for understanding what's on screen — it returns a screenshot plus a numbered list of interactive elements that Click and Type can target by label.
23 tools organized by category. See docs/tools.md for the full reference with parameters and examples.
| Tool | Description |
|---|---|
| CreateScreen | Create the agent's virtual display (1920x1080 default) |
| DestroyScreen | Remove the virtual display and release resources |
| ScreenInfo | List all monitors — agent screen is marked [AGENT] |
| RecoverWindow | Find windows by title/pid/process and move them to the agent screen |
| Tool | Description |
|---|---|
| Screenshot | Capture a screenshot (agent screen, all screens, or by index) |
| Snapshot | Screenshot + window list + interactive UI elements with labels |
| Tool | Description |
|---|---|
| Click | Click at coordinates or by element label from Snapshot |
| Type | Type text, optionally clicking a target first |
| Move | Move the cursor (with optional drag) |
| Scroll | Scroll vertically or horizontally |
| Shortcut | Send keyboard shortcuts (dangerous ones like Alt+Tab are blocked) |
| Wait | Pause execution for a given number of seconds |
| Tool | Description |
|---|---|
| MultiSelect | Click multiple positions in sequence |
| MultiEdit | Click and type into multiple fields in sequence |
| Tool | Description |
|---|---|
| App | Launch an application (windows auto-moved to agent screen) |
| PowerShell | Run a PowerShell command and return output |
| FileSystem | Read, write, list, copy, move, delete files |
| Clipboard | Get or set clipboard text |
| Process | List or kill running processes |
| Registry | Read, write, or list Windows registry values |
| Notification | Show a Windows toast notification |
| Scrape | Fetch a URL and return its text content |
| InputStatus | Check the current input mode and agent capabilities |
All tools pass through a confinement engine before executing:
- READ tools (Screenshot, Snapshot) can see all monitors for context
- WRITE tools (Click, Type, Scroll, Move) are bounds-checked to the agent screen — coordinates outside are rejected
- UNCONFINED tools (PowerShell, FileSystem, Registry) have no spatial component
- Shortcuts are filtered: global shortcuts (Alt+Tab, Win+D, Win+L) are blocked; application shortcuts (Ctrl+S, Ctrl+C) are allowed
The agent works in agent-relative coordinates — (0,0) is the top-left of its virtual display. The confinement engine translates to absolute Windows coordinates transparently.
"Parsec VDD driver not found" The driver auto-installs on first run but requires admin privileges. If the UAC prompt was dismissed, run the server once from an elevated terminal:
windowspc-mcp --transport stdioVirtual display doesn't appear
After CreateScreen, check with ScreenInfo. If the display isn't listed, the VDD driver may not be installed correctly. Reinstall from parsec-vdd releases.
"Agent screen already exists"
The previous session didn't clean up. Call DestroyScreen first, or restart the server — it auto-recovers persisted display state on startup.
App windows don't appear on the agent screen
App waits up to 5 seconds for windows to appear and moves them automatically. Some apps take longer to launch. Use RecoverWindow(process_name="appname") to move windows that appeared after the timeout.
Screenshot returns a black image Some apps render with hardware acceleration that GDI capture can't see. Try maximizing the window or using a different app. The virtual display itself always captures correctly.
Blocked shortcut error Global shortcuts (Alt+Tab, Win+D, Ctrl+Alt+Del) are intentionally blocked to prevent the agent from disrupting your desktop session. Use application-level shortcuts instead.
MIT