MCP server that gives AI assistants the power to control a web browser.
- What is this?
- Installation
- Web UI
- Web Dashboard
- Configuration
- CLI Reference
- MCP Tools
- Deep Research
- Observability
- Skills System
- REST API Reference
- Architecture
- License
This wraps browser-use as an MCP server, letting Claude (or any MCP client) automate a real browser—navigate pages, fill forms, click buttons, extract data, and more.
Browser automation tasks take 30-120+ seconds. The standard MCP stdio transport has timeout issues with long-running operations—connections drop mid-task. HTTP transport solves this by running as a persistent daemon that handles requests reliably regardless of duration.
Install as a Claude Code plugin for automatic setup:
```bash
# Install the plugin
/plugin install browser-use/mcp-browser-use
```

The plugin automatically:
- Installs Playwright browsers on first run
- Starts the HTTP daemon when Claude Code starts
- Registers the MCP server with Claude
Set your API key (the browser agent needs an LLM to decide actions):
```bash
# Set API key (environment variable - recommended)
export GEMINI_API_KEY=your-key-here

# Or use config file
mcp-server-browser-use config set -k llm.api_key -v your-key-here
```

That's it! Claude can now use browser automation tools.
For other MCP clients or standalone use:
```bash
# Clone and install
git clone https://github.com/Saik0s/mcp-browser-use.git
cd mcp-server-browser-use
uv sync

# Install browser
uv run playwright install chromium

# Start the server
uv run mcp-server-browser-use server
```

Add to Claude Desktop (`~/Library/Application Support/Claude/claude_desktop_config.json`):
```json
{
  "mcpServers": {
    "browser-use": {
      "type": "streamable-http",
      "url": "http://localhost:8383/mcp"
    }
  }
}
```

For MCP clients that don't support HTTP transport, use mcp-remote as a proxy:
```json
{
  "mcpServers": {
    "browser-use": {
      "command": "npx",
      "args": ["mcp-remote", "http://localhost:8383/mcp"]
    }
  }
}
```

Access the task viewer at http://localhost:8383 when the daemon is running.
Features:
- Real-time task list with status and progress
- Task details with execution logs
- Server health status and uptime
- Running tasks monitoring
The web UI provides visibility into browser automation tasks without requiring CLI commands.
Access the full-featured dashboard at http://localhost:8383/dashboard when the daemon is running.
Features:
- Tasks Tab: Complete task history with filtering, real-time status updates, and detailed execution logs
- Skills Tab: Browse, inspect, and manage learned skills with usage statistics
- History Tab: Historical view of all completed tasks with filtering by status and time
Key Capabilities:
- Run existing skills directly from the dashboard with custom parameters
- Start learning sessions to capture new skills
- Delete outdated or invalid skills
- Monitor running tasks with live progress updates
- View full task results and error details
The dashboard provides a comprehensive web interface for managing all aspects of browser automation without CLI commands.
Settings are stored in ~/.config/mcp-server-browser-use/config.json.
View current config:

```bash
mcp-server-browser-use config view
```

Change settings:

```bash
mcp-server-browser-use config set -k llm.provider -v openai
mcp-server-browser-use config set -k llm.model_name -v gpt-4o

# Note: Set API keys via environment variables (e.g., ANTHROPIC_API_KEY) for better security
# mcp-server-browser-use config set -k llm.api_key -v sk-...

mcp-server-browser-use config set -k browser.headless -v false
mcp-server-browser-use config set -k agent.max_steps -v 30
```

| Key | Default | Description |
|---|---|---|
| llm.provider | google | LLM provider (anthropic, openai, google, azure_openai, groq, deepseek, cerebras, ollama, bedrock, browser_use, openrouter, vercel) |
| llm.model_name | gemini-3-flash-preview | Model for the browser agent |
| llm.api_key | - | API key for the provider (prefer env vars: GEMINI_API_KEY, ANTHROPIC_API_KEY, etc.) |
| browser.headless | true | Run browser without GUI |
| browser.cdp_url | - | Connect to existing Chrome (e.g., http://localhost:9222) |
| browser.user_data_dir | - | Chrome profile directory for persistent logins/cookies |
| browser.chromium_sandbox | true | Enable Chromium sandboxing for security |
| agent.max_steps | 20 | Max steps per browser task |
| agent.use_vision | true | Enable vision capabilities for the agent |
| research.max_searches | 5 | Max searches per research task |
| research.search_timeout | - | Timeout for individual searches |
| server.host | 127.0.0.1 | Server bind address |
| server.port | 8383 | Server port |
| server.results_dir | - | Directory to save results |
| server.auth_token | - | Auth token for non-localhost connections |
| skills.enabled | false | Enable skills system (beta - disabled by default) |
| skills.directory | ~/.config/browser-skills | Skills storage location |
| skills.validate_results | true | Validate skill execution results |
Configuration precedence: Environment Variables > Config File > Defaults.
Environment variables use prefix MCP_ + section + _ + key (e.g., MCP_LLM_PROVIDER).
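For example, an environment override (following the naming scheme above) beats the same key in config.json:

```bash
# Overrides llm.provider and browser.headless for this shell session
export MCP_LLM_PROVIDER=anthropic
export MCP_BROWSER_HEADLESS=false
```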
Option 1: Persistent Profile (Recommended)
Use a dedicated Chrome profile to preserve logins and cookies:
```bash
# Set user data directory
mcp-server-browser-use config set -k browser.user_data_dir -v ~/.chrome-browser-use
```

Option 2: Connect to Existing Chrome
Connect to an existing Chrome instance (useful for advanced debugging):
```bash
# Launch Chrome with debugging enabled
google-chrome --remote-debugging-port=9222

# Configure CDP connection (localhost only for security)
mcp-server-browser-use config set -k browser.cdp_url -v http://localhost:9222
```

```bash
mcp-server-browser-use server    # Start as background daemon
mcp-server-browser-use server -f # Start in foreground (for debugging)
mcp-server-browser-use status    # Check if running
mcp-server-browser-use stop      # Stop the daemon
mcp-server-browser-use logs -f   # Tail server logs
```

```bash
mcp-server-browser-use tools     # List all available MCP tools
mcp-server-browser-use call run_browser_agent task="Go to google.com"
mcp-server-browser-use call run_deep_research topic="quantum computing"
```

```bash
mcp-server-browser-use config view                     # Show all settings
mcp-server-browser-use config set -k <key> -v <value>
mcp-server-browser-use config path                     # Show config file location
```

```bash
mcp-server-browser-use tasks                 # List recent tasks
mcp-server-browser-use tasks --status running
mcp-server-browser-use task <id>             # Get task details
mcp-server-browser-use task cancel <id>      # Cancel a running task
mcp-server-browser-use health                # Server health + stats
```

```bash
mcp-server-browser-use call skill_list
mcp-server-browser-use call skill_get name="my-skill"
mcp-server-browser-use call skill_delete name="my-skill"
```

Tip: Skills can also be managed through the web dashboard at http://localhost:8383/dashboard for a visual interface with one-click execution and learning sessions.
These tools are exposed via MCP for AI clients:
| Tool | Description | Typical Duration |
|---|---|---|
| run_browser_agent | Execute browser automation tasks | 60-120s |
| run_deep_research | Multi-search research with synthesis | 2-5 min |
| skill_list | List learned skills | <1s |
| skill_get | Get skill definition | <1s |
| skill_delete | Delete a skill | <1s |
| health_check | Server status and running tasks | <1s |
| task_list | Query task history | <1s |
| task_get | Get full task details | <1s |
The main tool. Tell it what you want in plain English:
```bash
mcp-server-browser-use call run_browser_agent \
  task="Find the price of iPhone 16 Pro on Apple's website"
```

The agent launches a browser, navigates to apple.com, finds the product, and returns the price.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| task | string | What to do (required) |
| max_steps | int | Override default max steps |
| skill_name | string | Use a learned skill |
| skill_params | JSON | Parameters for the skill |
| learn | bool | Enable learning mode |
| save_skill_as | string | Name for the learned skill |
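For instance, a longer multi-site task can be given a bigger step budget than the agent.max_steps default (the task text here is illustrative):

```bash
mcp-server-browser-use call run_browser_agent \
  task="Compare prices for the same laptop across three retailers" \
  max_steps=40
```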
Multi-step web research with automatic synthesis:
```bash
mcp-server-browser-use call run_deep_research \
  topic="Latest developments in quantum computing" \
  max_searches=5
```

The agent searches multiple sources, extracts key findings, and compiles a markdown report.
Deep research executes a 3-phase workflow:

```text
┌─────────────────────────────────────────────────────────┐
│  Phase 1: PLANNING                                      │
│  LLM generates 3-5 focused search queries from topic    │
└─────────────────────────────┬───────────────────────────┘
                              ▼
┌─────────────────────────────────────────────────────────┐
│  Phase 2: SEARCHING                                     │
│  For each query:                                        │
│    • Browser agent executes search                      │
│    • Extracts URL + summary from results                │
│    • Stores findings                                    │
└─────────────────────────────┬───────────────────────────┘
                              ▼
┌─────────────────────────────────────────────────────────┐
│  Phase 3: SYNTHESIS                                     │
│  LLM creates markdown report:                           │
│    1. Executive Summary                                 │
│    2. Key Findings (by theme)                           │
│    3. Analysis and Insights                             │
│    4. Gaps and Limitations                              │
│    5. Conclusion with Sources                           │
└─────────────────────────────────────────────────────────┘
```

Reports can be auto-saved by configuring research.save_directory.
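A sketch of enabling auto-save, assuming the research.save_directory key mentioned above (the path is illustrative):

```bash
mcp-server-browser-use config set -k research.save_directory -v ~/research-reports
```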
All tool executions are tracked in SQLite for debugging and monitoring.
```text
PENDING ──► RUNNING ──► COMPLETED
               │
               ├──► FAILED
               └──► CANCELLED
```
During execution, tasks progress through granular stages:
```text
INITIALIZING → PLANNING → NAVIGATING → EXTRACTING → SYNTHESIZING
```
List recent tasks:
```bash
mcp-server-browser-use tasks
```

```text
┌──────────────┬───────────────────┬───────────┬──────────┬──────────┐
│ ID           │ Tool              │ Status    │ Progress │ Duration │
├──────────────┼───────────────────┼───────────┼──────────┼──────────┤
│ a1b2c3d4     │ run_browser_agent │ completed │ 15/15    │ 45s      │
│ e5f6g7h8     │ run_deep_research │ running   │ 3/7      │ 2m 15s   │
└──────────────┴───────────────────┴───────────┴──────────┴──────────┘
```
Get task details:
```bash
mcp-server-browser-use task a1b2c3d4
```

Server health:

```bash
mcp-server-browser-use health
```

Shows uptime, memory usage, and currently running tasks.
AI clients can query task status directly:
- `health_check` - Server status + list of running tasks
- `task_list` - Recent tasks with optional status filter
- `task_get` - Full details of a specific task
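The same tools are reachable from the CLI's `call` command, which is handy for scripting. The `task_id` parameter name below is an assumption; check the exact schema first:

```bash
mcp-server-browser-use call health_check
mcp-server-browser-use call task_list
# task_id is an assumed parameter name; verify with `mcp-server-browser-use tools`
mcp-server-browser-use call task_get task_id="a1b2c3d4"
```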
Storage details:

- Database: ~/.config/mcp-server-browser-use/tasks.db
- Retention: Completed tasks auto-deleted after 7 days
- Format: SQLite with WAL mode for concurrency
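Because it's a plain SQLite file, you can inspect it directly with the sqlite3 CLI without assuming any schema:

```bash
# Discover the tables and schema the server actually uses
sqlite3 ~/.config/mcp-server-browser-use/tasks.db ".tables"
sqlite3 ~/.config/mcp-server-browser-use/tasks.db ".schema"
```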
Warning: This feature is experimental and under active development. Expect rough edges.
Skills are disabled by default. Enable them first:
```bash
mcp-server-browser-use config set -k skills.enabled -v true
```

Skills let you "teach" the agent a task once, then replay it 50x faster by reusing discovered API endpoints instead of full browser automation.
Browser automation is slow (60-120 seconds per task). But most websites have APIs behind their UI. If we can discover those APIs, we can call them directly.
Skills capture the API calls made during a browser session and replay them directly via CDP (Chrome DevTools Protocol).
Without Skills: Browser navigation → 60-120 seconds
With Skills: Direct API call → 1-3 seconds
```bash
mcp-server-browser-use call run_browser_agent \
  task="Find React packages on npmjs.com" \
  learn=true \
  save_skill_as="npm-search"
```

What happens:
- Recording: CDP captures all network traffic during execution
- Analysis: LLM identifies the "money request"—the API call that returns the data
- Extraction: URL patterns, headers, and response parsing rules are saved
- Storage: Skill saved as YAML to ~/.config/browser-skills/npm-search.yaml
```bash
mcp-server-browser-use call run_browser_agent \
  skill_name="npm-search" \
  skill_params='{"query": "vue"}'
```

Every skill supports two execution paths:
If the skill captured an API endpoint (SkillRequest):
```text
Initialize CDP session
          ↓
Navigate to domain (establish cookies)
          ↓
Execute fetch() via Runtime.evaluate
          ↓
Parse response with JSONPath
          ↓
Return data
```
If direct execution fails or no API was found:
```text
Inject navigation hints into task prompt
          ↓
Agent uses hints as guidance
          ↓
Agent discovers and calls API
          ↓
Return data
```
Skills are stored as YAML in ~/.config/browser-skills/:

```yaml
name: npm-search
description: Search for packages on npmjs.com
version: "1.0"

# For direct execution (fast path)
request:
  url: "https://www.npmjs.com/search?q={query}"
  method: GET
  headers:
    Accept: application/json
  response_type: json
  extract_path: "objects[*].package"

# For hint-based execution (fallback)
hints:
  navigation:
    - step: "Go to npmjs.com"
      url: "https://www.npmjs.com"
  money_request:
    url_pattern: "/search"
    method: GET

# Auth recovery (if API returns 401/403)
auth_recovery:
  trigger_on_status: [401, 403]
  recovery_page: "https://www.npmjs.com/login"

# Usage stats
success_count: 12
failure_count: 1
last_used: "2024-01-15T10:30:00Z"
```

Skills support parameterized URLs and request bodies:
```yaml
request:
  url: "https://api.example.com/search?q={query}&limit={limit}"
  body_template: '{"filters": {"category": "{category}"}}'
```

Parameters are substituted at execution time from skill_params.
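Invoking such a skill might then look like this (the skill name and parameter values are illustrative; the parameter names come from the template above):

```bash
mcp-server-browser-use call run_browser_agent \
  skill_name="example-search" \
  skill_params='{"query": "react", "limit": "10", "category": "ui"}'
```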
If an API returns 401/403, skills can trigger auth recovery:
```yaml
auth_recovery:
  trigger_on_status: [401, 403]
  recovery_page: "https://example.com/login"
  max_retries: 2
```

The system will navigate to the recovery page (letting you log in) and retry.
- API Discovery: Only works if the site has an API. Sites that render everything server-side won't yield useful skills.
- Auth State: Skills rely on browser cookies. If you're logged out, they may fail.
- API Changes: If a site changes its API, the skill breaks and falls back to hint-based execution.
- Complex Flows: Multi-step workflows (login → navigate → search) may not capture cleanly.
The server exposes REST endpoints for direct HTTP access. All endpoints return JSON unless otherwise specified.
Base URL: http://localhost:8383
GET /api/health
Server health check with running task information.
```bash
curl http://localhost:8383/api/health
```

Response:

```json
{
  "status": "healthy",
  "uptime_seconds": 1234.5,
  "memory_mb": 256.7,
  "running_tasks": 2,
  "tasks": [...],
  "stats": {...}
}
```

GET /api/tasks
List recent tasks with optional filtering.
```bash
# List all tasks
curl http://localhost:8383/api/tasks

# Filter by status
curl "http://localhost:8383/api/tasks?status=running"

# Limit results
curl "http://localhost:8383/api/tasks?limit=50"
```

GET /api/tasks/{task_id}
Get full details of a specific task.
```bash
curl http://localhost:8383/api/tasks/abc123
```

GET /api/tasks/{task_id}/logs (SSE)
Real-time task progress stream via Server-Sent Events.
```javascript
const events = new EventSource('/api/tasks/abc123/logs');
events.onmessage = (e) => console.log(JSON.parse(e.data));
```

GET /api/skills
List all available skills.
```bash
curl http://localhost:8383/api/skills
```

Response:

```json
{
  "skills": [
    {
      "name": "npm-search",
      "description": "Search for packages on npmjs.com",
      "success_rate": 92.5,
      "usage_count": 15,
      "last_used": "2024-01-15T10:30:00Z"
    }
  ],
  "count": 1,
  "skills_directory": "/Users/you/.config/browser-skills"
}
```

GET /api/skills/{name}
Get full skill definition as JSON.
```bash
curl http://localhost:8383/api/skills/npm-search
```

DELETE /api/skills/{name}
Delete a skill.
```bash
curl -X DELETE http://localhost:8383/api/skills/npm-search
```

POST /api/skills/{name}/run
Execute a skill with parameters (starts background task).
```bash
curl -X POST http://localhost:8383/api/skills/npm-search/run \
  -H "Content-Type: application/json" \
  -d '{"params": {"query": "react"}}'
```

Response:

```json
{
  "task_id": "abc123...",
  "skill_name": "npm-search",
  "message": "Skill execution started",
  "status_url": "/api/tasks/abc123..."
}
```

POST /api/learn
Start a learning session to capture a new skill (starts background task).
```bash
curl -X POST http://localhost:8383/api/learn \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Search for TypeScript packages on npmjs.com",
    "skill_name": "npm-search"
  }'
```

Response:

```json
{
  "task_id": "def456...",
  "learning_task": "Search for TypeScript packages on npmjs.com",
  "skill_name": "npm-search",
  "message": "Learning session started",
  "status_url": "/api/tasks/def456..."
}
```

GET /api/events (SSE)
Server-Sent Events stream for all task updates.
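You can also watch the stream from a terminal; curl's -N flag disables output buffering so events print as they arrive:

```bash
curl -N http://localhost:8383/api/events
```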
```javascript
const events = new EventSource('/api/events');
events.onmessage = (e) => {
  const data = JSON.parse(e.data);
  console.log(`Task ${data.task_id}: ${data.status}`);
};
```

Event format:

```json
{
  "task_id": "abc123",
  "full_task_id": "abc123-full-uuid...",
  "tool": "run_browser_agent",
  "status": "running",
  "stage": "navigating",
  "progress": {
    "current": 5,
    "total": 15,
    "percent": 33.3,
    "message": "Loading page..."
  }
}
```

```text
┌─────────────────────────────────────────────────────────────────────────┐
│                               MCP CLIENTS                               │
│                  (Claude Desktop, mcp-remote, CLI call)                 │
└─────────────────────────────────┬───────────────────────────────────────┘
                                  │ HTTP POST /mcp
                                  ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                              FastMCP SERVER                             │
│  ┌───────────────────────────────────────────────────────────────────┐ │
│  │                             MCP TOOLS                             │ │
│  │  • run_browser_agent      • skill_list/get/delete                 │ │
│  │  • run_deep_research      • health_check/task_list/task_get       │ │
│  └───────────────────────────────────────────────────────────────────┘ │
└────────┬──────────────┬─────────────────┬────────────────┬──────────────┘
         │              │                 │                │
         ▼              ▼                 ▼                ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐
│   CONFIG    │ │  PROVIDERS  │ │   SKILLS    │ │    OBSERVABILITY    │
│  Pydantic   │ │   12 LLMs   │ │  Learn+Run  │ │    Task Tracking    │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────────────┘
                                       │
                                       ▼
                          ┌─────────────────────────┐
                          │       browser-use       │
                          │  (Agent + Playwright)   │
                          └─────────────────────────┘
```

```text
src/mcp_server_browser_use/
├── server.py           # FastMCP server + MCP tools
├── cli.py              # Typer CLI for daemon management
├── config.py           # Pydantic settings
├── providers.py        # LLM factory (12 providers)
│
├── observability/      # Task tracking
│   ├── models.py       # TaskRecord, TaskStatus, TaskStage
│   ├── store.py        # SQLite persistence
│   └── logging.py      # Structured logging
│
├── skills/             # Machine-learned browser skills
│   ├── models.py       # Skill, SkillRequest, AuthRecovery
│   ├── store.py        # YAML persistence
│   ├── recorder.py     # CDP network capture
│   ├── analyzer.py     # LLM skill extraction
│   ├── runner.py       # Direct fetch() execution
│   └── executor.py     # Hint injection
│
└── research/           # Deep research workflow
    ├── models.py       # SearchResult, ResearchSource
    └── machine.py      # Plan → Search → Synthesize
```
| What | Where |
|---|---|
| Config | ~/.config/mcp-server-browser-use/config.json |
| Tasks DB | ~/.config/mcp-server-browser-use/tasks.db |
| Skills | ~/.config/browser-skills/*.yaml |
| Server Log | ~/.local/state/mcp-server-browser-use/server.log |
| Server PID | ~/.local/state/mcp-server-browser-use/server.json |
- OpenAI
- Anthropic
- Google Gemini
- Azure OpenAI
- Groq
- DeepSeek
- Cerebras
- Ollama (local)
- AWS Bedrock
- Browser Use (browser_use)
- OpenRouter
- Vercel AI
MIT