Web browsing tools for AI agents. Navigate and interact with web pages using semantic accessibility patterns—no screenshots needed.
Browser MCP is an MCP server that lets AI agents navigate and interact with web pages using the same accessibility semantics that screen readers use. Instead of parsing raw HTML or analyzing screenshots, agents query landmarks, headings, forms, and other semantic elements.
Add to your Claude Desktop config (`~/Library/Application Support/Claude/claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "browser": {
      "command": "npx",
      "args": ["@sanity-labs/browser-mcp"]
    }
  }
}
```

Restart Claude Desktop. The browsing tools will appear automatically.
Add to your project's `.claude/config.json` or run:

```bash
claude mcp add browser "npx @sanity-labs/browser-mcp"
```

To see what the agent is doing, run with a visible browser window:
```json
{
  "mcpServers": {
    "browser": {
      "command": "npx",
      "args": ["@sanity-labs/browser-mcp", "--no-headless"]
    }
  }
}
```

The browser will open visibly so you can watch the agent navigate.
No API key is required for basic browser automation: every tool except `describe` works without configuration. An API key enables the `describe` tool's AI-powered page descriptions:
```json
{
  "mcpServers": {
    "browser": {
      "command": "npx",
      "args": ["@sanity-labs/browser-mcp"],
      "env": {
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}
```

Supports `OPENAI_API_KEY` (gpt-4o) or `ANTHROPIC_API_KEY` (claude-sonnet-4). If both are set, OpenAI is preferred.
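To use Anthropic instead, set `ANTHROPIC_API_KEY` the same way (the key value below is a placeholder):

```json
{
  "mcpServers": {
    "browser": {
      "command": "npx",
      "args": ["@sanity-labs/browser-mcp"],
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-..."
      }
    }
  }
}
```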
| Tool | Description |
|---|---|
| `open_session` | Opens a browser tab and navigates to a URL |
| `close_session` | Closes a browser session |
| `overview` | Page summary: title, URL, landmarks, element counts |
| `query` | Query elements by CSS selector, extract structure or text |
| `section` | Extract content under a heading |
| `elements` | List elements by type (headings, links, buttons, forms, tables, images) |
| `action` | Interact: navigate, click, fill, select, check, press, scroll, back, forward, highlight |
| `screenshot` | Capture page or element screenshots (saves to disk) |
| `diagnostics` | Get console logs and network requests for debugging |
| `run_sequence` | Execute a batch of browser operations and assertions in a single call |
| `describe` | Use vision AI to describe what's visible on the page (requires API key) |
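The walkthroughs below reuse an already-open session, so here is the full lifecycle in one place, including the `query` and `section` tools. A minimal sketch: the parameter names `url`, `selector`, and `heading` are assumptions patterned on the other calls on this page, not confirmed signatures.

```js
// Open a tab and navigate (assumed 'url' parameter)
await mcp.call('open_session', { session: 's1', url: 'https://example.com' });

// Query elements by CSS selector (assumed 'selector' parameter)
const nav = await mcp.call('query', { session: 's1', selector: 'nav a' });

// Extract the content under a heading (assumed 'heading' parameter)
const intro = await mcp.call('section', { session: 's1', heading: 'Introduction' });

// Close the session when finished
await mcp.call('close_session', { session: 's1' });
```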
The 3-call pattern covers most browsing tasks:
- Overview — Understand the page structure
- Elements/Query — Find what you need
- Action — Interact with it
```js
// 1. What's on this page?
const overview = await mcp.call('overview', { session: 's1' });
// → 1 form, 15 links, 6 headings

// 2. What does the form look like?
const forms = await mcp.call('elements', { session: 's1', type: 'forms' });
// → fields: [{ name: 'q', label: 'Search', type: 'text' }, ...]

// 3. Fill and submit
await mcp.call('action', { session: 's1', type: 'fill', selector: '[name="q"]', value: 'accessibility' });
await mcp.call('action', { session: 's1', type: 'press', selector: '[name="q"]', value: 'Enter' });
```

Capture full page, viewport, or specific element screenshots. Screenshots save to disk and return the file path (no base64 in the context window).
```js
// Full viewport
await mcp.call('screenshot', { session: 'main' });
// → { success: true, path: '/tmp/browser-screenshots/screenshot-123.png', size: 150000 }

// Full scrollable page
await mcp.call('screenshot', { session: 'main', fullPage: true });

// Specific element only
await mcp.call('screenshot', { session: 'main', selector: '[data-testid="tweet"]' });

// Custom save path
await mcp.call('screenshot', { session: 'main', savePath: '/tmp/my-screenshot.png' });
```

Access browser console logs and network requests for debugging.
```js
// Get console logs
await mcp.call('diagnostics', { session: 'main', type: 'console' });
// → { console: [{ level: 'error', text: '...', url: '...', timestamp: '...' }] }

// Get network requests
await mcp.call('diagnostics', { session: 'main', type: 'network' });
// → { network: [{ url: '...', method: 'GET', status: 200, timing: 150 }] }

// Get both
await mcp.call('diagnostics', { session: 'main', type: 'all' });

// Filter by level, limit results, clear the buffer
await mcp.call('diagnostics', {
  session: 'main',
  type: 'console',
  level: 'error',
  limit: 10,
  clear: true
});
```

Scroll to an element and flash it with a colored border, useful for showing users what you're looking at.
```js
// Highlight an element (scrolls into view + flashes orange border 3x)
await mcp.call('action', { session: 'main', type: 'highlight', selector: '.article-title' });
```

Execute a batch of browser operations and assertions in a single call. Useful for testing flows.
```js
await mcp.call('run_sequence', {
  session: 'main',
  steps: [
    { type: 'action', action: 'fill', selector: '#search', value: 'test' },
    { type: 'action', action: 'click', selector: '#submit' },
    { type: 'assert', condition: { element_exists: '#results' } },
    { type: 'assert', condition: { element_text_contains: { selector: '#results', text: 'test' } } }
  ]
});
// → { success: true, completed: 4, total: 4, events: [...], final_state: {...} }
```

Use vision AI to describe what's visible on the page. The tool takes a screenshot and sends it to OpenAI or Anthropic for analysis, returning a text description.
Requires an `OPENAI_API_KEY` or `ANTHROPIC_API_KEY` environment variable:
```json
{
  "mcpServers": {
    "browser": {
      "command": "npx",
      "args": ["@sanity-labs/browser-mcp"],
      "env": {
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}
```

```js
// Describe the current viewport
await mcp.call('describe', { session: 'main' });
// → { description: "The page shows a login form with email and password fields...", provider: "openai" }

// Describe a specific element
await mcp.call('describe', { session: 'main', selector: '.error-message' });
// → { description: "A red error banner displaying 'Invalid credentials'", provider: "openai" }

// Ask a specific question
await mcp.call('describe', {
  session: 'main',
  prompt: 'What navigation options are visible?'
});
// → { description: "The navigation bar shows: Home, Products, About, Contact...", provider: "openai" }
```

Also available as a query type in `run_sequence`:
```js
await mcp.call('run_sequence', {
  session: 'main',
  steps: [
    { type: 'action', action: 'click', selector: '#submit' },
    { type: 'query', query: 'describe', params: {
      selector: '.result-panel',
      prompt: 'Was the form submitted successfully?'
    }}
  ]
});
```

```bash
npx @sanity-labs/browser-mcp [options]

Options:
  --headless=true    Run browser in headless mode (default)
  --headless=false   Run browser with visible window (for debugging)
  --help             Show help
```

```bash
# Clone and install
git clone https://github.com/sanity-labs/browser-mcp.git
cd browser-mcp
npm install

# Build
npm run build

# Run tests
npm test

# Watch mode
npm run dev
```
```
src/
├── index.ts              # MCP server entry point
├── cli.ts                # CLI
├── session.ts            # Playwright session management + diagnostics buffers
├── browser/
│   ├── accessibility.ts  # DOM queries, element extraction
│   ├── actions.ts        # Browser actions (including highlight)
│   └── assertions.ts     # Assertion conditions for run_sequence
├── vision/
│   ├── index.ts          # Vision provider selection
│   ├── openai.ts         # OpenAI vision wrapper
│   └── anthropic.ts      # Anthropic vision wrapper
└── tools/
    ├── open-session.ts   # open_session tool
    ├── close-session.ts  # close_session tool
    ├── overview.ts       # overview tool
    ├── query.ts          # query tool
    ├── section.ts        # section tool
    ├── elements.ts       # elements tool
    ├── action.ts         # action tool
    ├── screenshot.ts     # screenshot tool
    ├── diagnostics.ts    # diagnostics tool
    ├── run-sequence.ts   # run_sequence tool
    └── describe.ts       # describe tool (vision AI)

test/
├── fixtures/             # Test HTML pages
├── test-server.ts        # Local test server
└── integration.test.ts   # Integration tests
```
Traditional web scraping parses raw HTML—brittle and verbose. Screenshot-based approaches require vision models and can't interact precisely.
Accessibility semantics give us:
- Structure — Landmarks (nav, main, aside) reveal page organization
- Labels — Buttons, links, and inputs have accessible names
- Hierarchy — Headings create navigable outlines
- Interactivity — Forms, buttons, and controls are explicitly marked
This is how screen reader users browse—and it works for agents too.
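For a concrete contrast, here is the same control seen both ways; the output shape below is hypothetical, for illustration only:

```js
// Raw HTML: <button class="css-1x2y3z e1abc">...</button> (class names are meaningless)
// The semantic view surfaces role and accessible name instead
// (hypothetical output shape):
await mcp.call('elements', { session: 's1', type: 'buttons' });
// → [{ role: 'button', label: 'Add to cart' }, ...]
```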
MIT