A Model Context Protocol (MCP) server that extracts clean, LLM-friendly content from any webpage using the Zyte API. Perfect for AI assistants that need to read and understand web content.
- 🧹 Clean Content Extraction - Get main page content
- 🌐 Browser Rendering - Full JavaScript support for dynamic pages
- ⚡ Configurable Extraction - Choose between speed (HTTP) or quality (browser rendering)
- 🔒 Secure - API keys stored in Docker secrets, runs as non-root user
- 🐳 Docker-based - Easy deployment via Docker MCP Toolkit
The server provides three tools, each using a different extraction method:
| Tool | Description |
|---|---|
| fetch_page_content_from_http_response_body | Extract from HTTP response (fastest, cheapest) |
| fetch_page_content_from_browser_html | Extract from browser with visual features (best quality, default) |
| fetch_page_content_from_browser_html_only | Extract from browser HTML only (better for JS-heavy pages) |
| Method | Speed | Cost | Use Case | Returns |
|---|---|---|---|---|
| HTTP Response Body | Fastest | Cheapest | Simple static pages, quick previews | pageContent only |
| Browser HTML | Medium | Medium | Best quality, visual features included | pageContent + browserHtml |
| Browser HTML Only | Slower | Higher | JavaScript-heavy pages, SPAs | pageContent + browserHtml |
- Docker Desktop with MCP Toolkit enabled
- Docker MCP CLI plugin (docker mcp command)
- Zyte API key (free tier available)
git clone https://github.com/YOUR_USERNAME/zyte-fetch-page-content-mcp-server.git
cd zyte-fetch-page-content-mcp-server

docker build -t zyte-fetch-page-content-mcp-server .

docker mcp secret set ZYTE_API_KEY="your-zyte-api-key-here"

Get your API key from the Zyte Dashboard.
# Create catalogs directory if it doesn't exist
mkdir -p ~/.docker/mcp/catalogs
# Copy catalog file
cp custom.yaml ~/.docker/mcp/catalogs/custom.yaml
# Import the catalog
docker mcp catalog import ~/.docker/mcp/catalogs/custom.yaml

docker mcp server add zyte-fetch-page-content

# Check server is listed
docker mcp server ls
# Should show:
# zyte-fetch-page-content - ✓ done - Fetches page content...
# Check tools are available
docker mcp tools ls | grep fetch

Edit your Claude Desktop config file:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
- Linux: ~/.config/Claude/claude_desktop_config.json
Ensure it contains:
{
  "mcpServers": {
    "MCP_DOCKER": {
      "command": "docker",
      "args": ["mcp", "gateway", "run"]
    }
  }
}

Quit Claude Desktop completely (Cmd+Q / Ctrl+Q) and reopen it.
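The config edit above can also be scripted. The following is a minimal sketch, not part of this project: the helper name `ensure_mcp_docker` is hypothetical, and it assumes the config file is plain JSON as shown above. It adds the `MCP_DOCKER` entry while leaving any other settings in the file untouched.

```python
import json
from pathlib import Path


def ensure_mcp_docker(config_path: Path) -> dict:
    """Add the MCP_DOCKER gateway entry to a Claude Desktop config file.

    Hypothetical helper: existing keys and other MCP servers are preserved.
    """
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8").strip()
        if text:
            config = json.loads(text)

    # Same entry as in the JSON snippet shown in this README.
    servers = config.setdefault("mcpServers", {})
    servers["MCP_DOCKER"] = {
        "command": "docker",
        "args": ["mcp", "gateway", "run"],
    }

    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

On Linux, for example, this could be called as `ensure_mcp_docker(Path.home() / ".config/Claude/claude_desktop_config.json")`. Claude Desktop still needs a full restart afterwards.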
Once installed, you can ask Claude to use the tools:
Use fetch_page_content_from_browser_html to get content from https://example.com
Use fetch_page_content_from_http_response_body to quickly get content from https://example.com
Use fetch_page_content_from_browser_html_only to extract content from https://example.com
All three tools accept a single parameter:
- url (required): The URL to fetch content from
Example:
fetch_page_content_from_browser_html(url="https://example.com")

All tools return raw JSON text from the Zyte API.

Response from fetch_page_content_from_http_response_body (pageContent only):
{
  "url": "https://example.com",
  "statusCode": 200,
  "pageContent": {
    "itemMain": "Main article content as clean text...",
    "itemSecondary": "Sidebar and navigation content..."
  }
}

Response from fetch_page_content_from_browser_html (pageContent + browserHtml):
{
  "url": "https://example.com",
  "statusCode": 200,
  "browserHtml": "<!DOCTYPE html><html>...</html>",
  "pageContent": {
    "itemMain": "Main article content as clean text...",
    "itemSecondary": "Sidebar and navigation content..."
  }
}

Response from fetch_page_content_from_browser_html_only (pageContent + browserHtml):
{
  "url": "https://example.com",
  "statusCode": 200,
  "browserHtml": "<!DOCTYPE html><html>...</html>",
  "pageContent": {
    "itemMain": "Main article content as clean text...",
    "itemSecondary": "Sidebar and navigation content..."
  }
}

# Test that the server starts
echo '{"jsonrpc":"2.0","method":"tools/list","id":1}' | docker run --rm -i -e ZYTE_API_KEY="your-key" zyte-fetch-page-content-mcp-server

# Test HTTP response body extraction
docker mcp tools call fetch_page_content_from_http_response_body url="https://example.com"
# Test browser HTML extraction (best quality)
docker mcp tools call fetch_page_content_from_browser_html url="https://example.com"
# Test browser HTML only extraction
docker mcp tools call fetch_page_content_from_browser_html_only url="https://example.com"

# Reimport the catalog
docker mcp catalog rm custom
docker mcp catalog import ~/.docker/mcp/catalogs/custom.yaml
docker mcp server add zyte-fetch-page-content

Or add this to ~/.docker/mcp/registry.yaml:
registry:
  zyte-fetch-page-content:
    ref: ""

- Verify server is enabled: docker mcp server ls
- Check tools exist: docker mcp tools ls | grep fetch
- Restart Claude Desktop completely
# Verify secret is set
docker mcp secret ls | grep ZYTE
# Reset the secret
docker mcp secret set ZYTE_API_KEY="your-key"

If you see this response:
{"error": "ZYTE_API_KEY not configured", "type": "config_error"}

The API key is not set correctly. Run:
docker mcp secret set ZYTE_API_KEY="your-actual-key"

The default timeout is 120 seconds. If you experience timeouts:
- Use fetch_page_content_from_http_response_body for faster results on simple pages
- Consider that complex pages with heavy JavaScript may take longer to render
Claude Desktop
↓
MCP Gateway (Docker)
↓
Zyte MCP Server (this project)
↓
Zyte API (api.zyte.com)
↓
Returns JSON with pageContent
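Since the last step of this flow hands back raw JSON text, a caller typically has to unpack it. The following is a small sketch under stated assumptions: the function name `extract_main_text` is hypothetical, and it relies only on the response and error shapes shown in this README (`pageContent.itemMain`, and an `error`/`type` object for configuration problems).

```python
import json


def extract_main_text(raw: str) -> str:
    """Return pageContent.itemMain from a tool's raw JSON response.

    Hypothetical helper; raises RuntimeError if the server returned an
    error object such as {"error": "...", "type": "config_error"}.
    """
    data = json.loads(raw)
    if "error" in data:
        raise RuntimeError(f"{data.get('type', 'error')}: {data['error']}")
    return data["pageContent"]["itemMain"]
```

Raising on the error object keeps a missing API key from being mistaken for an empty page.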
Each tool makes an HTTPS POST request to https://api.zyte.com/v1/extract with:
- Authentication: HTTP Basic Auth using your Zyte API key
- Timeout: 120 seconds
- Headers: Content-Type: application/json, Accept-Encoding: gzip
- Payload: URL and extraction options
The tools differ only in their pageContentOptions.extractFrom value and whether browserHtml: true is included.
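To make the request shape concrete, here is a sketch of how such a request could be assembled with the Python standard library. It is not the server's actual code (which uses httpx). The `extractFrom` values, the `"pageContent": true` flag, and the API key being sent as the Basic Auth username with an empty password are assumptions based on Zyte API naming; check them against the Zyte API reference. The request is only built here, not sent.

```python
import base64
import json
import urllib.request

ZYTE_EXTRACT_URL = "https://api.zyte.com/v1/extract"

# Assumed per-tool options: (extractFrom value, whether browserHtml is requested).
TOOL_OPTIONS = {
    "fetch_page_content_from_http_response_body": ("httpResponseBody", False),
    "fetch_page_content_from_browser_html": ("browserHtml", True),
    "fetch_page_content_from_browser_html_only": ("browserHtmlOnly", True),
}


def build_payload(tool_name: str, url: str) -> dict:
    """Build the JSON payload for one tool (field values are assumptions)."""
    extract_from, include_browser_html = TOOL_OPTIONS[tool_name]
    payload = {
        "url": url,
        "pageContent": True,
        "pageContentOptions": {"extractFrom": extract_from},
    }
    if include_browser_html:
        payload["browserHtml"] = True
    return payload


def build_request(tool_name: str, url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the HTTPS POST request described above."""
    # HTTP Basic Auth: API key as username, empty password (assumption).
    token = base64.b64encode(f"{api_key}:".encode()).decode()
    return urllib.request.Request(
        ZYTE_EXTRACT_URL,
        data=json.dumps(build_payload(tool_name, url)).encode(),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
            "Accept-Encoding": "gzip",
        },
        method="POST",
    )
```

Sending it would look like `urllib.request.urlopen(request, timeout=120)`. Note that urllib does not decompress gzip responses automatically, which is one reason a client such as httpx is more convenient here.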
# Install dependencies
pip install "mcp[cli]" httpx
# Set API key
export ZYTE_API_KEY="your-key"
# Run server
python zyte_fetch_page_content_server.py

# List available tools
echo '{"jsonrpc":"2.0","method":"tools/list","id":1}' | python zyte_fetch_page_content_server.py

docker build -t zyte-fetch-page-content-mcp-server .
docker mcp server rm zyte-fetch-page-content
docker mcp server add zyte-fetch-page-content
# Restart Claude Desktop

- API keys are stored in Docker Desktop secrets (never in code)
- Container runs as non-root user
- No sensitive data is logged
- All network requests use HTTPS
- HTTP Basic Auth for API authentication
- For static HTML pages: Use fetch_page_content_from_http_response_body (fastest)
- For best quality extraction: Use fetch_page_content_from_browser_html (includes visual features)
- For JavaScript-heavy SPAs: Use fetch_page_content_from_browser_html_only
- Cost optimization: HTTP response body extraction is the cheapest option
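
The recommendations above amount to a simple lookup. As an illustration only, a client that picks a tool automatically might encode them like this. The `page_kind` labels and the `choose_tool` function are hypothetical and not part of this project.

```python
# Hypothetical page-kind labels mapped to the tool names from this README.
TOOL_BY_PAGE_KIND = {
    "static": "fetch_page_content_from_http_response_body",  # fastest, cheapest
    "spa": "fetch_page_content_from_browser_html_only",      # JavaScript-heavy pages
    "general": "fetch_page_content_from_browser_html",       # best quality, default
}


def choose_tool(page_kind: str = "general") -> str:
    """Return the recommended tool name, falling back to the default tool."""
    return TOOL_BY_PAGE_KIND.get(page_kind, TOOL_BY_PAGE_KIND["general"])
```

Unknown page kinds fall back to the browser HTML tool, which this README describes as the default.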