197 changes: 149 additions & 48 deletions README.md
@@ -2,57 +2,98 @@

**Web search and content extraction for AI models via Model Context Protocol (MCP)**

[![Version](https://img.shields.io/badge/version-2.2.0-blue.svg)](https://github.com/Kode-Rex/webcat)
[![Version](https://img.shields.io/badge/version-2.3.1-blue.svg)](https://github.com/Kode-Rex/webcat)
[![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)
[![Docker](https://img.shields.io/badge/docker-multi--platform-blue.svg)](https://hub.docker.com/r/tmfrisinger/webcat)

## Quick Start

### Docker (Recommended)

```bash
# Run with Docker (no setup required)
docker run -p 8000:8000 tmfrisinger/webcat:latest

# With Serper API key for premium search
docker run -p 8000:8000 -e SERPER_API_KEY=your_key tmfrisinger/webcat:latest

# With authentication enabled
docker run -p 8000:8000 -e WEBCAT_API_KEY=your_token tmfrisinger/webcat:latest
```

**Supports:** linux/amd64, linux/arm64 (Intel/AMD, Apple Silicon, AWS Graviton)

### Local Development

```bash
cd docker
python -m pip install -e ".[dev]"

# Start demo server with UI
python simple_demo.py

# Open demo client
open http://localhost:8000/demo
# Or use make commands
make dev # Start with auto-reload
make dev-demo # Start demo with auto-reload
```

![WebCat Demo Client](assets/webcat-demo-client.png)

## What is WebCat?

WebCat is an **MCP (Model Context Protocol) server** that provides AI models with:
- 🔍 **Web Search** - Serper API (premium) or DuckDuckGo (free)
- 📄 **Content Extraction** - Clean markdown conversion with Trafilatura
- 🔍 **Web Search** - Serper API (premium) or DuckDuckGo (free fallback)
- 📄 **Content Extraction** - Clean markdown conversion with Readability + html2text
- 🌐 **SSE Streaming** - Real-time results via Server-Sent Events
- 🎨 **Demo UI** - Interactive testing interface
- 🐳 **Multi-Platform Docker** - Works on Intel, ARM, and Apple Silicon

Built with **FastAPI** and **FastMCP** for seamless AI integration.
Built with **FastAPI**, **FastMCP**, and **Readability** for seamless AI integration.

## Features

- ✅ **Optional Authentication** - Bearer token auth when needed, or run without
- ✅ **Optional Authentication** - Bearer token auth when needed, or run without (v2.3.1)
- ✅ **Automatic Fallback** - Serper API → DuckDuckGo if needed
- ✅ **Smart Content Extraction** - Trafilatura removes navigation/ads/chrome
- ✅ **MCP Compliant** - Works with Claude Desktop, LiteLLM, etc.
- ✅ **Rate Limited** - Configurable protection
- ✅ **Smart Content Extraction** - Readability + html2text removes navigation/ads/chrome
- ✅ **MCP Compliant** - Works with Claude Desktop, LiteLLM, and other MCP clients
- ✅ **Parallel Processing** - Fast concurrent scraping
- ✅ **Multi-Platform Docker** - Linux (amd64/arm64) support
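
The "Parallel Processing" item above could look roughly like the following sketch. This is an illustration only: the README does not say how WebCat schedules its scraping, so the use of `asyncio`, the `scrape_all` helper, the `fetch` callable, and the `limit` parameter are all assumptions rather than the project's actual code.

```python
import asyncio
from typing import Awaitable, Callable, List


async def scrape_all(
    urls: List[str],
    fetch: Callable[[str], Awaitable[str]],  # hypothetical per-URL scraper
    limit: int = 5,  # assumed cap on simultaneous requests
) -> List[str]:
    """Scrape every URL concurrently and return results in input order."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url: str) -> str:
        # The semaphore keeps at most `limit` fetches in flight at once.
        async with semaphore:
            try:
                return await fetch(url)
            except Exception as exc:
                # One failing page should not discard the other results.
                return f"Error scraping {url}: {exc}"

    # gather() preserves the order of the awaitables it was given.
    return await asyncio.gather(*(bounded(url) for url in urls))
```

Call it with `asyncio.run(scrape_all(urls, fetch))`, where `fetch` is whatever coroutine downloads and converts a single page.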

## Installation & Usage

### Docker Deployment

```bash
# Quick start - no configuration needed
docker run -p 8000:8000 tmfrisinger/webcat:latest

# With environment variables
docker run -p 8000:8000 \
-e SERPER_API_KEY=your_key \
-e WEBCAT_API_KEY=your_token \
tmfrisinger/webcat:latest

# Using docker-compose
cd docker
docker-compose up
```

### Local Development

```bash
cd docker
python -m pip install -e ".[dev]"

# Configure environment (optional)
echo "SERPER_API_KEY=your_key" > .env

# Start MCP server
python mcp_server.py
# Development mode with auto-reload
make dev # Start MCP server with auto-reload
make dev-demo # Start demo server with auto-reload

# Or start demo server with UI
python simple_demo.py
# Production mode
make mcp # Start MCP server
make demo # Start demo server
```

## Available Endpoints
@@ -71,13 +112,11 @@ python simple_demo.py

| Variable | Default | Description |
|----------|---------|-------------|
| `SERPER_API_KEY` | *(none)* | Serper API key for premium search (optional) |
| `SERPER_API_KEY` | *(none)* | Serper API key for premium search (optional, falls back to DuckDuckGo if not set) |
| `WEBCAT_API_KEY` | *(none)* | Bearer token for authentication (optional, if set all requests must include `Authorization: Bearer <token>`) |
| `PORT` | `8000` | Server port |
| `LOG_LEVEL` | `INFO` | Logging level (DEBUG, INFO, WARNING, ERROR) |
| `LOG_DIR` | `/tmp` | Log file directory |
| `RATE_LIMIT_WINDOW` | `60` | Rate limit window in seconds |
| `RATE_LIMIT_MAX_REQUESTS` | `10` | Max requests per window |
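
As a rough sketch of the `WEBCAT_API_KEY` behaviour described in the table: when no key is set, requests pass through, and when a key is set, the request must carry `Authorization: Bearer <token>`. The `is_authorized` helper below is hypothetical and is not taken from WebCat's `utils/auth.py`. It also assumes the header mapping uses the exact key `Authorization`.

```python
import hmac
from typing import Mapping, Optional


def is_authorized(headers: Mapping[str, str], expected_token: Optional[str]) -> bool:
    """Return True if the request may proceed (hypothetical helper)."""
    # No WEBCAT_API_KEY configured: authentication is disabled.
    if not expected_token:
        return True

    value = headers.get("Authorization", "")
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "bearer":
        return False

    # Constant-time comparison avoids leaking token contents through timing.
    return hmac.compare_digest(token.strip().encode(), expected_token.encode())
```

A server following this pattern would read the expected token from the environment once at startup, for example `os.environ.get("WEBCAT_API_KEY")`, and reject any request for which the check returns `False`.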

### Get a Serper API Key

@@ -114,65 +153,100 @@ MCP Client (Claude, LiteLLM)
↓
FastMCP Server (SSE Transport)
↓
Authentication (optional bearer token)
↓
Search Decision
├─ Serper API (premium) → Content Scraper
└─ DuckDuckGo (free) → Content Scraper
↓
Trafilatura (markdown)
Readability + html2text
↓
Structured Response
Markdown Response
```
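
The "Search Decision" step in the diagram can be sketched as follows. This is a minimal illustration under stated assumptions: `search_with_fallback`, `premium_search`, and `free_search` are hypothetical names, and falling back when Serper raises an error or returns nothing is an assumed reading of "if needed", not confirmed behaviour.

```python
from typing import Callable, Dict, List, Optional, Tuple

SearchFn = Callable[[str], List[Dict]]


def search_with_fallback(
    query: str,
    serper_key: Optional[str],
    premium_search: SearchFn,  # hypothetical stand-in for the Serper client
    free_search: SearchFn,     # hypothetical stand-in for the DuckDuckGo client
) -> Tuple[str, List[Dict]]:
    """Return (source, results), preferring Serper when a key is configured."""
    if serper_key:
        try:
            results = premium_search(query)
            if results:
                return "serper", results
        except Exception:
            # Assumption: any Serper failure triggers the free fallback.
            pass
    return "duckduckgo", free_search(query)
```

Passing the two search functions as arguments keeps the decision logic testable without network access, which fits the README's split between unit tests and integration tests.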

**Tech Stack:**
- **FastAPI** - High-performance async web framework
- **FastMCP** - MCP protocol implementation with SSE transport
- **Readability** - Content extraction (removes navigation/ads)
- **html2text** - HTML to markdown conversion
- **Serper/DuckDuckGo** - Search APIs with automatic fallback

## Testing

```bash
cd docker

# Run all tests
# Run all unit tests
make test
# OR
python -m pytest tests/unit -v

# With coverage
# With coverage report
make test-coverage
# OR
python -m pytest tests/unit --cov=. --cov-report=term --cov-report=html

# CI-safe (no external dependencies)
# CI-safe tests (no external dependencies)
python -m pytest -v -m "not integration"

# Run specific test file
python -m pytest tests/unit/services/test_content_scraper.py -v
```

**Current test coverage:** 70%+ across all modules
**Current test coverage:** 70%+ across all modules (enforced in CI)

## Development

```bash
# Install with dev dependencies
pip install -e ".[dev]"

# Format code
make format

# Lint code
make lint

# Run tests
make test

# Full CI check
make ci
# First-time setup
make setup-dev # Install all dependencies + pre-commit hooks

# Development workflow
make dev # Start server with auto-reload
make format # Auto-format code (Black + isort)
make lint # Check code quality (flake8)
make test # Run unit tests

# Before committing
make ci-fast # Quick validation (~30 seconds)
# OR
make ci # Full validation with security checks (~2-3 minutes)

# Code quality tools
make format-check # Check formatting without changes
make security # Run bandit security scanner
make audit # Check dependency vulnerabilities
```

**Pre-commit Hooks:**
Hooks run automatically on `git commit` to ensure code quality. Install with `make setup-dev`.

## Project Structure

```
docker/
├── mcp_server.py # Main MCP server
├── simple_demo.py # Demo server with UI
├── clients/ # Serper & DuckDuckGo clients
├── services/ # Content scraping & search
├── mcp_server.py # Main MCP server (FastMCP)
├── simple_demo.py # Demo server with interactive UI
├── health.py # Health check endpoint
├── api_tools.py # API tooling utilities
├── clients/ # External API clients
│   ├── serper_client.py # Serper API integration
│   └── duckduckgo_client.py # DuckDuckGo fallback
├── services/ # Core business logic
│   ├── search_service.py # Search orchestration
│   └── content_scraper.py # Readability + html2text
├── tools/ # MCP tool implementations
│   └── search_tool.py # Search tool with auth
├── models/ # Pydantic data models
│   ├── domain/ # Domain entities
│   └── responses/ # API responses
├── endpoints/ # FastAPI endpoints
└── tests/ # Comprehensive test suite
│   ├── domain/ # Domain entities (SearchResult, etc.)
│   └── responses/ # API response models
├── utils/ # Shared utilities
│   └── auth.py # Bearer token authentication
├── endpoints/ # FastAPI endpoints
├── tests/ # Comprehensive test suite
│   ├── unit/ # Unit tests (mocked dependencies)
│   └── integration/ # Integration tests (external deps)
└── pyproject.toml # Project config + dependencies
```

## Search Quality Comparison
@@ -185,12 +259,39 @@ docker/
| **Speed** | Fast | Fast |
| **Rate Limits** | 2,500/month (free tier) | None |

## Docker Multi-Platform Support

WebCat supports multiple architectures for broad deployment compatibility:

```bash
# Build locally for multiple platforms
cd docker
./build.sh # Builds for linux/amd64 and linux/arm64

# Manual multi-platform build and push
docker buildx build --platform linux/amd64,linux/arm64 \
-t tmfrisinger/webcat:2.3.1 \
-t tmfrisinger/webcat:latest \
-f Dockerfile --push .

# Verify multi-platform support
docker buildx imagetools inspect tmfrisinger/webcat:latest
```

**Automated Releases:**
Push a version tag to trigger automated multi-platform builds via GitHub Actions:
```bash
git tag v2.3.1
git push origin v2.3.1
```

## Limitations

- **Text-focused:** Optimized for article content, not multimedia
- **Rate limits:** Respects configured limits to prevent abuse
- **No JavaScript:** Cannot scrape dynamic JS-rendered content
- **No JavaScript:** Cannot scrape dynamic JS-rendered content (uses static HTML)
- **PDF support:** Detection only, not full extraction
- **Python 3.11 required:** Not compatible with 3.10 or 3.12
- **External API limits:** Subject to Serper API rate limits (2,500/month free tier)

## Contributing

@@ -216,4 +317,4 @@ MIT License - see [LICENSE](LICENSE) file for details.

---

**Version 2.2.0** | Built with ❤️ using FastMCP, FastAPI, and Trafilatura
**Version 2.3.1** | Built with FastMCP, FastAPI, Readability, and html2text