Pull request: Closed · 22 commits
- `16f72de` feat(mcp): migrate from SSE to FastMCP HTTP transport (leoric-crown, Sep 26, 2025)
- `05dc97c` chore(docker): remove deprecated sse-starlette from requirements (leoric-crown, Sep 26, 2025)
- `20ce5a6` fix(mcp): resolve critical issues from code review (leoric-crown, Sep 26, 2025)
- `b0fc577` fix(mcp): enhance Pydantic model handling in MCP tool wrappers (leoric-crown, Sep 26, 2025)
- `14ea300` feat(docker): enhance MCP server with improved parameter handling and… (leoric-crown, Sep 26, 2025)
- `14f30b0` fix(mcp): harden http bridge and refresh smoke tests (leoric-crown, Sep 26, 2025)
- `4b669c1` chore(docker): update server version in /health to 0.7.4 (leoric-crown, Sep 26, 2025)
- `b18fe57` feat(docker): add configurable host port mapping (leoric-crown, Sep 26, 2025)
- `d9fd23c` fix(docker): update Docker configuration for environment variables an… (leoric-crown, Sep 26, 2025)
- `26cbc45` fix(mcp): address code review feedback (leoric-crown, Sep 26, 2025)
- `cb8e095` fix(mcp): address additional code review feedback (leoric-crown, Sep 26, 2025)
- `1085c89` fix: prevent file collisions in screenshot/PDF exports (leoric-crown, Sep 26, 2025)
- `ca84054` feat(docker): enhance API reliability and backward compatibility (leoric-crown, Sep 27, 2025)
- `c3f1c25` fix(mcp): improve schema endpoint and remove fallback inspection (leoric-crown, Sep 27, 2025)
- `2688a16` fix: address critical feedback issues (leoric-crown, Sep 27, 2025)
- `7142db8` fix: enhance security and documentation (leoric-crown, Sep 27, 2025)
- `7eb7d9e` fix(api): improve result normalization for single CrawlResult objects (leoric-crown, Sep 27, 2025)
- `6c7e833` fix(api): improve JSON serialization to preserve datetime and Path ob… (leoric-crown, Sep 27, 2025)
- `487ccf0` fix(api): implement recursive normalization for nested Path/datetime … (leoric-crown, Sep 27, 2025)
- `3e82fad` fix(api): properly extract results from CrawlResultContainer objects (leoric-crown, Sep 27, 2025)
- `f60e6ae` fix(api): properly unwrap container results in fallback retry path (leoric-crown, Sep 27, 2025)
- `ae348c4` refactor: implement protocol-based architecture for result normalizat… (leoric-crown, Sep 27, 2025)
`.env.example` (45 additions, 0 deletions)

@@ -0,0 +1,45 @@
# Docker Compose Configuration
# This file is used by docker-compose for variable substitution in docker-compose.yml
# Copy this file to .env and customize as needed

# ──────────────────────────────────────────────────────────────────
# Port Configuration
# ──────────────────────────────────────────────────────────────────
# Host port mapping (container always runs on 11235 internally)
HOST_PORT=11235

# ──────────────────────────────────────────────────────────────────
# Image Selection
# ──────────────────────────────────────────────────────────────────
# Use pre-built image from Docker Hub (recommended)
# IMAGE=unclecode/crawl4ai:latest
# TAG=latest

# ──────────────────────────────────────────────────────────────────
# Build Configuration (only applies when building locally)
# ──────────────────────────────────────────────────────────────────

# INSTALL_TYPE: Feature set for the installation
# - default: Basic installation (~2-3GB image)
# Includes: JsonCssExtractionStrategy, JsonXPathExtractionStrategy,
# LLMExtractionStrategy (API-based, no local ML)
# Best for: Standard web crawling, structured extraction, LLM-based extraction
#
# - all: Full installation with ML dependencies (~6-8GB image)
# Adds: PyTorch, transformers, sentence-transformers, scikit-learn, NLTK
# Enables: CosineStrategy (semantic clustering), local transformer models
# Best for: Advanced ML-based extraction, semantic content analysis
#
# - torch: PyTorch + scikit-learn + NLTK (no transformers)
# - transformer: Transformers + sentence-transformers (no PyTorch)
#
INSTALL_TYPE=default

# ENABLE_GPU: Enable NVIDIA CUDA support for GPU acceleration
# - false: CPU-only (works on all platforms)
# - true: Adds CUDA toolkit (AMD64/x86_64 only, requires NVIDIA GPU)
#
# Note: GPU support only available on AMD64 architecture
# ARM64 (Apple Silicon) will skip GPU installation
#
ENABLE_GPU=false
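
The `HOST_PORT` variable above is consumed through Docker Compose variable substitution. A minimal sketch of the relevant `docker-compose.yml` stanza (illustrative; the service name and the exact fields of the real compose file may differ):

```yaml
# Illustrative fragment: HOST_PORT from .env maps to the container's fixed
# internal port 11235, defaulting to 11235 when the variable is unset.
services:
  crawl4ai:
    image: ${IMAGE:-unclecode/crawl4ai:latest}
    ports:
      - "${HOST_PORT:-11235}:11235"
    env_file:
      - .llm.env
```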
`crawl4ai/async_configs.py` (6 additions, 4 deletions)

@@ -614,11 +614,12 @@ def dump(self) -> dict:

@staticmethod
def load(data: dict) -> "BrowserConfig":
# Deserialize the object from a dictionary
if data is None:
return BrowserConfig()
config = from_serializable_dict(data)
if isinstance(config, BrowserConfig):
return config
return BrowserConfig.from_kwargs(config)
return BrowserConfig.from_kwargs(config if config is not None else {})

class VirtualScrollConfig:
"""Configuration for virtual scroll handling.
@@ -1549,11 +1550,12 @@ def dump(self) -> dict:

@staticmethod
def load(data: dict) -> "CrawlerRunConfig":
# Deserialize the object from a dictionary
if data is None:
return CrawlerRunConfig()
config = from_serializable_dict(data)
if isinstance(config, CrawlerRunConfig):
return config
return CrawlerRunConfig.from_kwargs(config)
return CrawlerRunConfig.from_kwargs(config if config is not None else {})

def to_dict(self):
return {
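The defensive `load()` pattern added in both hunks (guard against a `None` payload, and against the deserializer itself returning `None`) can be sketched in isolation. `SampleConfig` and this `from_serializable_dict` are illustrative stand-ins, not the real crawl4ai classes:

```python
def from_serializable_dict(data):
    # Stand-in for crawl4ai's deserializer: pass plain dicts through unchanged.
    return dict(data) if isinstance(data, dict) else data


class SampleConfig:
    """Illustrative stand-in for BrowserConfig / CrawlerRunConfig."""

    def __init__(self, **kwargs):
        self.opts = kwargs

    @classmethod
    def from_kwargs(cls, kwargs):
        return cls(**kwargs)

    @staticmethod
    def load(data):
        # Guard 1: a missing payload yields a default config instead of a crash.
        if data is None:
            return SampleConfig()
        config = from_serializable_dict(data)
        if isinstance(config, SampleConfig):
            return config
        # Guard 2: the deserializer may also return None; fall back to empty kwargs.
        return SampleConfig.from_kwargs(config if config is not None else {})
```

With both guards in place, `load(None)`, `load({})`, and `load({"headless": True})` all return usable config objects rather than raising.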
`deploy/docker/README.md` (36 additions, 6 deletions)

@@ -115,6 +115,17 @@ EOL
unclecode/crawl4ai:0.7.0-r1
```

* **With custom host port:**
```bash
docker run -d \
  -p 8080:11235 \
  --name crawl4ai \
  --env-file .llm.env \
  --shm-size=1g \
  unclecode/crawl4ai:0.7.0-r1
```
> Access at `http://localhost:8080` (host port 8080 maps to the container's internal port 11235)

> The server will be available at `http://localhost:11235`. Visit `/playground` to access the interactive testing interface.

#### 4. Stopping the Container
@@ -143,15 +154,24 @@ git clone https://github.com/unclecode/crawl4ai.git
cd crawl4ai
```

#### 2. Environment Setup (API Keys)
#### 2. Environment Setup

If you plan to use LLMs, copy the example environment file and add your API keys. This file should be in the **project root directory**.
Crawl4AI uses two environment files:

- **`.env`** - Docker Compose variables (port mapping, image tags)
- **`.llm.env`** - Container runtime variables (API keys, runtime config)

```bash
# Make sure you are in the 'crawl4ai' root directory
cp deploy/docker/.llm.env.example .llm.env

# Now edit .llm.env and add your API keys
# 1. (Optional) Copy Docker Compose config to customize host port
cp .env.example .env
# Edit .env to set HOST_PORT (default: 11235)
# The container always runs on port 11235 internally

# 2. Copy API keys config (if using LLMs)
cp deploy/docker/.llm.env.example .llm.env
# Edit .llm.env and add your API keys
```

**Flexible LLM Provider Configuration:**
@@ -199,12 +219,15 @@ The `docker-compose.yml` file in the project root provides a simplified approach
```bash
# Build with all features (includes torch and transformers)
INSTALL_TYPE=all docker compose up --build -d

# Build with GPU support (for AMD64 platforms)
ENABLE_GPU=true docker compose up --build -d

# Run on custom host port
HOST_PORT=8080 docker compose up -d
```

> The server will be available at `http://localhost:11235`.
> The server will be available at `http://localhost:11235` (or your custom `HOST_PORT`).

#### 4. Stopping the Service

@@ -286,6 +309,13 @@ The Crawl4AI server exposes three MCP endpoints:

- **Server-Sent Events (SSE)**: `http://localhost:11235/mcp/sse`
- **WebSocket**: `ws://localhost:11235/mcp/ws`
- **FastMCP HTTP**: `http://localhost:11235/mcp`

> ⚠️ **Known limitation:** The FastMCP HTTP proxy does not yet forward JWT `Authorization`
> headers. If `security.jwt_enabled=true`, MCP tool calls will fail authentication.
> Until the auth-forwarding work lands, either
> disable JWT for MCP usage or introduce an internal-only token/header that the
> proxy can inject.
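
For clients targeting the new FastMCP HTTP endpoint, the handshake is plain JSON-RPC 2.0 over POST. A hedged sketch of the `initialize` payload a client would send to `http://localhost:11235/mcp` (the function name and field values such as `protocolVersion` are illustrative; consult the MCP specification for the authoritative handshake):

```python
import json


def initialize_request(client_name: str, request_id: int = 1) -> str:
    """Build an illustrative MCP 'initialize' JSON-RPC request body."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            # Illustrative protocol revision; check the MCP spec for current values.
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            "clientInfo": {"name": client_name, "version": "0.1.0"},
        },
    }
    return json.dumps(payload)


body = initialize_request("my-mcp-client")
```

Note that until the JWT limitation above is resolved, any `Authorization` header attached to such a request would not reach the backend through the proxy.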

### Using with Claude Code

`deploy/docker/config.yml` (7 additions, 2 deletions)

@@ -3,7 +3,7 @@ app:
title: "Crawl4AI API"
version: "1.0.0"
host: "0.0.0.0"
port: 11234
port: 11235
reload: False
workers: 1
timeout_keep_alive: 300
@@ -88,4 +88,9 @@ observability:
enabled: True
endpoint: "/metrics"
health_check:
endpoint: "/health"

# Binary output handling (screenshots, PDFs)
binary:
default_mode: "inline" # options: inline, file
default_dir: "/tmp/crawl4ai-exports"