API Server

Kreuzberg provides two server modes for programmatic access: an HTTP REST API server for general integration and a Model Context Protocol (MCP) server for AI agent integration.

Server Types

HTTP REST API Server

A production-ready HTTP API server providing RESTful endpoints for document extraction, health checks, and cache management.

Best for:

Web applications
Microservices integration
General HTTP clients
Load-balanced deployments

MCP Server

A Model Context Protocol server that exposes Kreuzberg as tools for AI agents and assistants.

Best for:

AI agent integration (Claude, GPT, etc.)
Agentic workflows
Tool use by language models
Stdio-based communication

HTTP REST API

Starting the Server

=== "CLI"

--8<-- "snippets/api_server/cli.md"

=== "C#"

--8<-- "snippets/api_server/csharp.md"

=== "Docker"

--8<-- "snippets/api_server/docker.md"

=== "Go"

--8<-- "snippets/api_server/go.md"

=== "Java"

--8<-- "snippets/api_server/java.md"

=== "Python"

--8<-- "snippets/api_server/python.md"

=== "Rust"

--8<-- "snippets/api_server/rust.md"

API Endpoints

POST /extract

Extract text from uploaded files via multipart form data.

Request Format:

Method: POST
Content-Type: multipart/form-data
Fields:
- files (required, repeatable): Files to extract
- config (optional): JSON configuration overrides

Response: JSON array of extraction results

Example:

# Extract a single file via HTTP POST
curl -F "files=@document.pdf" http://localhost:8000/extract

# Extract multiple files in a single request
curl -F "files=@doc1.pdf" -F "files=@doc2.docx" \
  http://localhost:8000/extract

# Extract with custom OCR configuration override
curl -F "files=@scanned.pdf" \
     -F 'config={"ocr":{"language":"eng"},"force_ocr":true}' \
  http://localhost:8000/extract

Response Schema:

[
  {
    "content": "Extracted text content...",
    "mime_type": "application/pdf",
    "metadata": {
      "page_count": 10,
      "author": "John Doe"
    },
    "tables": [],
    "detected_languages": ["eng"],
    "chunks": null,
    "images": null
  }
]

POST /embed

Generate embeddings for text strings without document extraction.

Request Format:

Method: POST
Content-Type: application/json
Body:
- texts (required): Array of strings to generate embeddings for
- config (optional): Embedding configuration overrides

Response: JSON object containing embeddings, model info, dimensions, and count

Example:

# Generate embeddings for two text strings
curl -X POST http://localhost:8000/embed \
  -H "Content-Type: application/json" \
  -d '{"texts":["Hello world","Second text"]}'

# Generate embeddings with custom model configuration
curl -X POST http://localhost:8000/embed \
  -H "Content-Type: application/json" \
  -d '{
    "texts":["Test text"],
    "config":{
      "model":{"preset":{"name":"fast"}},
      "batch_size":32
    }
  }'

Response Schema:

{
  "embeddings": [
    [0.123, -0.456, 0.789, ...],  // 384 or 768 or 1024 dimensions
    [-0.234, 0.567, -0.891, ...]
  ],
  "model": "balanced",
  "dimensions": 768,
  "count": 2
}

Available Embedding Presets:

Preset	Model	Dimensions	Use Case
`fast`	AllMiniLML6V2Q	384	Quick prototyping, development
`balanced`	BGEBaseENV15	768	General-purpose RAG, production (default)
`quality`	BGELargeENV15	1024	Complex documents, maximum accuracy
`multilingual`	MultilingualE5Base	768	International documents, 100+ languages

Use Cases:

Generate embeddings for semantic search
Create vector representations for RAG (Retrieval-Augmented Generation) pipelines
Embed text chunks without extracting from documents
Batch embed multiple texts efficiently

Note: This endpoint requires the embeddings feature to be enabled (available in Docker images and most pre-built binaries). ONNX Runtime must be installed on the system.

GET /health

Health check endpoint for monitoring and load balancers.

Example:

# Check server health status
curl http://localhost:8000/health

Response:

{
  "status": "healthy",
  "version": "4.0.0-rc.1"
}

GET /info

Server information and capabilities.

Example:

# Get server version and capabilities
curl http://localhost:8000/info

Response:

{
  "version": "4.0.0-rc.1",
  "rust_backend": true
}

GET /cache/stats

Get cache statistics.

Example:

# Retrieve cache statistics and storage usage
curl http://localhost:8000/cache/stats

Response:

{
  "directory": ".kreuzberg",
  "total_files": 42,
  "total_size_mb": 156.8,
  "available_space_mb": 45123.5,
  "oldest_file_age_days": 7.2,
  "newest_file_age_days": 0.1
}

DELETE /cache/clear

Clear all cached files.

Example:

# Clear all cached extraction results
curl -X DELETE http://localhost:8000/cache/clear

Response:

{
  "directory": ".kreuzberg",
  "removed_files": 42,
  "freed_mb": 156.8
}

Configuration

Configuration File Discovery

The server automatically discovers configuration files in this order:

./kreuzberg.toml (current directory)
./kreuzberg.yaml
./kreuzberg.json
Parent directories (recursive search)
Default configuration (if no file found)

Example kreuzberg.toml:

[ocr]
backend = "tesseract"
language = "eng"

# Enable quality processing and caching
enable_quality_processing = true
use_cache = true

# Configure token reduction for LLM optimization
[token_reduction]
enabled = true
target_reduction = 0.3

See Configuration Guide for all options.

Environment Variables

Upload Limits:

# Set maximum file upload size in megabytes
KREUZBERG_MAX_UPLOAD_SIZE_MB=200  # Max upload size in MB (default: 100)

For detailed configuration options, memory considerations, and performance tuning for large files, see the File Size Limits Reference.

CORS Configuration:

# Configure allowed origins for cross-origin requests (production security)
KREUZBERG_CORS_ORIGINS="https://app.example.com,https://api.example.com"

Security Warning: The default CORS configuration allows all origins for development convenience. This permits CSRF attacks. Always set KREUZBERG_CORS_ORIGINS in production.

Note: Server host and port are configured via CLI flags (-H / --host and -p / --port), not environment variables.

Client Examples

=== "C#"

--8<-- "snippets/csharp/client_extract_single_file.md"

=== "cURL"

```bash title="Terminal"
# Extract content from a single document
curl -F "files=@document.pdf" http://localhost:8000/extract | jq .

# Extract with OCR enabled for scanned documents
curl -F "files=@scanned.pdf" \
     -F 'config={"ocr":{"language":"eng"}}' \
     http://localhost:8000/extract | jq .

# Batch extract multiple files in parallel
curl -F "files=@doc1.pdf" \
     -F "files=@doc2.docx" \
     http://localhost:8000/extract | jq .
```

=== "Go"

--8<-- "snippets/go/api/client_extract_single_file.md"

=== "Java"

--8<-- "snippets/java/api/client_extract_single_file.md"

=== "Python"

--8<-- "snippets/python/api/client_extract_single_file.md"

=== "Ruby"

--8<-- "snippets/ruby/api/client_extract_single_file.md"

=== "Rust"

--8<-- "snippets/rust/api/client_extract_single_file.md"

=== "TypeScript"

--8<-- "snippets/typescript/getting-started/client_extract_single_file.md"

Error Handling

Error Response Format:

{
  "error_type": "ValidationError",
  "message": "Invalid file format",
  "traceback": "...",
  "status_code": 400
}

HTTP Status Codes:

Status Code	Error Type	Meaning
400	`ValidationError`	Invalid input parameters
422	`ParsingError`, `OcrError`	Document processing failed
500	Internal errors	Server errors

Example:

=== "C#"

--8<-- "snippets/csharp/error_handling_extract.md"

=== "Go"

--8<-- "snippets/go/api/error_handling_extract.md"

=== "Java"

--8<-- "snippets/java/api/error_handling_extract.md"

=== "Python"

--8<-- "snippets/python/utils/error_handling_extract.md"

=== "Ruby"

--8<-- "snippets/ruby/api/error_handling_extract.md"

=== "Rust"

--8<-- "snippets/rust/api/error_handling_extract.md"

=== "TypeScript"

--8<-- "snippets/typescript/api/error_handling_extract.md"

MCP Server

The Model Context Protocol (MCP) server exposes Kreuzberg as tools for AI agents and assistants.

Starting the MCP Server

=== "CLI"

```bash title="Terminal"
# Start MCP server using stdio transport for AI agents
kreuzberg mcp

# Start MCP server with custom configuration file
kreuzberg mcp --config kreuzberg.toml
```

=== "C#"

--8<-- "snippets/csharp/mcp_server_start.md"

=== "Go"

--8<-- "snippets/go/mcp/mcp_server_start.md"

=== "Java"

--8<-- "snippets/java/mcp/mcp_server_start.md"

=== "Python"

--8<-- "snippets/python/mcp/mcp_server_start.md"

=== "Ruby"

--8<-- "snippets/ruby/mcp/mcp_server_start.md"

=== "Rust"

--8<-- "snippets/rust/mcp/mcp_server_start.md"

=== "TypeScript"

--8<-- "snippets/typescript/mcp/mcp_server_start.md"

MCP Tools

The MCP server exposes 6 tools for AI agents:

extract_file

Extract content from a file path.

Parameters:

Parameter	Type	Required	Description
`path`	string	Yes	File path to extract
`mime_type`	string	No	MIME type hint
`enable_ocr`	boolean	No	Enable OCR (default: false)
`force_ocr`	boolean	No	Force OCR even if text exists (default: false)
`async`	boolean	No	Use async extraction (default: true)

Example MCP Request:

{
  "method": "tools/call",
  "params": {
    "name": "extract_file",
    "arguments": {
      "path": "/path/to/document.pdf",
      "enable_ocr": true,
      "async": true
    }
  }
}

extract_bytes

Extract content from base64-encoded file data.

Parameters:

Parameter	Type	Required	Description
`data`	string	Yes	Base64-encoded file content
`mime_type`	string	No	MIME type hint
`enable_ocr`	boolean	No	Enable OCR
`force_ocr`	boolean	No	Force OCR
`async`	boolean	No	Use async extraction

batch_extract_files

Extract multiple files in parallel.

Parameters:

Parameter	Type	Required	Description
`paths`	array[string]	Yes	File paths to extract
`enable_ocr`	boolean	No	Enable OCR
`force_ocr`	boolean	No	Force OCR
`async`	boolean	No	Use async extraction

detect_mime_type

Detect file format and return MIME type.

Parameters:

Parameter	Type	Required	Description
`path`	string	Yes	File path
`use_content`	boolean	No	Content-based detection (default: true)

cache_stats

Get cache statistics.

Parameters: None

Returns: Cache directory path, file count, size, available space, file ages

cache_clear

Clear all cached files.

Parameters: None

Returns: Number of files removed, space freed

MCP Server Information

Server Metadata:

Name: kreuzberg-mcp
Title: Kreuzberg Document Intelligence MCP Server
Version: Current package version
Website: https://goldziher.github.io/kreuzberg/
Protocol: MCP (Model Context Protocol)
Transport: stdio (stdin/stdout)

Capabilities:

Tool calling (6 tools exposed)
Async and sync extraction variants
Base64-encoded file handling
Batch processing

AI Agent Integration

=== "Claude Desktop"

Add to Claude Desktop configuration (`~/Library/Application Support/Claude/claude_desktop_config.json` on macOS):

```json title="claude_desktop_config.json"
{
  "mcpServers": {
    "kreuzberg": {
      "command": "kreuzberg",
      "args": ["mcp"]
    }
  }
}
```

After adding the configuration, restart Claude Desktop to load the Kreuzberg MCP server.

=== "C#"

--8<-- "snippets/csharp/mcp_custom_client.md"

=== "Go"

--8<-- "snippets/go/mcp/mcp_custom_client.md"

=== "Java"

--8<-- "snippets/java/mcp/mcp_client.md"

=== "LangChain"

--8<-- "snippets/python/mcp/mcp_langchain_integration.md"

=== "Python"

--8<-- "snippets/python/mcp/mcp_custom_client.md"

=== "Ruby"

--8<-- "snippets/ruby/mcp/mcp_custom_client.md"

=== "Rust"

--8<-- "snippets/rust/mcp/mcp_custom_client.md"

=== "TypeScript"

--8<-- "snippets/typescript/mcp/mcp_custom_client.md"

Production Deployment

Docker Deployment

Docker Compose Example:

version: '3.8'

services:
  kreuzberg-api:
    image: ghcr.io/kreuzberg-dev/kreuzberg:latest
    ports:
      - "8000:8000"
    environment:
      # Configure CORS for production security
      - KREUZBERG_CORS_ORIGINS=https://myapp.com,https://api.myapp.com
      # Set maximum upload size for large documents
      - KREUZBERG_MAX_UPLOAD_SIZE_MB=500
    volumes:
      # Mount configuration and cache directories
      - ./config:/config
      - ./cache:/app/.kreuzberg
    command: ["kreuzberg", "serve", "-H", "0.0.0.0", "-p", "8000", "--config", "/config/kreuzberg.toml"]
    restart: unless-stopped
    healthcheck:
      # Health check for container orchestration
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

Run:

# Start the Kreuzberg API server in detached mode
docker-compose up -d

Kubernetes Deployment

Deployment Manifest:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kreuzberg-api
spec:
  replicas: 3  # Deploy 3 replicas for high availability
  selector:
    matchLabels:
      app: kreuzberg-api
  template:
    metadata:
      labels:
        app: kreuzberg-api
    spec:
      containers:
      - name: kreuzberg
        image: ghcr.io/kreuzberg-dev/kreuzberg:latest
        ports:
        - containerPort: 8000
        env:
        # Production environment configuration
        - name: KREUZBERG_CORS_ORIGINS
          value: "https://myapp.com"
        - name: KREUZBERG_MAX_UPLOAD_SIZE_MB
          value: "500"
        command: ["kreuzberg", "serve", "-H", "0.0.0.0", "-p", "8000"]
        livenessProbe:
          # Check if container is alive and healthy
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 10
          periodSeconds: 30
        readinessProbe:
          # Check if container is ready to accept traffic
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 5
          periodSeconds: 10
        resources:
          # Resource limits for optimal performance
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "2Gi"
            cpu: "2000m"
---
apiVersion: v1
kind: Service
metadata:
  name: kreuzberg-api
spec:
  selector:
    app: kreuzberg-api
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8000
  type: LoadBalancer  # Expose service via load balancer

Reverse Proxy Configuration

Nginx:

# Load balance across multiple Kreuzberg instances
upstream kreuzberg {
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
}

server {
    listen 443 ssl http2;
    server_name api.example.com;

    # SSL/TLS configuration
    ssl_certificate /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;

    # Increase upload size limit for large documents
    client_max_body_size 500M;

    location / {
        proxy_pass http://kreuzberg;
        # Forward client headers
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Extended timeouts for large file processing
        proxy_read_timeout 300s;
        proxy_send_timeout 300s;
    }

    location /health {
        proxy_pass http://kreuzberg;
        access_log off;  # Disable logging for health checks
    }
}

Caddy:

api.example.com {
    # Load balance with automatic health checks
    reverse_proxy localhost:8000 localhost:8001 localhost:8002 {
        lb_policy round_robin
        health_uri /health
        health_interval 10s
    }

    # Increase maximum upload size for large documents
    request_body {
        max_size 500MB
    }
}

Production Checklist

Set KREUZBERG_CORS_ORIGINS to explicit allowed origins
Configure KREUZBERG_MAX_UPLOAD_SIZE_MB based on expected document sizes
Use reverse proxy (Nginx/Caddy) for SSL/TLS termination
Enable logging via RUST_LOG=info environment variable
Set up health checks on /health endpoint
Monitor cache size and set up periodic clearing
Use 0.0.0.0 binding for containerized deployments
Configure resource limits (CPU, memory) in container orchestration
Test with large files to validate upload limits and timeouts
Implement rate limiting at reverse proxy level
Set up monitoring (Prometheus metrics, logs aggregation)
Plan for horizontal scaling with load balancing

Monitoring

Health Check Endpoint:

# Simple health check for manual verification
curl http://localhost:8000/health

# Continuous monitoring script for production
#!/bin/bash
while true; do
  if curl -f http://localhost:8000/health > /dev/null 2>&1; then
    echo "$(date): Server healthy"
  else
    echo "$(date): Server unhealthy"
    # Send alert to monitoring system
  fi
  sleep 30
done

Cache Monitoring:

# Retrieve cache statistics and usage metrics
curl http://localhost:8000/cache/stats | jq .

# Automatic cache clearing when size exceeds threshold
CACHE_SIZE=$(curl -s http://localhost:8000/cache/stats | jq .total_size_mb)
if (( $(echo "$CACHE_SIZE > 1000" | bc -l) )); then
  curl -X DELETE http://localhost:8000/cache/clear
fi

Logging:

# Run with debug logging for development and troubleshooting
RUST_LOG=debug kreuzberg serve -H 0.0.0.0 -p 8000

# Production logging with info level (recommended)
RUST_LOG=info kreuzberg serve -H 0.0.0.0 -p 8000

# JSON structured logging for log aggregation systems
RUST_LOG=info RUST_LOG_FORMAT=json kreuzberg serve -H 0.0.0.0 -p 8000

Performance Tuning

Upload Size Limits

Configure based on expected document sizes:

# Configuration for small documents (PDFs, images under 10 MB)
export KREUZBERG_MAX_UPLOAD_SIZE_MB=50

# Configuration for typical business documents (under 50 MB)
export KREUZBERG_MAX_UPLOAD_SIZE_MB=200

# Configuration for large scans, archives, and high-resolution images
export KREUZBERG_MAX_UPLOAD_SIZE_MB=1000

See the File Size Limits Reference for comprehensive documentation including:

Memory impact calculations
Reverse proxy configuration
Error handling and troubleshooting
Client-side validation examples
Best practices for large file processing

Concurrent Requests

The server handles concurrent requests efficiently using Tokio's async runtime. For high-throughput scenarios:

Run multiple instances behind a load balancer
Configure reverse proxy connection pooling
Monitor CPU and memory usage to determine optimal replica count

Cache Strategy

Configure cache behavior via kreuzberg.toml:

use_cache = true
cache_dir = "/var/cache/kreuzberg"  # Custom cache location for production

Cache clearing strategies:

# Periodic cache clearing via cron job (daily at 2 AM)
0 2 * * * curl -X DELETE http://localhost:8000/cache/clear

# Size-based cache clearing when threshold is exceeded
CACHE_SIZE=$(curl -s http://localhost:8000/cache/stats | jq .total_size_mb)
if [ "$CACHE_SIZE" -gt 1000 ]; then
  curl -X DELETE http://localhost:8000/cache/clear
fi

Next Steps

Configuration Guide - Detailed configuration options
CLI Usage - Command-line interface
Advanced Features - Chunking, language detection, token reduction
Plugin Development - Extend Kreuzberg functionality

FilesExpand file tree

api-server.md

Latest commit

History

api-server.md

File metadata and controls

API Server

Server Types

HTTP REST API Server

MCP Server

HTTP REST API

Starting the Server

API Endpoints

POST /extract

POST /embed

GET /health

GET /info

GET /cache/stats

DELETE /cache/clear

Configuration

Configuration File Discovery

Environment Variables

Client Examples

Error Handling

MCP Server

Starting the MCP Server

MCP Tools

extract_file

extract_bytes

batch_extract_files

detect_mime_type

cache_stats

cache_clear

MCP Server Information

AI Agent Integration

Production Deployment

Docker Deployment

Kubernetes Deployment

Reverse Proxy Configuration

Production Checklist

Monitoring

Performance Tuning

Upload Size Limits

Concurrent Requests

Cache Strategy

Next Steps