[SECURITY FEATURE]: Configurable Well-Known URI Handler including security.txt and robots.txt #540

@crivetimihai

🌐 FEATURE: Configurable Well-Known URI Handler

Summary: Implement a flexible /.well-known/* endpoint handler that supports standard well-known URIs like security.txt and robots.txt with user-configurable content. Defaults assume private API deployment with crawling disabled.

Implementation

1. Update config.py with Well-Known Configuration

# In mcpgateway/config.py

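import json
import logging
from typing import Dict, Optional

from pydantic import field_validator
from pydantic_settings import BaseSettings

logger = logging.getLogger(__name__)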

class Settings(BaseSettings):
    # ... existing settings ...
    
    # ===================================
    # Well-Known URI Configuration
    # ===================================
    
    # Enable well-known URI endpoints
    well_known_enabled: bool = True
    
    # robots.txt content (default: disallow all crawling for private API)
    well_known_robots_txt: str = """User-agent: *
Disallow: /
    
# MCP Gateway is a private API gateway
# Public crawling is disabled by default"""
    
    # security.txt content (optional, user-defined)
    # Example (RFC 9116 requires Contact to be a URI, so email addresses need the mailto: scheme):
    # "Contact: mailto:[email protected]\nExpires: 2025-12-31T23:59:59Z\nPreferred-Languages: en"
    well_known_security_txt: str = ""
    
    # Enable security.txt only if content is provided
    well_known_security_txt_enabled: bool = False
    
    # Additional custom well-known files (JSON format)
    # Example: {"ai.txt": "This service uses AI for...", "dnt-policy.txt": "Do Not Track policy..."}
    well_known_custom_files: str = "{}"
    
    # Cache control for well-known files (seconds)
    well_known_cache_max_age: int = 3600  # 1 hour default
    
    @property
    def custom_well_known_files(self) -> Dict[str, str]:
        """Parse custom well-known files from JSON string."""
        try:
            return json.loads(self.well_known_custom_files) if self.well_known_custom_files else {}
        except json.JSONDecodeError:
            logger.error(f"Invalid JSON in WELL_KNOWN_CUSTOM_FILES: {self.well_known_custom_files}")
            return {}
    
    @field_validator("well_known_security_txt_enabled", mode="after")
    @classmethod
    def _auto_enable_security_txt(cls, v, info):
        """Auto-enable security.txt if content is provided."""
        if "well_known_security_txt" in info.data:
            return bool(info.data["well_known_security_txt"].strip())
        return v
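
The custom_well_known_files property above accepts any valid JSON, including a list or a mapping with non-string values, which the handler would then fail on or serve incorrectly. A minimal sketch of stricter parsing follows; parse_custom_files is a hypothetical standalone helper used only for illustration, not part of mcpgateway.

```python
import json
from typing import Dict


def parse_custom_files(raw: str) -> Dict[str, str]:
    """Parse a JSON object of filename -> content, dropping anything malformed.

    Hypothetical helper for illustration; not part of mcpgateway.
    """
    if not raw or not raw.strip():
        return {}
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {}
    if not isinstance(data, dict):
        # A JSON list or scalar is valid JSON but not a file mapping
        return {}
    # Keep only entries whose content is a string (JSON object keys are always strings)
    return {name: content for name, content in data.items() if isinstance(content, str)}
```

The property could delegate to a helper like this and log whenever the result is smaller than the input, so that a bad entry is visible rather than silently dropped.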

2. Create Well-Known Handler Router

# Create mcpgateway/routers/well_known.py

from datetime import datetime
from typing import Optional

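from fastapi import APIRouter, Depends, HTTPException, Response
from fastapi.responses import PlainTextResponse

from mcpgateway.config import settings
# Assumption: require_auth lives in mcpgateway.utils.verify_credentials; adjust to wherever the project defines it
from mcpgateway.utils.verify_credentials import require_auth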

router = APIRouter(tags=["well-known"])

# Well-known URI registry with validation
WELL_KNOWN_REGISTRY = {
    "robots.txt": {
        "content_type": "text/plain",
        "description": "Robot exclusion standard",
        "rfc": "RFC 9309"
    },
    "security.txt": {
        "content_type": "text/plain", 
        "description": "Security contact information",
        "rfc": "RFC 9116"
    },
    "ai.txt": {
        "content_type": "text/plain",
        "description": "AI usage policies",
        "rfc": "Draft"
    },
    "dnt-policy.txt": {
        "content_type": "text/plain",
        "description": "Do Not Track policy", 
        "rfc": "W3C"
    },
    "change-password": {
        "content_type": "text/plain",
        "description": "Change password URL",
        "rfc": "RFC 8615"
    }
}


def validate_security_txt(content: str) -> Optional[str]:
    """Validate security.txt format and add headers if missing."""
    if not content:
        return None
    
    lines = content.strip().split('\n')
    
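    # Check if Expires field exists (field names are case-insensitive per RFC 9116)
    has_expires = any(line.strip().lower().startswith('expires:') for line in lines)
    
    # Add Expires field if missing (6 months from now)
    if not has_expires:
        now = datetime.utcnow().replace(microsecond=0)
        # Roll the year over when adding 6 months crosses December, and clamp
        # the day to 28 so the target month always contains it
        month_index = now.month - 1 + 6
        expires = now.replace(
            year=now.year + month_index // 12,
            month=month_index % 12 + 1,
            day=min(now.day, 28),
        )
        lines.append(f"Expires: {expires.isoformat()}Z")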
    
    # Ensure it starts with required headers
    validated = []
    
    # Add header comment if not present
    if not lines[0].startswith('#'):
        validated.append("# Security contact information for MCP Gateway")
        validated.append(f"# Generated: {datetime.utcnow().replace(microsecond=0).isoformat()}Z")
        validated.append("")
    
    validated.extend(lines)
    
    return '\n'.join(validated)


@router.get("/.well-known/{filename:path}", include_in_schema=False)
async def get_well_known_file(filename: str, response: Response):
    """
    Serve well-known URI files.
    
    Supports:
    - robots.txt: Robot exclusion (default: disallow all)
    - security.txt: Security contact information (if configured)
    - Custom files: Additional well-known files via configuration
    
    Args:
        filename: The well-known filename requested
        response: FastAPI response object for headers
        
    Returns:
        Plain text content of the requested file
        
    Raises:
        HTTPException: 404 if file not found or well-known disabled
    """
    if not settings.well_known_enabled:
        raise HTTPException(status_code=404, detail="Not found")
    
    # Normalize filename (remove any leading slashes)
    filename = filename.strip('/')
    
    # Build cache headers explicitly: headers set on the injected `response`
    # are not applied when the handler returns its own Response object
    headers = {"Cache-Control": f"public, max-age={settings.well_known_cache_max_age}"}
    
    # Handle robots.txt
    if filename == "robots.txt":
        headers["X-Robots-Tag"] = "noindex, nofollow"
        return PlainTextResponse(
            content=settings.well_known_robots_txt,
            media_type="text/plain; charset=utf-8",
            headers=headers
        )
    
    # Handle security.txt
    elif filename == "security.txt":
        if not settings.well_known_security_txt_enabled:
            raise HTTPException(status_code=404, detail="security.txt not configured")
        
        content = validate_security_txt(settings.well_known_security_txt)
        if not content:
            raise HTTPException(status_code=404, detail="security.txt not configured")
        
        return PlainTextResponse(
            content=content,
            media_type="text/plain; charset=utf-8",
            headers=headers
        )
    
    # Handle custom files
    elif filename in settings.custom_well_known_files:
        content = settings.custom_well_known_files[filename]
        
        # Determine content type
        content_type = "text/plain; charset=utf-8"
        if filename in WELL_KNOWN_REGISTRY:
            content_type = f"{WELL_KNOWN_REGISTRY[filename]['content_type']}; charset=utf-8"
        
        return PlainTextResponse(
            content=content,
            media_type=content_type,
            headers=headers
        )
    
    # File not found
    else:
        # Provide helpful error for known well-known URIs
        if filename in WELL_KNOWN_REGISTRY:
            raise HTTPException(
                status_code=404,
                detail=f"{filename} is not configured. "
                       f"This is a {WELL_KNOWN_REGISTRY[filename]['description']} file."
            )
        else:
            raise HTTPException(status_code=404, detail="Not found")


@router.get("/admin/well-known", response_model=dict)
async def get_well_known_status(user: str = Depends(require_auth)):
    """
    Get status of well-known URI configuration.
    
    Returns current configuration and available well-known files.
    """
    configured_files = []
    
    # Always available
    configured_files.append({
        "path": "/.well-known/robots.txt",
        "enabled": True,
        "description": "Robot exclusion standard",
        "cache_max_age": settings.well_known_cache_max_age
    })
    
    # Conditionally available
    if settings.well_known_security_txt_enabled:
        configured_files.append({
            "path": "/.well-known/security.txt",
            "enabled": True,
            "description": "Security contact information",
            "cache_max_age": settings.well_known_cache_max_age
        })
    
    # Custom files
    for filename in settings.custom_well_known_files:
        configured_files.append({
            "path": f"/.well-known/{filename}",
            "enabled": True,
            "description": "Custom well-known file",
            "cache_max_age": settings.well_known_cache_max_age
        })
    
    return {
        "enabled": settings.well_known_enabled,
        "configured_files": configured_files,
        "supported_files": list(WELL_KNOWN_REGISTRY.keys()),
        "cache_max_age": settings.well_known_cache_max_age
    }
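
validate_security_txt above only appends a missing Expires field and a header comment. RFC 9116 also requires at least one Contact field, allows Expires only once, and expects the Expires timestamp to lie in the future. A minimal sketch of those checks follows; find_security_txt_problems is a hypothetical helper for illustration, not part of the proposed router.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional


def find_security_txt_problems(content: str, now: Optional[datetime] = None) -> List[str]:
    """Return human-readable problems; an empty list means the basic checks passed.

    Hypothetical helper for illustration; it covers only Contact and Expires.
    """
    now = now or datetime.now(timezone.utc)
    problems: List[str] = []
    fields: Dict[str, List[str]] = {}
    for line in content.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        name, _, value = line.partition(":")
        # Field names are case-insensitive, so normalize before grouping
        fields.setdefault(name.strip().lower(), []).append(value.strip())

    if not fields.get("contact"):
        problems.append("missing Contact field")

    expires = fields.get("expires", [])
    if len(expires) != 1:
        problems.append("Expires must appear exactly once")
    else:
        try:
            # fromisoformat() on Python < 3.11 rejects a trailing "Z"
            when = datetime.fromisoformat(expires[0].replace("Z", "+00:00"))
        except ValueError:
            problems.append("Expires is not a valid timestamp")
        else:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            if when <= now:
                problems.append("Expires is in the past")
    return problems
```

Running a check like this at startup and logging the result would surface a stale or incomplete security.txt before researchers encounter it.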

3. Update main.py to Include Router

# In mcpgateway/main.py

from mcpgateway.routers import well_known

# Include the well-known router (no prefix needed since paths start with /.well-known)
app.include_router(well_known.router)

4. Update .env.example

#####################################
# Well-Known URI Configuration
#####################################

# Enable well-known URI endpoints (/.well-known/*)
WELL_KNOWN_ENABLED=true

# robots.txt content - Default blocks all crawlers (private API)
# Use multiline with proper escaping or keep on one line
WELL_KNOWN_ROBOTS_TXT="User-agent: *\nDisallow: /\n\n# MCP Gateway is a private API gateway\n# Public crawling is disabled by default"

# security.txt content - Define your security contact information
# Format: RFC 9116 (https://www.rfc-editor.org/rfc/rfc9116.html)
# Leave empty to disable security.txt
# Example:
# WELL_KNOWN_SECURITY_TXT="Contact: mailto:[email protected]\nExpires: 2025-12-31T23:59:59Z\nPreferred-Languages: en\nCanonical: https://example.com/.well-known/security.txt"
WELL_KNOWN_SECURITY_TXT=""

# Additional custom well-known files (JSON format)
# Example: {"ai.txt": "AI Usage: This service uses AI for tool orchestration...", "dnt-policy.txt": "We respect DNT headers..."}
WELL_KNOWN_CUSTOM_FILES={}

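# Cache control for well-known files (seconds; 3600 = 1 hour)
# Keep the comment on its own line: some env-file readers treat a trailing "# ..." as part of the value
WELL_KNOWN_CACHE_MAX_AGE=3600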

5. Add Tests

# In tests/test_well_known.py

import pytest
from fastapi.testclient import TestClient

def test_robots_txt_default(client: TestClient):
    """Test default robots.txt blocks all crawlers."""
    response = client.get("/.well-known/robots.txt")
    assert response.status_code == 200
    assert "User-agent: *" in response.text
    assert "Disallow: /" in response.text
    assert response.headers["content-type"] == "text/plain; charset=utf-8"
    assert "Cache-Control" in response.headers

def test_security_txt_not_configured(client: TestClient):
    """Test security.txt returns 404 when not configured."""
    response = client.get("/.well-known/security.txt")
    assert response.status_code == 404

def test_security_txt_configured(client: TestClient, monkeypatch):
    """Test security.txt when configured."""
    # Patch the already-instantiated settings object; monkeypatch restores
    # the original values after the test so state does not leak
    from mcpgateway.config import settings
    monkeypatch.setattr(settings, "well_known_security_txt", "Contact: [email protected]")
    monkeypatch.setattr(settings, "well_known_security_txt_enabled", True)
    
    response = client.get("/.well-known/security.txt")
    assert response.status_code == 200
    assert "Contact: [email protected]" in response.text
    assert "Expires:" in response.text  # Auto-added

def test_custom_well_known_file(client: TestClient, monkeypatch):
    """Test custom well-known files."""
    # Patch the already-instantiated settings object; restored automatically after the test
    from mcpgateway.config import settings
    monkeypatch.setattr(
        settings, "well_known_custom_files", '{"ai.txt": "AI Policy: We use AI responsibly"}'
    )
    
    response = client.get("/.well-known/ai.txt")
    assert response.status_code == 200
    assert "AI Policy: We use AI responsibly" in response.text

def test_unknown_well_known_file(client: TestClient):
    """Test unknown well-known file returns 404."""
    response = client.get("/.well-known/unknown.txt")
    assert response.status_code == 404

def test_well_known_disabled(client: TestClient, monkeypatch):
    """Test well-known endpoints when disabled."""
    # Patch the already-instantiated settings object; restored automatically after the test
    from mcpgateway.config import settings
    monkeypatch.setattr(settings, "well_known_enabled", False)
    
    response = client.get("/.well-known/robots.txt")
    assert response.status_code == 404

def test_well_known_admin_status(client: TestClient, auth_headers):
    """Test admin status endpoint."""
    response = client.get("/admin/well-known", headers=auth_headers)
    assert response.status_code == 200
    data = response.json()
    assert data["enabled"] is True
    assert any(f["path"] == "/.well-known/robots.txt" for f in data["configured_files"])

6. Example Configurations

# Example 1: Basic security.txt
WELL_KNOWN_SECURITY_TXT="Contact: mailto:[email protected]
Contact: https://mycompany.com/security
Encryption: https://mycompany.com/pgp-key.txt
Preferred-Languages: en, es
Canonical: https://api.mycompany.com/.well-known/security.txt"

# Example 2: Custom AI policy
WELL_KNOWN_CUSTOM_FILES={"ai.txt": "# AI Usage Policy\n\nThis MCP Gateway uses AI for:\n- Tool orchestration\n- Response generation\n- Error handling\n\nWe do not use AI for:\n- User data analysis\n- Behavioral tracking\n- Decision making without human oversight"}

# Example 3: Allow specific crawlers
WELL_KNOWN_ROBOTS_TXT="User-agent: internal-monitor
Allow: /health
Allow: /metrics

User-agent: *
Disallow: /"

# Example 4: Multiple custom files
WELL_KNOWN_CUSTOM_FILES={"ai.txt": "# AI Usage Policy\n\nThis MCP Gateway uses AI for:\n- Tool orchestration\n- Response generation\n- Error handling\n\nWe do not use AI for:\n- User data analysis\n- Behavioral tracking\n- Decision making without human oversight", "dnt-policy.txt": "# Do Not Track Policy\n\nWe respect the DNT header.\nNo tracking cookies are used.\nOnly essential session data is stored.", "change-password": "https://mycompany.com/account/password"}

Usage Examples

1. Basic Setup (Private API)

# Default configuration blocks all crawlers
curl https://api.example.com/.well-known/robots.txt
# Returns:
# User-agent: *
# Disallow: /
# 
# MCP Gateway is a private API gateway
# Public crawling is disabled by default

2. Security Contact Configuration

# Configure security contact
export WELL_KNOWN_SECURITY_TXT="Contact: mailto:[email protected]
Contact: https://example.com/security
Acknowledgments: https://example.com/security/thanks
Preferred-Languages: en, fr, es
Hiring: https://example.com/careers"

# Access security.txt
curl https://api.example.com/.well-known/security.txt
# Returns formatted security.txt with an auto-generated Expires field

3. AI Usage Policy

# Configure AI policy
export WELL_KNOWN_CUSTOM_FILES='{"ai.txt": "# AI Usage Policy\n\nAI Model: Tool orchestration only\nData Retention: No training on user data\nHuman Oversight: Required for all operations"}'

# Access AI policy
curl https://api.example.com/.well-known/ai.txt

4. Admin Monitoring

# Check well-known configuration status
curl -H "Authorization: Bearer $API_KEY" \
  https://api.example.com/admin/well-known

# Returns:
{
  "enabled": true,
  "configured_files": [
    {
      "path": "/.well-known/robots.txt",
      "enabled": true,
      "description": "Robot exclusion standard",
      "cache_max_age": 3600
    },
    {
      "path": "/.well-known/security.txt",
      "enabled": true,
      "description": "Security contact information",
      "cache_max_age": 3600
    }
  ],
  "supported_files": [
    "robots.txt",
    "security.txt",
    "ai.txt",
    "dnt-policy.txt",
    "change-password"
  ],
  "cache_max_age": 3600
}

Security Considerations

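  1. Content Validation: The security.txt helper adds a missing Expires field and a header comment; it does not check that a Contact field is present
  2. Cache Headers: Configurable cache control reduces repeat requests from well-behaved clients
  3. Path Traversal Protection: Requested names are only matched against configured keys and never used to read from the filesystem, so traversal sequences resolve to a 404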
  4. Admin-Only Status: Configuration status requires authentication
  5. No Dynamic Content: All content is statically configured via environment variables
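
As defense in depth for the path-handling point above, configured and requested names could also be checked against a conservative allow-list before any lookup. The sketch below is an assumption about how that might look; is_safe_well_known_name and its character set are hypothetical and not part of the proposed handler.

```python
import re

# Conservative allow-list: starts with a letter or digit, then letters,
# digits, dot, dash, or underscore; at most 64 characters; no slashes
_SAFE_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]{0,63}")


def is_safe_well_known_name(name: str) -> bool:
    """Return True only for single-segment names with no traversal sequences."""
    return bool(_SAFE_NAME.fullmatch(name)) and ".." not in name
```

The handler could return 404 immediately when this check fails, and the settings loader could reject custom file keys that do not pass it.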

Deployment Guide

Docker Deployment

# In your Dockerfile
ENV WELL_KNOWN_ENABLED=true
ENV WELL_KNOWN_ROBOTS_TXT="User-agent: *\nDisallow: /api/\nAllow: /api/health"
ENV WELL_KNOWN_SECURITY_TXT="Contact: mailto:[email protected]\nExpires: 2025-12-31T23:59:59Z"
ENV WELL_KNOWN_CUSTOM_FILES='{"ai.txt": "AI Policy: Responsible use only"}'
ENV WELL_KNOWN_CACHE_MAX_AGE=3600
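
Values supplied this way may reach the process with a literal backslash followed by n instead of a line break, since not every mechanism that sets environment variables interprets escape sequences. A minimal sketch of normalizing such values before serving them follows; decode_escaped_newlines is a hypothetical helper, not part of the proposed configuration code.

```python
def decode_escaped_newlines(value: str) -> str:
    """Turn literal backslash-r-backslash-n and backslash-n sequences into line breaks.

    Hypothetical helper for illustration. Text that already contains real
    line breaks passes through unchanged.
    """
    return value.replace("\\r\\n", "\n").replace("\\n", "\n")
```

Applying it to the robots.txt and security.txt settings when they are read would make single-line and multi-line configuration styles behave the same.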

Kubernetes ConfigMap

apiVersion: v1
kind: ConfigMap
metadata:
  name: mcp-gateway-wellknown
data:
  WELL_KNOWN_ENABLED: "true"
  WELL_KNOWN_ROBOTS_TXT: |
    User-agent: *
    Disallow: /
    
    # Private API - No public crawling
  WELL_KNOWN_SECURITY_TXT: |
    Contact: mailto:[email protected]
    Expires: 2025-12-31T23:59:59Z
    Preferred-Languages: en
  WELL_KNOWN_CUSTOM_FILES: |
    {
      "ai.txt": "This service uses AI for tool orchestration only.",
      "dnt-policy.txt": "We honor Do Not Track headers."
    }

Docker Compose

services:
  mcp-gateway:
    environment:
      WELL_KNOWN_ENABLED: "true"
      WELL_KNOWN_ROBOTS_TXT: |
        User-agent: monitoring-bot
        Allow: /health
        
        User-agent: *
        Disallow: /
      WELL_KNOWN_SECURITY_TXT: |
        Contact: mailto:[email protected]
        Encryption: https://example.com/pgp
      WELL_KNOWN_CUSTOM_FILES: '{"ai.txt": "AI is used for tool orchestration"}'
      WELL_KNOWN_CACHE_MAX_AGE: "7200"

Monitoring and Observability

Prometheus Metrics

Add metrics to track well-known URI usage:

# In well_known.py
from prometheus_client import Counter, Histogram

well_known_requests = Counter(
    'mcp_gateway_well_known_requests_total',
    'Total well-known URI requests',
    ['filename', 'status']
)

well_known_request_duration = Histogram(
    'mcp_gateway_well_known_request_duration_seconds',
    'Well-known URI request duration',
    ['filename']
)

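# In the handler
@router.get("/.well-known/{filename:path}", include_in_schema=False)
async def get_well_known_file(filename: str, response: Response):
    # Collapse unrecognized names into one label value so arbitrary request
    # paths cannot create unbounded metric label cardinality
    label = filename if filename in WELL_KNOWN_REGISTRY else "other"
    with well_known_request_duration.labels(filename=label).time():
        # ... existing logic ...
        well_known_requests.labels(filename=label, status="found").inc()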

Logging

The feature includes structured logging for security monitoring:

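# Log well-known access (assumes the handler also takes a `request: Request` parameter)
# "filename" is a reserved LogRecord attribute and raises KeyError when passed in `extra`
logger.info(
    "Well-known URI accessed",
    extra={
        "well_known_file": filename,
        "ip": request.client.host if request.client else None,
        "user_agent": request.headers.get("user-agent"),
        "cache_hit": False
    }
)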

Testing Checklist

  • Default robots.txt blocks all crawlers
  • Security.txt auto-generates Expires field
  • Custom files are served with correct content-type
  • Unknown files return 404
  • Cache headers are properly set
  • Path traversal attempts are blocked
  • Admin status endpoint requires authentication
  • Disabled well-known returns 404 for all files

Future Enhancements

  1. Dynamic Content: Support for template variables (e.g., {{DOMAIN}}, {{CONTACT_EMAIL}})
  2. File Upload: Admin API to upload well-known files
  3. Signature Support: GPG signing for security.txt
  4. Rate Limiting: Specific limits for well-known endpoints
  5. A/B Testing: Serve different robots.txt based on user agent
  6. Internationalization: Multi-language support for policy files

FAQ

Q: Why disable crawling by default?
A: MCP Gateway is typically a private API gateway. Public crawling could expose API structure and endpoints.

Q: Can I serve HTML files?
A: The current implementation focuses on plain text files per well-known URI standards. HTML would require additional security considerations.

Q: How do I update well-known files?
A: Update environment variables and restart the service. For zero-downtime updates, use rolling deployments.

Q: Are there size limits?
A: Environment variable size limits apply (typically 32KB-1MB depending on platform). Large files should be served differently.

Q: Can I disable caching?
A: Set WELL_KNOWN_CACHE_MAX_AGE=0 to disable caching, though this increases server load.
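
With the handler as proposed, a value of 0 produces `public, max-age=0`, which still allows caches to store the response and revalidate it. If the intent is that nothing is stored at all, the header could switch to `no-store` instead. The sketch below shows one way to do that; build_cache_control is a hypothetical helper, and the choice of `no-store` is an assumption about the desired behavior.

```python
def build_cache_control(max_age: int) -> str:
    """Build a Cache-Control value; zero or negative ages disable storage entirely.

    Hypothetical helper for illustration, not part of the proposed handler.
    """
    if max_age <= 0:
        return "no-store"
    return f"public, max-age={max_age}"
```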
