🌐 FEATURE: Configurable Well-Known URI Handler
Summary: Implement a flexible /.well-known/* endpoint handler that supports standard well-known URIs such as security.txt and robots.txt with user-configurable content. The defaults assume a private API deployment with crawling disabled.
Implementation
1. Update config.py with Well-Known Configuration
# In mcpgateway/config.py
# (some of these imports may already be present in config.py)
import json
import logging
from typing import Dict

from pydantic import field_validator
from pydantic_settings import BaseSettings

logger = logging.getLogger(__name__)


class Settings(BaseSettings):
    # ... existing settings ...

    # ===================================
    # Well-Known URI Configuration
    # ===================================

    # Enable well-known URI endpoints
    well_known_enabled: bool = True

    # robots.txt content (default: disallow all crawling for private API)
    well_known_robots_txt: str = """User-agent: *
Disallow: /

# MCP Gateway is a private API gateway
# Public crawling is disabled by default"""

    # security.txt content (optional, user-defined)
    # Example: "Contact: [email protected]\nExpires: 2025-12-31T23:59:59Z\nPreferred-Languages: en"
    well_known_security_txt: str = ""

    # Enable security.txt only if content is provided
    well_known_security_txt_enabled: bool = False

    # Additional custom well-known files (JSON format)
    # Example: {"ai.txt": "This service uses AI for...", "dnt-policy.txt": "Do Not Track policy..."}
    well_known_custom_files: str = "{}"

    # Cache control for well-known files (seconds)
    well_known_cache_max_age: int = 3600  # 1 hour default

    @property
    def custom_well_known_files(self) -> Dict[str, str]:
        """Parse custom well-known files from JSON string."""
        try:
            return json.loads(self.well_known_custom_files) if self.well_known_custom_files else {}
        except json.JSONDecodeError:
            logger.error("Invalid JSON in WELL_KNOWN_CUSTOM_FILES: %s", self.well_known_custom_files)
            return {}

    @field_validator("well_known_security_txt_enabled", mode="after")
    @classmethod
    def _auto_enable_security_txt(cls, v, info):
        """Auto-enable security.txt if content is provided."""
        # info.data only holds fields declared (and validated) before this one
        if "well_known_security_txt" in info.data:
            return bool(info.data["well_known_security_txt"].strip())
        return v
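The custom_well_known_files property accepts any valid JSON, so a JSON list or a non-string value would pass through to the handler. A minimal sketch of stricter parsing follows; the helper name parse_custom_files is hypothetical and not part of the proposal.

```python
import json
from typing import Dict


def parse_custom_files(raw: str) -> Dict[str, str]:
    """Parse a JSON object of filename -> content (hypothetical helper).

    Returns an empty dict for empty, malformed, or non-object input.
    """
    if not raw or not raw.strip():
        return {}
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {}
    if not isinstance(data, dict):
        return {}
    # Keep only entries whose value is a string (JSON object keys are always strings).
    return {name: body for name, body in data.items() if isinstance(body, str)}
```

Silently dropping bad entries keeps the gateway serving the remaining files; raising at startup instead would be an equally reasonable choice.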
2. Create Well-Known Handler Router
# Create mcpgateway/routers/well_known.py
from datetime import datetime, timedelta, timezone
from typing import Optional

from fastapi import APIRouter, Depends, HTTPException, Response
from fastapi.responses import PlainTextResponse

from mcpgateway.config import settings
# NOTE: require_auth (used by the admin endpoint below) must also be imported
# from the project's authentication helpers.

router = APIRouter(tags=["well-known"])

# Well-known URI registry with validation
WELL_KNOWN_REGISTRY = {
    "robots.txt": {
        "content_type": "text/plain",
        "description": "Robot exclusion standard",
        "rfc": "RFC 9309"
    },
    "security.txt": {
        "content_type": "text/plain",
        "description": "Security contact information",
        "rfc": "RFC 9116"
    },
    "ai.txt": {
        "content_type": "text/plain",
        "description": "AI usage policies",
        "rfc": "Draft"
    },
    "dnt-policy.txt": {
        "content_type": "text/plain",
        "description": "Do Not Track policy",
        "rfc": "W3C"
    },
    "change-password": {
        "content_type": "text/plain",
        "description": "Change password URL",
        "rfc": "RFC 8615"
    }
}


def validate_security_txt(content: str) -> Optional[str]:
    """Validate security.txt format and add headers if missing."""
    if not content or not content.strip():
        return None

    lines = content.strip().split('\n')
    now = datetime.now(timezone.utc).replace(microsecond=0)

    # Check if Expires field exists
    has_expires = any(line.strip().startswith('Expires:') for line in lines)

    # Add Expires field if missing (about 6 months from now).
    # A timedelta avoids the year-rollover and invalid-day problems of month arithmetic.
    if not has_expires:
        expires = now + timedelta(days=180)
        lines.append(f"Expires: {expires.strftime('%Y-%m-%dT%H:%M:%SZ')}")

    validated = []

    # Add header comment if not present
    if not lines[0].startswith('#'):
        validated.append("# Security contact information for MCP Gateway")
        validated.append(f"# Generated: {now.strftime('%Y-%m-%dT%H:%M:%SZ')}")
        validated.append("")

    validated.extend(lines)
    return '\n'.join(validated)
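When computing a default Expires value, month arithmetic on its own is fragile: the year has to roll over, and a day such as the 31st may not exist in the target month. A minimal sketch using a fixed offset instead, assuming a hypothetical helper default_expires and a 180-day window chosen for illustration:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def default_expires(now: Optional[datetime] = None, days: int = 180) -> str:
    """Return a UTC timestamp `days` after `now`, formatted like 2025-12-31T23:59:59Z.

    Hypothetical helper; `now` is injectable so the result can be tested.
    """
    base = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    expires = base.replace(microsecond=0) + timedelta(days=days)
    return expires.strftime("%Y-%m-%dT%H:%M:%SZ")


# Starting on the 31st of a month and crossing a year boundary both work.
print(default_expires(datetime(2025, 8, 31, 12, 0, tzinfo=timezone.utc)))
# prints 2026-02-27T12:00:00Z
```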
@router.get("/.well-known/{filename:path}", include_in_schema=False)
async def get_well_known_file(filename: str):
    """
    Serve well-known URI files.

    Supports:
    - robots.txt: Robot exclusion (default: disallow all)
    - security.txt: Security contact information (if configured)
    - Custom files: Additional well-known files via configuration

    Args:
        filename: The well-known filename requested

    Returns:
        Plain text content of the requested file

    Raises:
        HTTPException: 404 if file not found or well-known disabled
    """
    if not settings.well_known_enabled:
        raise HTTPException(status_code=404, detail="Not found")

    # Normalize filename (remove any leading or trailing slashes)
    filename = filename.strip('/')

    # Cache headers are passed to the returned response explicitly: headers set on
    # an injected Response parameter are not merged into a Response that the
    # handler returns directly.
    headers = {"Cache-Control": f"public, max-age={settings.well_known_cache_max_age}"}

    # Handle robots.txt
    # (PlainTextResponse appends "; charset=utf-8" to text/* media types)
    if filename == "robots.txt":
        headers["X-Robots-Tag"] = "noindex, nofollow"
        return PlainTextResponse(content=settings.well_known_robots_txt, headers=headers)

    # Handle security.txt
    if filename == "security.txt":
        if not settings.well_known_security_txt_enabled:
            raise HTTPException(status_code=404, detail="security.txt not configured")
        content = validate_security_txt(settings.well_known_security_txt)
        if not content:
            raise HTTPException(status_code=404, detail="security.txt not configured")
        return PlainTextResponse(content=content, headers=headers)

    # Handle custom files (parse the JSON setting once per request)
    custom_files = settings.custom_well_known_files
    if filename in custom_files:
        # Determine content type
        content_type = "text/plain"
        if filename in WELL_KNOWN_REGISTRY:
            content_type = WELL_KNOWN_REGISTRY[filename]["content_type"]
        return PlainTextResponse(
            content=custom_files[filename],
            media_type=content_type,
            headers=headers
        )

    # File not found: provide a helpful error for known well-known URIs
    if filename in WELL_KNOWN_REGISTRY:
        raise HTTPException(
            status_code=404,
            detail=f"{filename} is not configured. "
                   f"This is a {WELL_KNOWN_REGISTRY[filename]['description']} file."
        )
    raise HTTPException(status_code=404, detail="Not found")
@router.get("/admin/well-known", response_model=dict)
async def get_well_known_status(user: str = Depends(require_auth)):
    """
    Get status of well-known URI configuration.

    Returns current configuration and available well-known files.
    """
    configured_files = []

    # Always available
    configured_files.append({
        "path": "/.well-known/robots.txt",
        "enabled": True,
        "description": "Robot exclusion standard",
        "cache_max_age": settings.well_known_cache_max_age
    })

    # Conditionally available
    if settings.well_known_security_txt_enabled:
        configured_files.append({
            "path": "/.well-known/security.txt",
            "enabled": True,
            "description": "Security contact information",
            "cache_max_age": settings.well_known_cache_max_age
        })

    # Custom files
    for filename in settings.custom_well_known_files:
        configured_files.append({
            "path": f"/.well-known/{filename}",
            "enabled": True,
            "description": "Custom well-known file",
            "cache_max_age": settings.well_known_cache_max_age
        })

    return {
        "enabled": settings.well_known_enabled,
        "configured_files": configured_files,
        "supported_files": list(WELL_KNOWN_REGISTRY.keys()),
        "cache_max_age": settings.well_known_cache_max_age
    }
3. Update main.py to Include Router
# In mcpgateway/main.py
from mcpgateway.routers import well_known
# Include the well-known router (no prefix needed since paths start with /.well-known)
app.include_router(well_known.router)
4. Update .env.example
#####################################
# Well-Known URI Configuration
#####################################
# Enable well-known URI endpoints (/.well-known/*)
WELL_KNOWN_ENABLED=true
# robots.txt content - Default blocks all crawlers (private API)
# Use multiline with proper escaping or keep on one line
WELL_KNOWN_ROBOTS_TXT="User-agent: *\nDisallow: /\n\n# MCP Gateway is a private API gateway\n# Public crawling is disabled by default"
# security.txt content - Define your security contact information
# Format: RFC 9116 (https://www.rfc-editor.org/rfc/rfc9116.html)
# Leave empty to disable security.txt
# Example:
# WELL_KNOWN_SECURITY_TXT="Contact: mailto:[email protected]\nExpires: 2025-12-31T23:59:59Z\nPreferred-Languages: en\nCanonical: https://example.com/.well-known/security.txt"
WELL_KNOWN_SECURITY_TXT=""
# Additional custom well-known files (JSON format)
# Example: {"ai.txt": "AI Usage: This service uses AI for tool orchestration...", "dnt-policy.txt": "We respect DNT headers..."}
WELL_KNOWN_CUSTOM_FILES={}
# Cache control for well-known files (seconds)
WELL_KNOWN_CACHE_MAX_AGE=3600 # 1 hour
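Whether the \n sequences in these values arrive as real newlines depends on how the environment is loaded; some loaders pass the backslash and the letter n through literally. A minimal sketch of normalizing such values before serving them, assuming a hypothetical helper unescape_newlines that is not part of the proposal:

```python
def unescape_newlines(value: str) -> str:
    """Turn literal backslash-n sequences into real newlines (hypothetical helper).

    Values that already contain real newlines pass through unchanged.
    """
    return value.replace("\\r\\n", "\n").replace("\\n", "\n")


raw = "User-agent: *\\nDisallow: /"  # two characters: backslash + n
print(unescape_newlines(raw))
```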
5. Add Tests
# In tests/test_well_known.py
# Assumes `client` and `auth_headers` fixtures are provided by the project's conftest.py.
import pytest
from fastapi.testclient import TestClient

from mcpgateway.config import settings


def test_robots_txt_default(client: TestClient):
    """Test default robots.txt blocks all crawlers."""
    response = client.get("/.well-known/robots.txt")
    assert response.status_code == 200
    assert "User-agent: *" in response.text
    assert "Disallow: /" in response.text
    assert response.headers["content-type"] == "text/plain; charset=utf-8"
    assert "Cache-Control" in response.headers


def test_security_txt_not_configured(client: TestClient):
    """Test security.txt returns 404 when not configured."""
    response = client.get("/.well-known/security.txt")
    assert response.status_code == 404


def test_security_txt_configured(client: TestClient, monkeypatch):
    """Test security.txt when configured."""
    # settings is already instantiated, so patch its attributes rather than the
    # environment; monkeypatch restores the original values after the test.
    monkeypatch.setattr(settings, "well_known_security_txt", "Contact: [email protected]")
    monkeypatch.setattr(settings, "well_known_security_txt_enabled", True)
    response = client.get("/.well-known/security.txt")
    assert response.status_code == 200
    assert "Contact: [email protected]" in response.text
    assert "Expires:" in response.text  # Auto-added


def test_custom_well_known_file(client: TestClient, monkeypatch):
    """Test custom well-known files."""
    monkeypatch.setattr(
        settings, "well_known_custom_files", '{"ai.txt": "AI Policy: We use AI responsibly"}'
    )
    response = client.get("/.well-known/ai.txt")
    assert response.status_code == 200
    assert "AI Policy: We use AI responsibly" in response.text


def test_unknown_well_known_file(client: TestClient):
    """Test unknown well-known file returns 404."""
    response = client.get("/.well-known/unknown.txt")
    assert response.status_code == 404


def test_well_known_disabled(client: TestClient, monkeypatch):
    """Test well-known endpoints when disabled."""
    monkeypatch.setattr(settings, "well_known_enabled", False)
    response = client.get("/.well-known/robots.txt")
    assert response.status_code == 404


def test_well_known_admin_status(client: TestClient, auth_headers):
    """Test admin status endpoint."""
    response = client.get("/admin/well-known", headers=auth_headers)
    assert response.status_code == 200
    data = response.json()
    assert data["enabled"] is True
    assert any(f["path"] == "/.well-known/robots.txt" for f in data["configured_files"])
6. Example Configurations
# Example 1: Basic security.txt
WELL_KNOWN_SECURITY_TXT="Contact: mailto:[email protected]
Contact: https://mycompany.com/security
Encryption: https://mycompany.com/pgp-key.txt
Preferred-Languages: en, es
Canonical: https://api.mycompany.com/.well-known/security.txt"
# Example 2: Custom AI policy
WELL_KNOWN_CUSTOM_FILES={"ai.txt": "# AI Usage Policy\n\nThis MCP Gateway uses AI for:\n- Tool orchestration\n- Response generation\n- Error handling\n\nWe do not use AI for:\n- User data analysis\n- Behavioral tracking\n- Decision making without human oversight"}
# Example 3: Allow specific crawlers
WELL_KNOWN_ROBOTS_TXT="User-agent: internal-monitor
Allow: /health
Allow: /metrics
User-agent: *
Disallow: /"
# Example 4: Multiple custom files
WELL_KNOWN_CUSTOM_FILES={"ai.txt": "# AI Usage Policy\n\nThis MCP Gateway uses AI for:\n- Tool orchestration\n- Response generation\n- Error handling\n\nWe do not use AI for:\n- User data analysis\n- Behavioral tracking\n- Decision making without human oversight", "dnt-policy.txt": "# Do Not Track Policy\n\nWe respect the DNT header.\nNo tracking cookies are used.\nOnly essential session data is stored.", "change-password": "https://mycompany.com/account/password"}
Usage Examples
1. Basic Setup (Private API)
# Default configuration blocks all crawlers
curl https://api.example.com/.well-known/robots.txt
# Returns:
# User-agent: *
# Disallow: /
#
# MCP Gateway is a private API gateway
# Public crawling is disabled by default
2. Security Contact Configuration
# Configure security contact
export WELL_KNOWN_SECURITY_TXT="Contact: mailto:[email protected]
Contact: https://example.com/security
Acknowledgments: https://example.com/security/thanks
Preferred-Languages: en, fr, es
Hiring: https://example.com/careers"
# Access security.txt
curl https://api.example.com/.well-known/security.txt
# Returns formatted security.txt with auto-generated Expires header
3. AI Usage Policy
# Configure AI policy
export WELL_KNOWN_CUSTOM_FILES='{"ai.txt": "# AI Usage Policy\n\nAI Model: Tool orchestration only\nData Retention: No training on user data\nHuman Oversight: Required for all operations"}'
# Access AI policy
curl https://api.example.com/.well-known/ai.txt
4. Admin Monitoring
# Check well-known configuration status
curl -H "Authorization: Bearer $API_KEY" \
https://api.example.com/admin/well-known
# Returns:
{
"enabled": true,
"configured_files": [
{
"path": "/.well-known/robots.txt",
"enabled": true,
"description": "Robot exclusion standard",
"cache_max_age": 3600
},
{
"path": "/.well-known/security.txt",
"enabled": true,
"description": "Security contact information",
"cache_max_age": 3600
}
],
"supported_files": [
"robots.txt",
"security.txt",
"ai.txt",
"dnt-policy.txt",
"change-password"
],
"cache_max_age": 3600
}
Security Considerations
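- Content Validation: The security.txt validator appends an Expires field when it is missing and prepends a header comment; it does not otherwise check field syntax
- Cache Headers: Configurable cache control reduces repeat requests from well-behaved clients
- Path Traversal Protection: Requested names are only matched against configured keys and nothing is read from disk, so traversal sequences such as ../ resolve to a 404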
- Admin-Only Status: Configuration status requires authentication
- No Dynamic Content: All content is statically configured via environment variables
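Because content is looked up by key in configuration rather than read from disk, traversal strings should simply miss. A stricter option is to reject suspicious names before any lookup. The sketch below assumes a hypothetical helper is_safe_well_known_name and a 64-character limit chosen only for illustration.

```python
import re

# Conservative allowlist: starts with a letter or digit, then letters, digits,
# dot, underscore, or hyphen. Slashes and backslashes are never allowed.
_SAFE_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]{0,63}")


def is_safe_well_known_name(name: str) -> bool:
    """Return True for a single, simple path segment (hypothetical helper)."""
    return bool(_SAFE_NAME.fullmatch(name)) and ".." not in name


for candidate in ["robots.txt", "change-password", "../etc/passwd", "a/b", ""]:
    print(repr(candidate), is_safe_well_known_name(candidate))
```

Calling such a check at the top of the handler would also reject nested paths, which the `{filename:path}` route parameter otherwise accepts.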
Deployment Guide
Docker Deployment
# In your Docker environment
ENV WELL_KNOWN_ENABLED=true
ENV WELL_KNOWN_ROBOTS_TXT="User-agent: *\nDisallow: /api/\nAllow: /api/health"
ENV WELL_KNOWN_SECURITY_TXT="Contact: [email protected]\nExpires: 2025-12-31T23:59:59Z"
ENV WELL_KNOWN_CUSTOM_FILES='{"ai.txt": "AI Policy: Responsible use only"}'
ENV WELL_KNOWN_CACHE_MAX_AGE=3600
Kubernetes ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: mcp-gateway-wellknown
data:
  WELL_KNOWN_ENABLED: "true"
  WELL_KNOWN_ROBOTS_TXT: |
    User-agent: *
    Disallow: /
    # Private API - No public crawling
  WELL_KNOWN_SECURITY_TXT: |
    Contact: mailto:[email protected]
    Expires: 2025-12-31T23:59:59Z
    Preferred-Languages: en
  WELL_KNOWN_CUSTOM_FILES: |
    {
      "ai.txt": "This service uses AI for tool orchestration only.",
      "dnt-policy.txt": "We honor Do Not Track headers."
    }
Docker Compose
services:
  mcp-gateway:
    environment:
      WELL_KNOWN_ENABLED: "true"
      WELL_KNOWN_ROBOTS_TXT: |
        User-agent: monitoring-bot
        Allow: /health
        User-agent: *
        Disallow: /
      WELL_KNOWN_SECURITY_TXT: |
        Contact: [email protected]
        Encryption: https://example.com/pgp
      WELL_KNOWN_CUSTOM_FILES: '{"ai.txt": "AI is used for tool orchestration"}'
      WELL_KNOWN_CACHE_MAX_AGE: "7200"
Monitoring and Observability
Prometheus Metrics
Add metrics to track well-known URI usage:
# In well_known.py
from prometheus_client import Counter, Histogram
well_known_requests = Counter(
'mcp_gateway_well_known_requests_total',
'Total well-known URI requests',
['filename', 'status']
)
well_known_request_duration = Histogram(
'mcp_gateway_well_known_request_duration_seconds',
'Well-known URI request duration',
['filename']
)
# In the handler
@router.get("/.well-known/{filename:path}", include_in_schema=False)
async def get_well_known_file(filename: str, response: Response):
    with well_known_request_duration.labels(filename=filename).time():
        # ... existing logic ...
        well_known_requests.labels(filename=filename, status="found").inc()
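Labeling metrics with the raw requested path lets any client create new time series by requesting arbitrary names. A minimal sketch of bounding the label values, assuming a hypothetical helper metric_label and a fixed set that mirrors the registry keys:

```python
# Names worth tracking individually; mirrors the WELL_KNOWN_REGISTRY keys.
KNOWN_FILES = frozenset(
    {"robots.txt", "security.txt", "ai.txt", "dnt-policy.txt", "change-password"}
)


def metric_label(filename: str) -> str:
    """Map an arbitrary request path to a bounded label value (hypothetical helper)."""
    name = filename.strip("/")
    return name if name in KNOWN_FILES else "other"


print(metric_label("robots.txt"))       # robots.txt
print(metric_label("wp-login.php"))     # other
```

The handler could then pass metric_label(filename) instead of filename when calling .labels(...).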
Logging
The feature includes structured logging for security monitoring:
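# Log well-known access (assumes the handler also accepts `request: Request`)
logger.info(
    "Well-known URI accessed",
    extra={
        # "filename" is a reserved LogRecord attribute and raises KeyError in
        # `extra`, so a distinct key is used here
        "well_known_file": filename,
        "ip": request.client.host if request.client else None,
        "user_agent": request.headers.get("user-agent"),
        "cache_hit": False
    }
)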
Testing Checklist
- Default robots.txt blocks all crawlers
- Security.txt auto-generates Expires header
- Custom files are served with correct content-type
- Unknown files return 404
- Cache headers are properly set
- Path traversal attempts are blocked
- Admin status endpoint requires authentication
- Disabled well-known returns 404 for all files
Future Enhancements
- Dynamic Content: Support for template variables (e.g., {{DOMAIN}}, {{CONTACT_EMAIL}})
- File Upload: Admin API to upload well-known files
- Signature Support: GPG signing for security.txt
- Rate Limiting: Specific limits for well-known endpoints
- A/B Testing: Serve different robots.txt based on user agent
- Internationalization: Multi-language support for policy files
FAQ
Q: Why disable crawling by default?
A: MCP Gateway is typically a private API gateway. Public crawling could expose API structure and endpoints.
Q: Can I serve HTML files?
A: The current implementation focuses on plain text files per well-known URI standards. HTML would require additional security considerations.
Q: How do I update well-known files?
A: Update environment variables and restart the service. For zero-downtime updates, use rolling deployments.
Q: Are there size limits?
A: Environment variable size limits apply (typically 32KB-1MB depending on platform). Large files should be served differently.
Q: Can I disable caching?
A: Set WELL_KNOWN_CACHE_MAX_AGE=0 to disable caching, though this increases server load.
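With a value of 0 the handler as written would emit `public, max-age=0`, which still allows storage with revalidation. A minimal sketch of building the header explicitly, assuming a hypothetical helper cache_control_value that switches to `no-store` when caching is disabled (a design choice for illustration, not part of the proposal):

```python
def cache_control_value(max_age: int) -> str:
    """Build a Cache-Control value from the configured max age (hypothetical helper)."""
    if max_age <= 0:
        # Treat zero or negative values as "do not cache at all".
        return "no-store"
    return f"public, max-age={max_age}"


print(cache_control_value(3600))  # public, max-age=3600
print(cache_control_value(0))     # no-store
```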
References
- RFC 8615 - Well-Known Uniform Resource Identifiers (URIs)
- RFC 9309 - Robots Exclusion Protocol
- RFC 9116 - security.txt
- Well-Known URI Registry - IANA registry