| 1 | +# ADR-0015: Configurable Well-Known URI Handler |
| 2 | + |
| 3 | +- *Status:* Accepted |
| 4 | +- *Date:* 2025-08-17 |
| 5 | +- *Deciders:* Core Engineering Team |
| 6 | +- *Issues:* [#540](https://github.com/IBM/mcp-context-forge/issues/540) |
| 7 | +- *Related:* Security infrastructure for standardized web discovery |
| 8 | + |
| 9 | +## Context |
| 10 | + |
| 11 | +The MCP Gateway needed to support well-known URIs as defined by RFC 8615 to enable web service discovery, publish security contact information, and manage crawlers. Well-known URIs are standardized paths under `/.well-known/` that web services expose so automated clients can find this metadata. |
| 12 | + |
| 13 | +The implementation needed to address: |
| 14 | +- **robots.txt** for search engine crawler management (typically private API = disable crawling) |
| 15 | +- **security.txt** for security contact information per RFC 9116 |
| 16 | +- **Custom well-known files** for additional service policies (AI usage, DNT policy, etc.) |
| 17 | +- **Security-first defaults** appropriate for private API gateway deployment |
| 18 | +- **Configuration flexibility** for different deployment scenarios |
| 19 | +- **Admin monitoring** of well-known configuration status |
| 20 | + |
| 21 | +### Requirements |
| 22 | + |
| 23 | +- Support standard well-known URIs (robots.txt, security.txt) |
| 24 | +- Allow custom well-known files via configuration |
| 25 | +- Default to private API security posture (no crawling) |
| 26 | +- RFC 9116 compliant security.txt with automatic validation |
| 27 | +- Configurable cache headers for performance |
| 28 | +- Admin endpoint for configuration monitoring |
| 29 | +- Environment-based configuration via standard patterns |
| 30 | + |
| 31 | +## Decision |
| 32 | + |
| 33 | +We implemented a flexible `/.well-known/*` endpoint handler with the following design: |
| 34 | + |
| 35 | +### 1. Router-Based Implementation |
| 36 | + |
| 37 | +Created `mcpgateway/routers/well_known.py` with a dedicated FastAPI router: |
| 38 | + |
| 39 | +```python |
| 40 | +@router.get("/.well-known/{filename:path}", include_in_schema=False) |
| 41 | +async def get_well_known_file(filename: str, response: Response, request: Request): |
| 42 | + """Serve well-known URI files with configurable content and security defaults.""" |
| 43 | +``` |
| 44 | + |
| 45 | +**Design decisions:** |
| 46 | +- **Router isolation**: Separate router for clean organization and testing |
| 47 | +- **Dynamic routing**: Single endpoint handles all well-known URIs |
| 48 | +- **Security-first**: Gated by the `well_known_enabled` setting; the defaults serve only a crawl-blocking robots.txt, and security.txt needs an explicit opt-in |
| 49 | +- **Schema exclusion**: Not included in OpenAPI docs (reduces attack surface) |
| 50 | + |
| 51 | +### 2. Configuration-Driven Content |
| 52 | + |
| 53 | +Extended `mcpgateway/config.py` with well-known URI settings: |
| 54 | + |
| 55 | +```python |
| 56 | +# Well-Known URI Configuration |
| 57 | +well_known_enabled: bool = True |
| 58 | +well_known_robots_txt: str = """User-agent: * |
| 59 | +Disallow: / |
| 60 | + |
| 61 | +# MCP Gateway is a private API gateway |
| 62 | +# Public crawling is disabled by default""" |
| 63 | + |
| 64 | +well_known_security_txt: str = "" |
| 65 | +well_known_security_txt_enabled: bool = False |
| 66 | +well_known_custom_files: str = "{}" # JSON format |
| 67 | +well_known_cache_max_age: int = 3600 # 1 hour |
| 68 | +``` |
| 69 | + |
| 70 | +**Design decisions:** |
| 71 | +- **Private API defaults**: robots.txt blocks all crawlers by default |
| 72 | +- **Explicit security.txt**: Only enabled when content is provided |
| 73 | +- **JSON custom files**: Flexible format for additional well-known files |
| 74 | +- **Configurable caching**: Performance optimization with sensible defaults |
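
Because `well_known_custom_files` is stored as a JSON string, the handler has to decode it before it can serve anything. The following is a minimal sketch of that step, not the gateway's actual code; `parse_custom_files` is a hypothetical helper name, and falling back to an empty mapping on bad input is an assumption.

```python
import json
from typing import Dict


def parse_custom_files(raw: str) -> Dict[str, str]:
    """Decode a JSON object of {filename: content}; hypothetical helper."""
    try:
        data = json.loads(raw or "{}")
    except json.JSONDecodeError:
        # Assumption: malformed JSON means "no custom files" rather than a crash.
        return {}
    if not isinstance(data, dict):
        return {}
    # Keep only entries whose content is a plain string.
    return {name: body for name, body in data.items() if isinstance(body, str)}


files = parse_custom_files('{"ai.txt": "AI Usage: Tool orchestration only"}')
print(sorted(files))
```

Failing closed on invalid JSON keeps a configuration typo from taking down the public endpoints, at the cost of silently dropping the custom files; logging the error would be a sensible addition.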
| 75 | + |
| 76 | +### 3. RFC 9116 Security.txt Compliance |
| 77 | + |
| 78 | +Implemented automatic security.txt validation and enhancement: |
| 79 | + |
| 80 | +```python |
| 81 | +def validate_security_txt(content: str) -> Optional[str]: |
| 82 | + """Validate security.txt format and add required headers.""" |
| 83 | + # Add Expires field if missing (6 months from now) |
| 84 | + # Add header comments for clarity |
| 85 | + # Preserve existing valid content |
| 86 | +``` |
| 87 | + |
| 88 | +**Design decisions:** |
| 89 | +- **Auto-expires**: Adds an `Expires` field if missing (required by RFC 9116) |
| 90 | +- **Header comments**: Adds generation timestamp and description |
| 91 | +- **Validation**: Ensures RFC 9116 compliance |
| 92 | +- **Preservation**: Maintains existing valid fields |
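
The stub above only outlines the steps in comments. One possible way to fill it in is sketched below, under stated assumptions: the `now` parameter is added here purely so the example is deterministic, "6 months" is approximated as 180 days, and the header comment wording is illustrative rather than the gateway's actual output.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def validate_security_txt(content: str, now: Optional[datetime] = None) -> Optional[str]:
    """Return security.txt content with an Expires field, or None if empty."""
    if not content or not content.strip():
        return None
    now = (now or datetime.now(timezone.utc)).replace(microsecond=0)
    lines = content.strip().splitlines()

    # Preserve an existing Expires field; only add one when it is missing.
    has_expires = any(line.strip().lower().startswith("expires:") for line in lines)
    if not has_expires:
        expires = now + timedelta(days=180)  # assumption: ~6 months
        lines.append("Expires: " + expires.strftime("%Y-%m-%dT%H:%M:%SZ"))

    # Illustrative header comments (wording is an assumption).
    header = [
        "# Security contact information",
        "# Generated: " + now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "",
    ]
    return "\n".join(header + lines) + "\n"


print(validate_security_txt("Contact: https://example.com/security"))
```

Passing a fixed `now` makes the function easy to unit test; in production the default of the current UTC time would apply.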
| 93 | + |
| 94 | +### 4. Well-Known Registry |
| 95 | + |
| 96 | +Implemented a registry for known well-known URIs with metadata: |
| 97 | + |
| 98 | +```python |
| 99 | +WELL_KNOWN_REGISTRY = { |
| 100 | + "robots.txt": { |
| 101 | + "content_type": "text/plain", |
| 102 | + "description": "Robot exclusion standard", |
| 103 | + "rfc": "RFC 9309" |
| 104 | + }, |
| 105 | + "security.txt": { |
| 106 | + "content_type": "text/plain", |
| 107 | + "description": "Security contact information", |
| 108 | + "rfc": "RFC 9116" |
| 109 | + }, |
| 110 | + # ... additional standard URIs |
| 111 | +} |
| 112 | +``` |
| 113 | + |
| 114 | +**Design decisions:** |
| 115 | +- **Helpful errors**: Provides descriptive 404 messages for known but unconfigured files |
| 116 | +- **Content-Type mapping**: Ensures correct MIME types |
| 117 | +- **Documentation**: Links to relevant RFCs and standards |
| 118 | +- **Extensibility**: Easy to add new standard well-known URIs |
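
The "helpful errors" behaviour can be illustrated with a small lookup against the registry. This is a sketch only: `not_found_detail` is a hypothetical helper, and the message wording is an assumption rather than the text the gateway returns.

```python
from typing import Dict

# Abbreviated copy of the registry shown above.
WELL_KNOWN_REGISTRY: Dict[str, Dict[str, str]] = {
    "robots.txt": {
        "content_type": "text/plain",
        "description": "Robot exclusion standard",
        "rfc": "RFC 9309",
    },
    "security.txt": {
        "content_type": "text/plain",
        "description": "Security contact information",
        "rfc": "RFC 9116",
    },
}


def not_found_detail(filename: str) -> str:
    """Build a 404 message; more descriptive when the file is a known type."""
    entry = WELL_KNOWN_REGISTRY.get(filename)
    if entry is None:
        return "Not found"
    return f"{filename} is not configured ({entry['description']}, {entry['rfc']})"


print(not_found_detail("security.txt"))
print(not_found_detail("unknown.txt"))
```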
| 119 | + |
| 120 | +### 5. Admin Monitoring |
| 121 | + |
| 122 | +Added `/admin/well-known` endpoint for configuration visibility: |
| 123 | + |
| 124 | +```python |
| 125 | +@router.get("/admin/well-known", response_model=dict) |
| 126 | +async def get_well_known_status(user: str = Depends(require_auth)): |
| 127 | + """Returns configuration status and available well-known files.""" |
| 128 | +``` |
| 129 | + |
| 130 | +**Design decisions:** |
| 131 | +- **Authentication required**: Admin endpoint requires JWT authentication |
| 132 | +- **Configuration visibility**: Shows enabled files and cache settings |
| 133 | +- **Supported files list**: Displays all known well-known URI types |
| 134 | +- **Status monitoring**: Helps administrators verify configuration |
| 135 | + |
| 136 | +## Implementation Architecture |
| 137 | + |
| 138 | +### File Structure |
| 139 | +``` |
| 140 | +mcpgateway/ |
| 141 | +├── config.py # Well-known configuration settings |
| 142 | +├── routers/ |
| 143 | +│ └── well_known.py # /.well-known/* endpoint handler |
| 144 | +└── main.py # Router integration |
| 145 | + |
| 146 | +tests/unit/mcpgateway/ |
| 147 | +└── test_well_known.py # Comprehensive test coverage |
| 148 | + |
| 149 | +.env.example # Configuration documentation |
| 150 | +``` |
| 151 | + |
| 152 | +### Request Flow |
| 153 | +1. **Request**: `GET /.well-known/robots.txt` |
| 154 | +2. **Authentication**: No auth required for well-known URIs (public by design) |
| 155 | +3. **Validation**: Check if well-known endpoints are enabled |
| 156 | +4. **Routing**: Match filename to configured content or registry |
| 157 | +5. **Headers**: Add cache control and specific headers (X-Robots-Tag for robots.txt) |
| 158 | +6. **Response**: Return PlainTextResponse with appropriate headers |
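
The request flow above can be sketched as a framework-free function that returns a status code, headers, and body. This is an illustration under assumptions, not the router's code: `resolve_well_known` is a hypothetical name, the `X-Robots-Tag` value is assumed, and returning 404 when the feature is disabled is one plausible choice.

```python
from typing import Dict, Tuple

DEFAULT_ROBOTS_TXT = "User-agent: *\nDisallow: /\n"


def resolve_well_known(
    filename: str,
    enabled: bool,
    security_txt: str,
    custom_files: Dict[str, str],
    cache_max_age: int = 3600,
) -> Tuple[int, Dict[str, str], str]:
    """Map a requested well-known filename to (status, headers, body)."""
    # Step 3: the whole feature can be switched off.
    if not enabled:
        return 404, {}, "Not found"

    name = filename.strip("/")
    # Step 5: cache header applied to every successful response.
    headers = {"Cache-Control": f"public, max-age={cache_max_age}"}

    # Step 4: match the filename to configured content.
    if name == "robots.txt":
        headers["X-Robots-Tag"] = "noindex, nofollow"  # assumed value
        return 200, headers, DEFAULT_ROBOTS_TXT
    if name == "security.txt" and security_txt:
        return 200, headers, security_txt
    if name in custom_files:
        return 200, headers, custom_files[name]
    return 404, {}, "Not found"


status, headers, body = resolve_well_known("robots.txt", True, "", {})
print(status, headers)
```

Keeping the decision logic separate from the FastAPI handler makes each branch testable without starting an application.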
| 159 | + |
| 160 | +### Security Considerations |
| 161 | +- **No authentication**: Well-known URIs are public by design per RFC 8615 |
| 162 | +- **Content validation**: security.txt content validated against RFC 9116 |
| 163 | +- **Path traversal protection**: Filename normalization prevents directory traversal |
| 164 | +- **Cache headers**: Appropriate cache settings reduce server load |
| 165 | +- **Information disclosure**: Default robots.txt reveals minimal information |
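
Since the route uses a `{filename:path}` parameter, the captured value may contain slashes and dot segments. A minimal sketch of the filename normalization mentioned above follows; `normalize_well_known_name` is a hypothetical helper, and the exact rejection rules are assumptions.

```python
from typing import Optional


def normalize_well_known_name(filename: str) -> Optional[str]:
    """Return a safe relative name, or None if the path looks like traversal."""
    name = filename.strip("/")
    if "\\" in name or "\x00" in name:
        return None
    # Reject empty, current-directory, and parent-directory segments.
    if any(part in ("", ".", "..") for part in name.split("/")):
        return None
    return name


print(normalize_well_known_name("/security.txt"))
print(normalize_well_known_name("../config.py"))
```

When content is served from in-memory configuration rather than the filesystem, this check is defense in depth: a rejected name simply never matches a configured file.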
| 166 | + |
| 167 | +## Consequences |
| 168 | + |
| 169 | +### ✅ Benefits |
| 170 | + |
| 171 | +- **Standards Compliance**: Implements RFC 8615 (well-known URIs) and RFC 9116 (security.txt) |
| 172 | +- **Security Contact**: Enables security researchers to find contact information |
| 173 | +- **Crawler Management**: Proper robots.txt prevents unwanted search engine indexing |
| 174 | +- **Flexibility**: Custom well-known files support organization-specific policies |
| 175 | +- **Performance**: Configurable caching reduces server load for frequently accessed files |
| 176 | +- **Monitoring**: Admin endpoint provides configuration visibility |
| 177 | +- **Private API Focused**: Defaults appropriate for API gateway deployment |
| 178 | + |
| 179 | +### ❌ Trade-offs |
| 180 | + |
| 181 | +- **Information Disclosure**: Well-known URIs are public and may reveal service information |
| 182 | +- **Cache Headers**: Public cache headers may not be appropriate for all deployments |
| 183 | +- **Configuration Complexity**: Additional environment variables to manage |
| 184 | +- **Static Content**: Well-known files are static and can't include dynamic information |
| 185 | + |
| 186 | +### 🔄 Maintenance |
| 187 | + |
| 188 | +- **security.txt Updates**: Requires periodic updates to contact information and expiration |
| 189 | +- **RFC Compliance**: Monitor RFC updates for security.txt format changes |
| 190 | +- **Custom File Management**: Organizations need to maintain custom well-known content |
| 191 | +- **Cache Tuning**: May need cache duration adjustments based on usage patterns |
| 192 | + |
| 193 | +## Configuration Examples |
| 194 | + |
| 195 | +### Basic Private API (Default) |
| 196 | +```bash |
| 197 | +WELL_KNOWN_ENABLED=true |
| 198 | +# robots.txt blocks all crawlers (default) |
| 199 | +# security.txt disabled (default) |
| 200 | +``` |
| 201 | + |
| 202 | +### Public API with Security Contact |
| 203 | +```bash |
| 204 | +WELL_KNOWN_ENABLED=true |
| 205 | +WELL_KNOWN_SECURITY_TXT="Contact: mailto:[email protected]\nContact: https://example.com/security\nPreferred-Languages: en" |
| 206 | +WELL_KNOWN_ROBOTS_TXT="User-agent: *\nAllow: /health\nAllow: /docs\nDisallow: /" |
| 207 | +``` |
| 208 | + |
| 209 | +### Custom Policies |
| 210 | +```bash |
| 211 | +WELL_KNOWN_CUSTOM_FILES={"ai.txt": "AI Usage: Tool orchestration only", "dnt-policy.txt": "We honor Do Not Track headers"} |
| 212 | +``` |
| 213 | + |
| 214 | +## Alternatives Considered |
| 215 | + |
| 216 | +| Alternative | Why Not Chosen | |
| 217 | +|------------|----------------| |
| 218 | +| **Static file serving** | No environment-based configuration, harder to manage | |
| 219 | +| **Database-stored content** | Overly complex for static content, harder to configure | |
| 220 | +| **Middleware-based handler** | Less organized than router-based approach | |
| 221 | +| **Always-enabled endpoints** | Security risk; operators need a switch (`WELL_KNOWN_ENABLED`) to turn the handler off | |
| 222 | +| **No security.txt validation** | Would allow non-compliant security.txt files | |
| 223 | +| **Wildcard well-known handler** | Security risk, explicit file support is safer | |
| 224 | + |
| 225 | +## Testing Strategy |
| 226 | + |
| 227 | +Implemented comprehensive test coverage: |
| 228 | +- **Default robots.txt**: Validates security-first defaults |
| 229 | +- **security.txt validation**: Tests RFC 9116 compliance and auto-enhancement |
| 230 | +- **Custom files**: Verifies JSON configuration parsing and serving |
| 231 | +- **404 handling**: Tests unknown files and helpful error messages |
| 232 | +- **Path normalization**: Ensures path traversal protection |
| 233 | +- **Registry functionality**: Validates well-known URI metadata |
| 234 | + |
| 235 | +## Future Enhancements |
| 236 | + |
| 237 | +Potential improvements for future iterations: |
| 238 | +- **Dynamic content**: Template variables (e.g., `{{DOMAIN}}`, `{{CONTACT_EMAIL}}`) |
| 239 | +- **File upload API**: Admin interface for uploading well-known files |
| 240 | +- **GPG signing**: Digital signature support for security.txt |
| 241 | +- **Rate limiting**: Specific limits for well-known endpoints |
| 242 | +- **Internationalization**: Multi-language support for policy files |
| 243 | +- **A/B testing**: Different content based on user agent or other criteria |
| 244 | + |
| 245 | +## Security Impact |
| 246 | + |
| 247 | +### Positive Security Impact |
| 248 | +- **Security contact**: Enables responsible disclosure by security researchers |
| 249 | +- **Crawler control**: Prevents unwanted indexing of private API endpoints |
| 250 | +- **Standards compliance**: Follows established web security practices |
| 251 | +- **Information control**: Explicit control over what information is disclosed |
| 252 | + |
| 253 | +### Security Considerations |
| 254 | +- **Information disclosure**: Well-known URIs are intentionally public |
| 255 | +- **Content validation**: security.txt is validated before it is served, which guards against publishing malformed content |
| 256 | +- **Cache control**: Public caching may not be appropriate for all environments |
| 257 | +- **Admin endpoint**: Configuration status requires authentication |
| 258 | + |
| 259 | +## Status |
| 260 | + |
| 261 | +This well-known URI handler implementation is **accepted and implemented** as of version 0.7.0, providing standards-compliant web service discovery while maintaining security-first defaults appropriate for private API gateway deployments. |