
Commit a30aa41

Well known (#770)
Signed-off-by: Mihai Criveti <[email protected]>
1 parent a5d99ca commit a30aa41

15 files changed

+1129
-470
lines changed

.env.example

Lines changed: 79 additions & 373 deletions
Large diffs are not rendered by default.
Lines changed: 261 additions & 0 deletions
@@ -0,0 +1,261 @@
1+
# ADR-0015: Configurable Well-Known URI Handler
2+
3+
- *Status:* Accepted
4+
- *Date:* 2025-08-17
5+
- *Deciders:* Core Engineering Team
6+
- *Issues:* [#540](https://github.com/IBM/mcp-context-forge/issues/540)
7+
- *Related:* Security infrastructure for standardized web discovery
8+
9+
## Context
10+
11+
The MCP Gateway needed to support standardized well-known URIs as defined by RFC 8615 to enable proper web service discovery, security contact information, and crawler management. Well-known URIs are standardized endpoints that web services expose for automated discovery and security contact purposes.
12+
13+
The implementation needed to address:
14+
- **robots.txt** for search engine crawler management (a private API typically disables crawling)
15+
- **security.txt** for security contact information per RFC 9116
16+
- **Custom well-known files** for additional service policies (AI usage, DNT policy, etc.)
17+
- **Security-first defaults** appropriate for private API gateway deployment
18+
- **Configuration flexibility** for different deployment scenarios
19+
- **Admin monitoring** of well-known configuration status
20+
21+
### Requirements
22+
23+
- Support standard well-known URIs (robots.txt, security.txt)
24+
- Allow custom well-known files via configuration
25+
- Default to private API security posture (no crawling)
26+
- RFC 9116 compliant security.txt with automatic validation
27+
- Configurable cache headers for performance
28+
- Admin endpoint for configuration monitoring
29+
- Environment-based configuration via standard patterns
30+
31+
## Decision
32+
33+
We implemented a flexible `/.well-known/*` endpoint handler with the following design:
34+
35+
### 1. Router-Based Implementation
36+
37+
Created `mcpgateway/routers/well_known.py` with a dedicated FastAPI router:
38+
39+
```python
@router.get("/.well-known/{filename:path}", include_in_schema=False)
async def get_well_known_file(filename: str, response: Response, request: Request):
    """Serve well-known URI files with configurable content and security defaults."""
```
44+
45+
**Design decisions:**
46+
- **Router isolation**: Separate router for clean organization and testing
47+
- **Dynamic routing**: Single endpoint handles all well-known URIs
48+
- **Security-first**: Enabled by default with a restrictive robots.txt that blocks all crawlers; can be switched off via `well_known_enabled`
49+
- **Schema exclusion**: Not included in OpenAPI docs (reduces attack surface)
50+
51+
### 2. Configuration-Driven Content
52+
53+
Extended `mcpgateway/config.py` with well-known URI settings:
54+
55+
```python
# Well-Known URI Configuration
well_known_enabled: bool = True
well_known_robots_txt: str = """User-agent: *
Disallow: /

# MCP Gateway is a private API gateway
# Public crawling is disabled by default"""

well_known_security_txt: str = ""
well_known_security_txt_enabled: bool = False
well_known_custom_files: str = "{}"  # JSON format
well_known_cache_max_age: int = 3600  # 1 hour
```
69+
70+
**Design decisions:**
71+
- **Private API defaults**: robots.txt blocks all crawlers by default
72+
- **Explicit security.txt**: Only enabled when content is provided
73+
- **JSON custom files**: Flexible format for additional well-known files
74+
- **Configurable caching**: Performance optimization with sensible defaults
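
The JSON-in-a-string setting has to be decoded before files can be served from it. The following is a minimal sketch of one way to do that, not the gateway's actual code: `parse_custom_files` is a hypothetical helper, and falling back to an empty mapping on malformed input is an assumption.

```python
import json
from typing import Dict


def parse_custom_files(raw: str) -> Dict[str, str]:
    """Decode a JSON object of {filename: content}; return {} if unusable."""
    try:
        data = json.loads(raw) if raw.strip() else {}
    except json.JSONDecodeError:
        # Assumption: a malformed value disables custom files instead of failing startup.
        return {}
    if not isinstance(data, dict):
        return {}
    # Keep only entries whose content is a string, so every file is plain text.
    return {name: body for name, body in data.items() if isinstance(body, str)}


files = parse_custom_files('{"ai.txt": "AI Usage: Tool orchestration only"}')
print(files["ai.txt"])
```

Silently ignoring bad input keeps the sketch short; a real deployment might prefer to log a warning so that a typo in the environment variable is noticed.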
75+
76+
### 3. RFC 9116 Security.txt Compliance
77+
78+
Implemented automatic security.txt validation and enhancement:
79+
80+
```python
def validate_security_txt(content: str) -> Optional[str]:
    """Validate security.txt format and add required headers."""
    # Add Expires field if missing (6 months from now)
    # Add header comments for clarity
    # Preserve existing valid content
```
87+
88+
**Design decisions:**
89+
- **Auto-expires**: Adds an `Expires` field if missing (required by RFC 9116)
90+
- **Header comments**: Adds generation timestamp and description
91+
- **Validation**: Ensures RFC 9116 compliance
92+
- **Preservation**: Maintains existing valid fields
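
The auto-expires behaviour described above can be illustrated with a small standalone function. This is a sketch under stated assumptions, not the body of `validate_security_txt`: the name `ensure_expires`, the 180-day window standing in for "6 months", and the injectable `now` parameter are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def ensure_expires(content: str, now: Optional[datetime] = None) -> str:
    """Append an Expires field (about six months ahead) if none is present."""
    now = now or datetime.now(timezone.utc)
    lines = content.strip().splitlines()
    # security.txt field names are case-insensitive, so compare in lower case.
    has_expires = any(line.strip().lower().startswith("expires:") for line in lines)
    if not has_expires:
        expires = (now + timedelta(days=180)).astimezone(timezone.utc)
        lines.append("Expires: " + expires.strftime("%Y-%m-%dT%H:%M:%SZ"))
    # Existing fields are returned untouched and in their original order.
    return "\n".join(lines)


print(ensure_expires("Contact: mailto:security@example.com"))
```

Passing `now` explicitly makes the function deterministic in tests; callers would normally omit it.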
93+
94+
### 4. Well-Known Registry
95+
96+
Implemented a registry for known well-known URIs with metadata:
97+
98+
```python
WELL_KNOWN_REGISTRY = {
    "robots.txt": {
        "content_type": "text/plain",
        "description": "Robot exclusion standard",
        "rfc": "RFC 9309"
    },
    "security.txt": {
        "content_type": "text/plain",
        "description": "Security contact information",
        "rfc": "RFC 9116"
    },
    # ... additional standard URIs
}
```
113+
114+
**Design decisions:**
115+
- **Helpful errors**: Provides descriptive 404 messages for known but unconfigured files
116+
- **Content-Type mapping**: Ensures correct MIME types
117+
- **Documentation**: Links to relevant RFCs and standards
118+
- **Extensibility**: Easy to add new standard well-known URIs
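
To show how registry metadata can feed the "helpful errors" point, here is a hedged sketch. The registry is a trimmed copy of the one in this ADR; `not_found_detail` and the exact message wording are hypothetical and not taken from the implementation.

```python
from typing import Dict

# Trimmed copy of the registry from this ADR.
WELL_KNOWN_REGISTRY: Dict[str, Dict[str, str]] = {
    "robots.txt": {
        "content_type": "text/plain",
        "description": "Robot exclusion standard",
        "rfc": "RFC 9309",
    },
    "security.txt": {
        "content_type": "text/plain",
        "description": "Security contact information",
        "rfc": "RFC 9116",
    },
}


def not_found_detail(filename: str) -> str:
    """Build a 404 detail string, adding context for known but unconfigured files."""
    entry = WELL_KNOWN_REGISTRY.get(filename)
    if entry is None:
        # Unknown names get a generic message that reveals nothing extra.
        return "Not found"
    return f"{filename} is not configured. {entry['description']} ({entry['rfc']})."


print(not_found_detail("security.txt"))
```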
119+
120+
### 5. Admin Monitoring
121+
122+
Added `/admin/well-known` endpoint for configuration visibility:
123+
124+
```python
@router.get("/admin/well-known", response_model=dict)
async def get_well_known_status(user: str = Depends(require_auth)):
    """Returns configuration status and available well-known files."""
```
129+
130+
**Design decisions:**
131+
- **Authentication required**: Admin endpoint requires JWT authentication
132+
- **Configuration visibility**: Shows enabled files and cache settings
133+
- **Supported files list**: Displays all known well-known URI types
134+
- **Status monitoring**: Helps administrators verify configuration
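
The ADR does not spell out the shape of the status payload. As an illustration only, a status dictionary could be assembled like this; `build_status` and the keys `enabled`, `configured_files`, and `cache_max_age` are assumptions, not the endpoint's documented response.

```python
from typing import Any, Dict, List


def build_status(
    enabled: bool,
    security_txt: str,
    custom_files: Dict[str, str],
    cache_max_age: int,
) -> Dict[str, Any]:
    """Summarise which well-known files would be served under the given settings."""
    configured: List[str] = []
    if enabled:
        # robots.txt always has default content, so it is listed whenever enabled.
        configured.append("robots.txt")
        if security_txt.strip():
            configured.append("security.txt")
        configured.extend(sorted(custom_files))
    return {
        "enabled": enabled,
        "configured_files": configured,
        "cache_max_age": cache_max_age,
    }


print(build_status(True, "", {"ai.txt": "AI Usage: Tool orchestration only"}, 3600))
```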
135+
136+
## Implementation Architecture
137+
138+
### File Structure
139+
```
mcpgateway/
├── config.py              # Well-known configuration settings
├── routers/
│   └── well_known.py      # /.well-known/* endpoint handler
└── main.py                # Router integration

tests/unit/mcpgateway/
└── test_well_known.py     # Comprehensive test coverage

.env.example               # Configuration documentation
```
151+
152+
### Request Flow
153+
1. **Request**: `GET /.well-known/robots.txt`
154+
2. **Authentication**: No auth required for well-known URIs (public by design)
155+
3. **Validation**: Check if well-known endpoints are enabled
156+
4. **Routing**: Match filename to configured content or registry
157+
5. **Headers**: Add cache control and specific headers (X-Robots-Tag for robots.txt)
158+
6. **Response**: Return PlainTextResponse with appropriate headers
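
The request flow above can be traced in a framework-free sketch that returns a `(status, body, headers)` tuple instead of a FastAPI response. Everything here is illustrative: `serve_well_known` is a hypothetical name, and the `public` cache directive and the `X-Robots-Tag` value are assumptions, since the ADR names the headers but not their values.

```python
from typing import Dict, Optional, Tuple

ROBOTS_TXT = "User-agent: *\nDisallow: /"


def serve_well_known(
    filename: str,
    enabled: bool = True,
    cache_max_age: int = 3600,
    custom_files: Optional[Dict[str, str]] = None,
) -> Tuple[int, str, Dict[str, str]]:
    """Resolve a well-known filename to (status, body, headers)."""
    # Step 3: refuse everything when the feature is switched off.
    if not enabled:
        return 404, "Not found", {}
    name = filename.strip("/")
    # Step 5: cache header shared by every successful response.
    headers = {"Cache-Control": f"public, max-age={cache_max_age}"}
    # Step 4: match the filename against built-in and configured content.
    if name == "robots.txt":
        headers["X-Robots-Tag"] = "noindex, nofollow"  # assumed value
        return 200, ROBOTS_TXT, headers
    files = custom_files or {}
    if name in files:
        return 200, files[name], headers
    return 404, "Not found", {}


status, body, headers = serve_well_known("robots.txt")
print(status, headers)
```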
159+
160+
### Security Considerations
161+
- **No authentication**: Well-known URIs are public by design per RFC 8615
162+
- **Content validation**: security.txt content validated against RFC 9116
163+
- **Path traversal protection**: Filename normalization prevents directory traversal
164+
- **Cache headers**: Appropriate cache settings reduce server load
165+
- **Information disclosure**: Default robots.txt reveals minimal information
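
Filename normalization for the path traversal point might look like the following. This is a sketch, not the gateway's code: `normalize_well_known_name` is hypothetical, and restricting names to a single path segment is an assumption that would exclude well-known URIs with sub-paths.

```python
import posixpath
from typing import Optional


def normalize_well_known_name(filename: str) -> Optional[str]:
    """Return a safe single-segment name, or None if the path looks unsafe."""
    # Collapse "." and ".." segments and drop any leading slashes.
    name = posixpath.normpath(filename.strip().lstrip("/"))
    # Reject empty results and anything that still contains a directory part.
    if name in ("", ".", "..") or "/" in name or "\\" in name:
        return None
    return name


print(normalize_well_known_name("/robots.txt"))
print(normalize_well_known_name("../../etc/passwd"))
```

Because content comes from configuration and not from disk, traversal cannot read arbitrary files in this design; normalization mainly keeps lookups predictable.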
166+
167+
## Consequences
168+
169+
### ✅ Benefits
170+
171+
- **Standards Compliance**: Implements RFC 8615 (well-known URIs) and RFC 9116 (security.txt)
172+
- **Security Contact**: Enables security researchers to find contact information
173+
- **Crawler Management**: Proper robots.txt prevents unwanted search engine indexing
174+
- **Flexibility**: Custom well-known files support organization-specific policies
175+
- **Performance**: Configurable caching reduces server load for frequently accessed files
176+
- **Monitoring**: Admin endpoint provides configuration visibility
177+
- **Private API Focused**: Defaults appropriate for API gateway deployment
178+
179+
### ❌ Trade-offs
180+
181+
- **Information Disclosure**: Well-known URIs are public and may reveal service information
182+
- **Cache Headers**: Public cache headers may not be appropriate for all deployments
183+
- **Configuration Complexity**: Additional environment variables to manage
184+
- **Static Content**: Well-known files are static and can't include dynamic information
185+
186+
### 🔄 Maintenance
187+
188+
- **security.txt Updates**: Requires periodic updates to contact information and expiration
189+
- **RFC Compliance**: Monitor RFC updates for security.txt format changes
190+
- **Custom File Management**: Organizations need to maintain custom well-known content
191+
- **Cache Tuning**: May need cache duration adjustments based on usage patterns
192+
193+
## Configuration Examples
194+
195+
### Basic Private API (Default)
196+
```bash
WELL_KNOWN_ENABLED=true
# robots.txt blocks all crawlers (default)
# security.txt disabled (default)
```
201+
202+
### Public API with Security Contact
203+
```bash
WELL_KNOWN_ENABLED=true
WELL_KNOWN_SECURITY_TXT="Contact: mailto:[email protected]\nContact: https://example.com/security\nPreferred-Languages: en"
WELL_KNOWN_ROBOTS_TXT="User-agent: *\nAllow: /health\nAllow: /docs\nDisallow: /"
```
208+
209+
### Custom Policies
210+
```bash
WELL_KNOWN_CUSTOM_FILES='{"ai.txt": "AI Usage: Tool orchestration only", "dnt-policy.txt": "We honor Do Not Track headers"}'
```
213+
214+
## Alternatives Considered
215+
216+
| Alternative | Why Not Chosen |
|------------|----------------|
| **Static file serving** | No environment-based configuration, harder to manage |
| **Database-stored content** | Overly complex for static content, harder to configure |
| **Middleware-based handler** | Less organized than router-based approach |
| **Always-enabled endpoints** | Security risk, should be explicitly enabled |
| **No security.txt validation** | Would allow non-compliant security.txt files |
| **Wildcard well-known handler** | Security risk, explicit file support is safer |
224+
225+
## Testing Strategy
226+
227+
Implemented comprehensive test coverage:
228+
- **Default robots.txt**: Validates security-first defaults
229+
- **security.txt validation**: Tests RFC 9116 compliance and auto-enhancement
230+
- **Custom files**: Verifies JSON configuration parsing and serving
231+
- **404 handling**: Tests unknown files and helpful error messages
232+
- **Path normalization**: Ensures path traversal protection
233+
- **Registry functionality**: Validates well-known URI metadata
234+
235+
## Future Enhancements
236+
237+
Potential improvements for future iterations:
238+
- **Dynamic content**: Template variables (e.g., `{{DOMAIN}}`, `{{CONTACT_EMAIL}}`)
239+
- **File upload API**: Admin interface for uploading well-known files
240+
- **GPG signing**: Digital signature support for security.txt
241+
- **Rate limiting**: Specific limits for well-known endpoints
242+
- **Internationalization**: Multi-language support for policy files
243+
- **A/B testing**: Different content based on user agent or other criteria
244+
245+
## Security Impact
246+
247+
### Positive Security Impact
248+
- **Security contact**: Enables responsible disclosure by security researchers
249+
- **Crawler control**: Prevents unwanted indexing of private API endpoints
250+
- **Standards compliance**: Follows established web security practices
251+
- **Information control**: Explicit control over what information is disclosed
252+
253+
### Security Considerations
254+
- **Information disclosure**: Well-known URIs are intentionally public
255+
- **Content validation**: Prevents serving malicious content through validation
256+
- **Cache control**: Public caching may not be appropriate for all environments
257+
- **Admin endpoint**: Configuration status requires authentication
258+
259+
## Status
260+
261+
This well-known URI handler implementation is **accepted and implemented** as of version 0.7.0, providing standards-compliant web service discovery while maintaining security-first defaults appropriate for private API gateway deployments.

docs/docs/architecture/adr/index.md

Lines changed: 1 addition & 0 deletions
@@ -15,5 +15,6 @@ This page tracks all significant design decisions made for the MCP Gateway proje
1515
| 0009 | Built-in Health Checks & Self-Monitoring | Accepted | Operations | 2025-02-21 |
1616
| 0010 | Observability via Prometheus, Structured Logs | Accepted | Observability | 2025-02-21 |
1717
| 0014 | Security Headers & Environment-Aware CORS Middleware | Accepted | Security | 2025-08-17 |
18+
| 0015 | Configurable Well-Known URI Handler | Accepted | Security | 2025-08-17 |
1819

1920
> ✳️ Add new decisions chronologically and link to them from this table.

docs/docs/manage/index.md

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@ Whether you're self-hosting, running in the cloud, or deploying to Kubernetes, t
1515
| [Export/Import Tutorial](export-import-tutorial.md) | Step-by-step tutorial for getting started with export/import |
1616
| [Export/Import Reference](export-import-reference.md) | Quick reference guide for export/import commands and APIs |
1717
| [Bulk Import](bulk-import.md) | Import multiple tools at once for migrations and team onboarding |
18+
| [Well-Known URIs](well-known-uris.md) | Configure robots.txt, security.txt, and custom well-known files |
1819
| [Logging](logging.md) | Configure structured logging, log destinations, and log rotation |
1920

2021
---

docs/docs/manage/securing.md

Lines changed: 24 additions & 1 deletion
@@ -121,7 +121,30 @@ MCP Gateway should be integrated with:
121121
- [ ] SIEM for security monitoring
122122
- [ ] Load balancer with TLS termination
123123

124-
### 10. Downstream Application Security
124+
### 10. Well-Known URI Security
125+
126+
Configure well-known URIs appropriately for your deployment:
127+
128+
```bash
# For private APIs (default) - blocks all crawlers
WELL_KNOWN_ENABLED=true
WELL_KNOWN_ROBOTS_TXT="User-agent: *\nDisallow: /"

# For public APIs - allow health checks, block sensitive endpoints
# WELL_KNOWN_ROBOTS_TXT="User-agent: *\nAllow: /health\nAllow: /docs\nDisallow: /admin\nDisallow: /tools"

# Security contact information (RFC 9116)
WELL_KNOWN_SECURITY_TXT="Contact: mailto:[email protected]\nExpires: 2025-12-31T23:59:59Z\nPreferred-Languages: en"
```
139+
140+
Security considerations:
141+
- [ ] Configure security.txt with current contact information
142+
- [ ] Review robots.txt to prevent unauthorized crawler access
143+
- [ ] Monitor well-known endpoint access in logs
144+
- [ ] Update security.txt Expires field before expiration
145+
- [ ] Consider custom well-known files only if necessary
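
The "update Expires before expiration" item lends itself to an automated check. The following sketch is an assumption-laden example, not part of the gateway: `expires_within` is a hypothetical helper, and it only understands the UTC `...Z` timestamp form used in the example above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def expires_within(security_txt: str, days: int, now: Optional[datetime] = None) -> bool:
    """True if the Expires field is missing, unparsable, or due within `days`."""
    now = now or datetime.now(timezone.utc)
    for line in security_txt.splitlines():
        # Split on the first colon only, since the timestamp contains colons too.
        name, _, value = line.partition(":")
        if name.strip().lower() == "expires":
            try:
                expires = datetime.strptime(value.strip(), "%Y-%m-%dT%H:%M:%SZ")
            except ValueError:
                return True
            return expires.replace(tzinfo=timezone.utc) - now <= timedelta(days=days)
    return True


sample = "Contact: mailto:security@example.com\nExpires: 2025-12-31T23:59:59Z"
print(expires_within(sample, days=30))
```

Treating a missing or malformed field as "needs attention" errs on the side of prompting a review.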
146+
147+
### 11. Downstream Application Security
125148

126149
Applications consuming MCP Gateway data must:
127150
