Document rate limiting and multi-pod behavior

tadasant · claude · tadasant · commit e16cf8d5a9b3 · 2025-12-09T08:10:24.000-08:00
- Add rate limit section to official-registry-api.md documenting the 429 response format, limits, and client guidance
- Add note in .env.example about per-pod rate limit behavior
- Add code comment in server.go explaining multi-replica approximation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
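
The client guidance documented in this commit (back off exponentially on 429 responses, which carry a `Retry-After` header) could be followed along the lines of the sketch below. It is not part of the commit or the registry codebase: `retryDelay` and `getWithBackoff` are hypothetical helper names, the base and maximum delays are arbitrary assumptions, and only the delta-seconds form of `Retry-After` is handled.

```go
package main

import (
	"net/http"
	"strconv"
	"time"
)

// retryDelay is a hypothetical helper: it returns how long to wait before
// retry number `attempt` (0-based) after a 429 response. It prefers the
// server's Retry-After value when it is a positive number of seconds and
// otherwise falls back to exponential backoff: base * 2^attempt, capped at max.
func retryDelay(retryAfter string, attempt int, base, max time.Duration) time.Duration {
	if secs, err := strconv.Atoi(retryAfter); err == nil && secs > 0 {
		return time.Duration(secs) * time.Second
	}
	d := base
	for i := 0; i < attempt && d < max; i++ {
		d *= 2
	}
	if d > max {
		d = max
	}
	return d
}

// getWithBackoff is a hypothetical wrapper: it repeats a GET until the
// response is not 429 or maxAttempts requests have been made, sleeping
// between attempts. The 1s base and 1m cap are assumed values.
func getWithBackoff(client *http.Client, url string, maxAttempts int) (*http.Response, error) {
	for attempt := 0; ; attempt++ {
		resp, err := client.Get(url)
		if err != nil {
			return nil, err
		}
		if resp.StatusCode != http.StatusTooManyRequests || attempt+1 >= maxAttempts {
			return resp, nil // caller is responsible for closing resp.Body
		}
		wait := retryDelay(resp.Header.Get("Retry-After"), attempt, time.Second, time.Minute)
		resp.Body.Close()
		time.Sleep(wait)
	}
}
```

A caller might use `getWithBackoff(http.DefaultClient, url, 5)` and close the returned body when done. The sketch omits jitter to stay deterministic; a production client would likely add some randomness to the fallback delay so that many clients do not retry in lockstep.
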
diff --git a/.env.example b/.env.example
@@ -40,6 +40,8 @@ MCP_REGISTRY_OIDC_EDIT_PERMISSIONS=*
 MCP_REGISTRY_OIDC_PUBLISH_PERMISSIONS=*
 
 # Rate limiting configuration
+# Note: Rate limits are enforced per-pod, so in multi-replica deployments the
+# effective limits are approximate (e.g., 2 replicas = up to 2x the configured rate).
 # Enable or disable rate limiting (default: true)
 MCP_REGISTRY_RATE_LIMIT_ENABLED=true
 # Maximum requests per minute per IP address (default: 60)
diff --git a/docs/reference/api/official-registry-api.md b/docs/reference/api/official-registry-api.md
@@ -14,6 +14,28 @@ This API is based on the [generic registry API](./generic-registry-api.md) with
 - **[Live API Docs](https://registry.modelcontextprotocol.io/docs)** - Stoplight elements with try-it-now functionality
 - **[OpenAPI Spec](https://registry.modelcontextprotocol.io/openapi.yaml)** - Complete machine-readable specification
 
+## Rate Limiting
+
+The official registry enforces rate limits to protect against abuse:
+
+- **60 requests per minute** per IP address
+- **1,000 requests per hour** per IP address
+
+When rate limited, the API returns HTTP `429 Too Many Requests` with a `Retry-After: 60` header. The response body follows the [RFC 7807](https://tools.ietf.org/html/rfc7807) problem details format:
+
+```json
+{
+  "title": "Too Many Requests",
+  "status": 429,
+  "detail": "Rate limit exceeded. Please reduce request frequency and retry after some time."
+}
+```
+
+**Notes:**
+- Rate limits are enforced per pod, so in the multi-replica deployment the effective limits are approximate (e.g., 2 replicas = up to 2x the stated rate)
+- The `/health`, `/ping`, and `/metrics` endpoints are not rate limited
+- Clients should implement exponential backoff when receiving 429 responses
+
 ## Extensions
 
 The official registry implements the [Generic Registry API](./generic-registry-api.md) with the following specific configurations and extensions:
diff --git a/internal/api/server.go b/internal/api/server.go
@@ -72,7 +72,9 @@ func NewServer(cfg *config.Config, registryService service.RegistryService, metr
 	// Order: TrailingSlash -> RateLimit -> CORS -> Mux
 	handler := corsHandler.Handler(mux)
 
-	// Initialize rate limiter if enabled
+	// Initialize rate limiter if enabled.
+	// Note: Rate limits are enforced per-pod, so in multi-replica deployments the
+	// effective limits are approximate (e.g., 2 replicas = up to 2x the configured rate).
 	var rateLimiter *ratelimit.RateLimiter
 	if cfg.RateLimitEnabled {
 		rateLimitConfig := ratelimit.Config{