
Conversation

@tadasant (Member) commented Dec 9, 2025

Written by Claude Code, reviewed by me. When we get this in main I'll test it out against staging to make sure it's acting as intended.

Summary

  • Implements IP-based rate limiting middleware to address service disruptions caused by excessive requests (Implement rate limits to address mis-use causing downtime #826)
  • Enforces dual rate limits: 60 requests/minute and 1000 requests/hour per IP address
  • Fully configurable via environment variables (MCP_REGISTRY_RATE_LIMIT_*)
  • Excludes health/ping/metrics endpoints from rate limiting
  • Returns RFC 7807 problem+json responses with a Retry-After header when rate limited (an example response writer is sketched below)
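
For illustration, here is a minimal Go sketch of such a response writer. It is a sketch, not the PR's actual code: the `title`/`detail` strings and how `retryAfter` is computed are assumptions.

```go
package ratelimit

import (
	"encoding/json"
	"net/http"
	"strconv"
	"time"
)

// writeRateLimited writes an RFC 7807 problem+json body plus a
// Retry-After header. Field values here are illustrative only.
func writeRateLimited(w http.ResponseWriter, retryAfter time.Duration) {
	w.Header().Set("Content-Type", "application/problem+json")
	w.Header().Set("Retry-After", strconv.Itoa(int(retryAfter.Seconds())))
	w.WriteHeader(http.StatusTooManyRequests)
	_ = json.NewEncoder(w).Encode(map[string]any{
		"title":  "Too Many Requests",
		"status": http.StatusTooManyRequests,
		"detail": "rate limit exceeded for this IP; retry later",
	})
}
```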

Configuration Options

| Variable | Default | Description |
| --- | --- | --- |
| MCP_REGISTRY_RATE_LIMIT_ENABLED | true | Enable/disable rate limiting |
| MCP_REGISTRY_RATE_LIMIT_REQUESTS_PER_MINUTE | 60 | Max requests per minute per IP |
| MCP_REGISTRY_RATE_LIMIT_REQUESTS_PER_HOUR | 1000 | Max requests per hour per IP |
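
As a rough illustration of how these variables could be read (the struct and helper names below are hypothetical, not the registry's actual config loader):

```go
package config

import (
	"os"
	"strconv"
)

// RateLimitConfig mirrors the table above; names are assumptions
// made for this sketch.
type RateLimitConfig struct {
	Enabled           bool
	RequestsPerMinute int
	RequestsPerHour   int
}

func envBool(key string, def bool) bool {
	if v := os.Getenv(key); v != "" {
		if b, err := strconv.ParseBool(v); err == nil {
			return b
		}
	}
	return def
}

func envInt(key string, def int) int {
	if v := os.Getenv(key); v != "" {
		if n, err := strconv.Atoi(v); err == nil {
			return n
		}
	}
	return def
}

// LoadRateLimitConfig applies the documented defaults when a
// variable is unset or unparsable.
func LoadRateLimitConfig() RateLimitConfig {
	return RateLimitConfig{
		Enabled:           envBool("MCP_REGISTRY_RATE_LIMIT_ENABLED", true),
		RequestsPerMinute: envInt("MCP_REGISTRY_RATE_LIMIT_REQUESTS_PER_MINUTE", 60),
		RequestsPerHour:   envInt("MCP_REGISTRY_RATE_LIMIT_REQUESTS_PER_HOUR", 1000),
	}
}
```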

Implementation Details

  • Uses a token bucket algorithm via golang.org/x/time/rate (already a dependency); see the sketch after this list
  • Properly handles X-Forwarded-For and X-Real-IP headers (since the registry is deployed behind NGINX ingress with use-forwarded-headers=true)
  • All IPs are validated using net.ParseIP() to handle malformed input
  • Memory protection via MaxVisitors limit (100k) with LRU eviction
  • Background cleanup of stale visitor entries every 10 minutes
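
A minimal sketch of the dual token-bucket core described above, assuming burst sizes equal to the per-window limits; the PR's actual types and tuning may differ.

```go
package ratelimit

import (
	"sync"
	"time"

	"golang.org/x/time/rate"
)

// visitor holds the two token buckets for one client IP. Burst sizes
// equal to the per-window limits are an assumption of this sketch.
type visitor struct {
	perMinute *rate.Limiter
	perHour   *rate.Limiter
	lastSeen  time.Time
}

func newVisitor() *visitor {
	return &visitor{
		perMinute: rate.NewLimiter(rate.Limit(60.0/60.0), 60),       // 60 req/min
		perHour:   rate.NewLimiter(rate.Limit(1000.0/3600.0), 1000), // 1000 req/hour
		lastSeen:  time.Now(),
	}
}

// Limiter tracks one visitor per IP behind a read/write mutex.
type Limiter struct {
	mu       sync.RWMutex
	visitors map[string]*visitor
}

func NewLimiter() *Limiter {
	return &Limiter{visitors: make(map[string]*visitor)}
}

// Allow passes only when both buckets have capacity. Because the two
// checks short-circuit, a minute token can be spent even when the
// hour bucket then denies; that and the unlocked lastSeen update are
// sketch shortcuts.
func (l *Limiter) Allow(ip string) bool {
	l.mu.RLock()
	v, ok := l.visitors[ip]
	l.mu.RUnlock()
	if !ok {
		l.mu.Lock()
		if v, ok = l.visitors[ip]; !ok { // re-check under write lock
			v = newVisitor()
			l.visitors[ip] = v
		}
		l.mu.Unlock()
	}
	v.lastSeen = time.Now()
	return v.perMinute.Allow() && v.perHour.Allow()
}
```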

Test plan

  • Unit tests cover rate limiting logic, middleware behavior, and IP extraction
  • Tests verify both minute and hourly limits work correctly
  • Tests verify skip paths are not rate limited
  • Tests verify X-Forwarded-For and X-Real-IP header handling
  • Tests verify invalid IP handling (empty, malformed, IPv6)
  • Tests verify memory limit enforcement and eviction
  • Tests verify concurrent access safety
  • golangci-lint passes with no issues
  • All tests pass locally
  • CI pipeline passes

Known Limitations

  • Multi-pod deployment: In-memory rate limiting means each pod has independent limits. For strict per-IP limits across the cluster, consider Redis-based rate limiting in a future iteration.

Closes #826

🤖 Generated with Claude Code

tadasant and others added 5 commits December 9, 2025 07:20
Implements rate limiting middleware to address service disruptions caused by
excessive requests. The rate limiter enforces per-IP limits of 60 requests
per minute and 1000 requests per hour (configurable via environment variables).

Features:
- Dual rate limiting (per-minute and per-hour) using token bucket algorithm
- Proper handling of X-Forwarded-For and X-Real-IP headers for proxied requests
- Skip paths for health, ping, and metrics endpoints
- Background cleanup of stale visitor entries
- Configurable via MCP_REGISTRY_RATE_LIMIT_* environment variables
- Returns RFC 7807 problem+json responses with Retry-After header

Closes #826

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Add TrustProxy config option (defaults to false for security)
  - When false, only RemoteAddr is used (prevents IP spoofing)
  - When true, X-Forwarded-For and X-Real-IP headers are trusted
- Add IP validation using net.ParseIP() to handle invalid/malformed IPs
- Add MaxVisitors config to prevent memory exhaustion attacks
- Implement LRU-style eviction when MaxVisitors limit is reached
- Optimize lock granularity with read locks for existing visitors
- Add comprehensive tests for:
  - TrustProxy enabled/disabled behavior
  - IP spoofing prevention
  - Invalid IP handling
  - IPv6 support
  - Memory limit enforcement
  - Concurrent access

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Since the registry is always deployed behind NGINX ingress with
use-forwarded-headers=true, there's no need for a TrustProxy config option.
The NGINX ingress is trusted infrastructure that correctly sets
X-Forwarded-For headers.

Removed:
- TrustProxy config option and environment variable
- Tests for TrustProxy enabled/disabled behavior

The implementation now always checks the X-Forwarded-For/X-Real-IP headers
first, then falls back to RemoteAddr (a sketch of that lookup follows this
commit message). This matches the deployment setup.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
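
A hedged sketch of that lookup order; the helper name and fallback details below are assumptions, not the PR's actual code.

```go
package ratelimit

import (
	"net"
	"net/http"
	"strings"
)

// clientIP resolves the caller's IP in the order the commit above
// describes: X-Forwarded-For (leftmost entry) first, then X-Real-IP,
// then RemoteAddr. Every candidate goes through net.ParseIP so a
// malformed header value can't poison the limiter's keys.
func clientIP(r *http.Request) string {
	if xff := r.Header.Get("X-Forwarded-For"); xff != "" {
		first := strings.TrimSpace(strings.Split(xff, ",")[0])
		if ip := net.ParseIP(first); ip != nil {
			return ip.String()
		}
	}
	if xri := strings.TrimSpace(r.Header.Get("X-Real-IP")); xri != "" {
		if ip := net.ParseIP(xri); ip != nil {
			return ip.String()
		}
	}
	// RemoteAddr is host:port; SplitHostPort strips the port (and the
	// brackets around IPv6 literals).
	host, _, err := net.SplitHostPort(r.RemoteAddr)
	if err != nil {
		host = r.RemoteAddr
	}
	if ip := net.ParseIP(host); ip != nil {
		return ip.String()
	}
	return "unknown"
}
```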
Add mcp_registry.http.rate_limited counter to track when requests are
blocked by rate limiting. This enables monitoring rate limit activity
over time via Prometheus (a sketch of the wiring follows this commit message).

- Add RateLimitedRequests counter to telemetry.Metrics
- Add OnRateLimited callback to rate limiter config
- Wire up metrics in server initialization
- Add test for callback behavior

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
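
A rough sketch of how such a callback could feed a counter, shown with the plain Prometheus Go client and a Prometheus-style metric name; the registry's actual telemetry.Metrics wiring and callback signature may differ.

```go
package telemetry

import "github.com/prometheus/client_golang/prometheus"

// RateLimitedRequests stands in for the mcp_registry.http.rate_limited
// counter described above, renamed here to Prometheus conventions.
var RateLimitedRequests = prometheus.NewCounter(prometheus.CounterOpts{
	Name: "mcp_registry_http_rate_limited_total",
	Help: "Requests rejected by the rate limiter.",
})

func init() {
	prometheus.MustRegister(RateLimitedRequests)
}

// OnRateLimited is the callback handed to the rate limiter config so
// every blocked request bumps the counter. The signature is assumed.
func OnRateLimited(ip string) {
	_ = ip // available if per-IP labels are ever wanted (mind cardinality)
	RateLimitedRequests.Inc()
}
```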
- Add rate limit section to official-registry-api.md documenting the
  429 response format, limits, and client guidance
- Add note in .env.example about per-pod rate limit behavior
- Add code comment in server.go explaining multi-replica approximation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@tadasant marked this pull request as ready for review December 9, 2025 19:10
@tadasant requested a review from a team December 9, 2025 19:10
return v
}

// Enforce max visitors limit (memory protection)
@pree-dew (Contributor) commented Dec 10, 2025

@tadasant if requests come from unique IPs, then eviction will iterate over all existing entries to find one IP to remove, all while holding the lock. Once we're past 100K visitors, every request will take a good amount of time.

Please feel free to correct me if I am missing something here.

@pree-dew:

We could also implement an LRU-based structure here, or maybe remove some percentage of entries when the limit is hit instead of removing one on every new request.
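
For concreteness, a sketch of the LRU idea using container/list (all names hypothetical; the caller is assumed to hold the limiter's lock):

```go
package ratelimit

import "container/list"

// lruVisitors keeps a doubly linked list plus an index map so
// eviction is O(1) instead of a full map scan.
type lruVisitors struct {
	max   int
	order *list.List               // front = most recently seen
	index map[string]*list.Element // ip -> node in order
}

type lruEntry struct {
	ip string
	// per-IP limiter state would live here
}

func newLRUVisitors(max int) *lruVisitors {
	return &lruVisitors{
		max:   max,
		order: list.New(),
		index: make(map[string]*list.Element),
	}
}

// touch records activity for ip, evicting the least recently seen
// visitor in constant time once the cap is reached.
func (l *lruVisitors) touch(ip string) {
	if el, ok := l.index[ip]; ok {
		l.order.MoveToFront(el)
		return
	}
	if l.order.Len() >= l.max {
		if oldest := l.order.Back(); oldest != nil {
			l.order.Remove(oldest)
			delete(l.index, oldest.Value.(*lruEntry).ip)
		}
	}
	l.index[ip] = l.order.PushFront(&lruEntry{ip: ip})
}
```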

@pree-dew:
I'd be more than happy to help in any way here.

}

// Need to create new visitor - acquire write lock
rl.mu.Lock()
Contributor:

This might be very slow when throughput is high: per the stats on Grafana, per-hour throughput is always above 15K and has peaked at ~35K. Cleanup won't happen until an entry is older than an hour.

Contributor:

Maybe a shard-based rate limiter would suit better if we expect per-minute throughput to increase.
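
A sketch of what that sharding could look like, assuming FNV-1a hashing and 64 shards (both arbitrary choices for illustration):

```go
package ratelimit

import (
	"hash/fnv"
	"sync"
)

// shardedLimiter splits the visitor map across fixed shards, each
// with its own lock, so lookups for different IPs rarely contend and
// cleanup can walk one shard at a time. shardedVisitor stands in for
// the per-IP limiter state.
const numShards = 64

type shardedVisitor struct{ /* per-IP limiter state */ }

type shard struct {
	mu       sync.RWMutex
	visitors map[string]*shardedVisitor
}

type shardedLimiter [numShards]*shard

func newShardedLimiter() *shardedLimiter {
	var s shardedLimiter
	for i := range s {
		s[i] = &shard{visitors: make(map[string]*shardedVisitor)}
	}
	return &s
}

// shardFor hashes the IP with FNV-1a so each IP always lands on the
// same shard.
func (s *shardedLimiter) shardFor(ip string) *shard {
	h := fnv.New32a()
	_, _ = h.Write([]byte(ip))
	return s[h.Sum32()%numShards]
}
```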

@rdimitrov (Member) left a comment:

I think we would want to solve this in a different way and handle this in the infrastructure layer rather than inside the registry 👍

@Avish34 (Contributor) commented Dec 11, 2025

+1 to @rdimitrov's suggestion. Implementing rate limiting in the registry might slow down the system when we hit scale. Given that we already have NGINX as the ingress controller in our infra, we might as well leverage it to do the rate limiting.
Reference: https://www.civo.com/learn/rate-limiting-applications-with-nginx-ingress

MCP_REGISTRY_OIDC_EDIT_PERMISSIONS=*
MCP_REGISTRY_OIDC_PUBLISH_PERMISSIONS=*

# Rate limiting configuration
Member:

I might not bother making this configurable, and instead just have this baked in. Keeps the number of knobs and dials low :)

@@ -0,0 +1,297 @@
// Package ratelimit provides IP-based rate limiting middleware for HTTP servers.
Member:

Is Golang the right place to be doing this? Feels like maybe it should be done on the Kubernetes ingress.

# Enable or disable rate limiting (default: true)
MCP_REGISTRY_RATE_LIMIT_ENABLED=true
# Maximum requests per minute per IP address (default: 60)
MCP_REGISTRY_RATE_LIMIT_REQUESTS_PER_MINUTE=60
Member:

I'd be tempted to push this a bit higher (without changing the per hour limit) to allow for slightly spiky/bursty traffic e.g. for people doing local analysis. E.g. maybe 200 per minute?

@domdomegg (Member) left a comment:

Seems broadly sensible - plausibly we should do this at the infra layer though, e.g. nginx ingress.

@tadasant (Member, Author) commented:

Closing in favor of #843

@tadasant closed this Dec 12, 2025
rdimitrov added a commit that referenced this pull request Dec 12, 2025
## Motivation and Context
The following is an alternative solution to #833

Goal is to deploy and promote it to prod, then monitor the alerts for a
week to see its effect and reassess. In case it's actual traffic that
we've started blocking we can also up the replicas. I thought it doesn't
make sense to up them now since it would be nice to isolate the result
of this change first.


Signed-off-by: Radoslav Dimitrov <[email protected]>