A production-ready rate limiting service implementing the Token Bucket algorithm with Redis-backed distributed rate limiting and in-memory fallback.
Rate limiting is essential for:
- API Protection: Prevent abuse and ensure fair resource usage
- Cost Control: Limit expensive operations (database queries, external API calls)
- Stability: Protect backend systems from traffic spikes
- Compliance: Enforce usage quotas per user or API key
This service provides a distributed, scalable rate limiting solution that maintains availability even during Redis outages.
┌─────────────┐
│ Client │
└──────┬──────┘
│ HTTP Request
▼
┌─────────────────────┐
│ RateLimitFilter │ ← OncePerRequestFilter (Spring)
│ - Extract User ID │
│ - Build Redis Key │
└──────┬──────────────┘
│
▼
┌─────────────────────┐
│ RedisRateLimiter │
│ - Execute Lua Script│
│ - Fallback Logic │
└──────┬──────────────┘
│
├──► Redis (Primary)
│ └── Lua Script (Atomic)
│
└──► In-Memory (Fallback)
└── TokenBucket per User
- RateLimitFilter: Servlet filter intercepting all HTTP requests
- RedisRateLimiter: Core rate limiting logic with Redis + fallback
- TokenBucket: In-memory token bucket implementation
- Lua Script: Atomic Redis operation for distributed rate limiting
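A minimal sketch of how the filter layer described above can be wired. The `allowRequest(String)` method on `RedisRateLimiter` and the `jakarta.servlet` imports (Spring Boot 3; older stacks use `javax.servlet`) are assumptions, not taken from the actual source:

```java
import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import org.springframework.stereotype.Component;
import org.springframework.web.filter.OncePerRequestFilter;

import java.io.IOException;

@Component
public class RateLimitFilter extends OncePerRequestFilter {

    private final RedisRateLimiter rateLimiter;   // allowRequest(String) signature is assumed

    public RateLimitFilter(RedisRateLimiter rateLimiter) {
        this.rateLimiter = rateLimiter;
    }

    @Override
    protected void doFilterInternal(HttpServletRequest request,
                                    HttpServletResponse response,
                                    FilterChain chain) throws ServletException, IOException {
        String userId = resolveUserId(request);
        if (rateLimiter.allowRequest(userId)) {
            chain.doFilter(request, response);          // within limit: continue down the chain
        } else {
            response.setStatus(429);                    // Too Many Requests
            response.setContentType("application/json");
            response.getWriter().write("{\"message\": \"Rate limit exceeded\"}");
        }
    }

    // Simplified; the full identification order is listed later in this README
    private String resolveUserId(HttpServletRequest request) {
        String userId = request.getHeader("X-User-Id");
        return (userId != null && !userId.isBlank()) ? userId : request.getRemoteAddr();
    }
}
```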
Token Bucket is a rate limiting algorithm that allows bursts up to a maximum capacity while maintaining a steady refill rate.
- Bucket Capacity: Maximum number of tokens (e.g., 100)
- Refill Rate: Tokens added per second (e.g., 10 tokens/sec)
- Request Processing:
- Each request consumes 1 token
- If tokens available → allow request, decrement token
- If no tokens → reject request (HTTP 429)
```
elapsedTime   = currentTime - lastRefillTime
tokensToAdd   = (elapsedTime / 1000) * refillRatePerSecond
currentTokens = min(capacity, currentTokens + tokensToAdd)
```
Example:
- Capacity: 100 tokens
- Refill Rate: 10 tokens/second
- After 1 second of no requests: +10 tokens (capped at 100)
- After 2 seconds: +20 tokens (capped at 100)
- Burst Handling: Allows short bursts up to capacity
- Smooth Rate: Maintains average rate over time
- Predictable: Easy to reason about and configure
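To make the refill formula above concrete, here is a minimal in-memory bucket in Java. It mirrors the project's TokenBucket in spirit, but field names and constructor shape are illustrative; only `allowRequest()` is taken from this README:

```java
// Minimal in-memory token bucket following the refill formula above (illustrative).
public class TokenBucket {

    private final long capacity;
    private final double refillRatePerSecond;

    private double currentTokens;
    private long lastRefillTime;   // epoch milliseconds

    public TokenBucket(long capacity, double refillRatePerSecond) {
        this.capacity = capacity;
        this.refillRatePerSecond = refillRatePerSecond;
        this.currentTokens = capacity;                  // start full so an initial burst is allowed
        this.lastRefillTime = System.currentTimeMillis();
    }

    public synchronized boolean allowRequest() {
        refill();
        if (currentTokens >= 1) {
            currentTokens -= 1;                         // each request consumes one token
            return true;
        }
        return false;                                   // caller maps this to HTTP 429
    }

    private void refill() {
        long now = System.currentTimeMillis();
        double elapsedSeconds = (now - lastRefillTime) / 1000.0;
        currentTokens = Math.min(capacity, currentTokens + elapsedSeconds * refillRatePerSecond);
        lastRefillTime = now;
    }
}
```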
Rate limiting requires atomic operations:
- Read current token count
- Calculate refill
- Check availability
- Decrement token
- Update timestamp
Without atomicity, concurrent requests could:
- Read the same token count
- Both be allowed when only one should be
- Exceed the rate limit
The Lua script (`token_bucket.lua`) executes atomically in Redis:

```lua
-- All operations happen atomically
local hashData = redis.call('HGETALL', key)
-- Calculate refill
-- Check tokens
-- Decrement if allowed
-- Update hash
return 1 or 0  -- 1 = allowed, 0 = rejected
```

- Atomicity: Single Redis command ensures consistency
- Performance: Single round-trip to Redis
- Distributed: Works across multiple application instances
- Reliability: Redis handles script execution atomically
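On the application side, Spring Data Redis can load the script once and execute it in a single round-trip. A sketch, assuming the script takes the bucket key plus capacity, refill rate, and a timestamp (the real argument layout lives in `token_bucket.lua`, and this class is not necessarily how the project's RedisRateLimiter is wired):

```java
import org.springframework.core.io.ClassPathResource;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.data.redis.core.script.DefaultRedisScript;
import org.springframework.scripting.support.ResourceScriptSource;

import java.util.List;

public class RedisTokenBucketClient {

    private final StringRedisTemplate redisTemplate;
    private final DefaultRedisScript<Long> script;

    public RedisTokenBucketClient(StringRedisTemplate redisTemplate) {
        this.redisTemplate = redisTemplate;
        this.script = new DefaultRedisScript<>();
        // Load token_bucket.lua from the classpath; Spring runs it via EVALSHA,
        // falling back to EVAL when the script is not yet cached on the server
        script.setScriptSource(new ResourceScriptSource(new ClassPathResource("token_bucket.lua")));
        script.setResultType(Long.class);
    }

    public boolean allowRequest(String key, long capacity, double refillRatePerSecond) {
        Long result = redisTemplate.execute(
                script,
                List.of(key),                                   // KEYS[1]: the per-user bucket key
                String.valueOf(capacity),                       // ARGV[1] (assumed)
                String.valueOf(refillRatePerSecond),            // ARGV[2] (assumed)
                String.valueOf(System.currentTimeMillis()));    // ARGV[3] (assumed)
        return result != null && result == 1L;                  // script returns 1 (allow) or 0 (deny)
    }
}
```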
When Redis is unavailable, the service falls back to in-memory TokenBucket instances:
```java
try {
    return redisTemplate.execute(script, ...);
} catch (Exception e) {
    // Fallback to in-memory TokenBucket
    TokenBucket bucket = inMemoryBuckets.computeIfAbsent(
        userId, k -> new TokenBucket(capacity, refillRate)
    );
    return bucket.allowRequest();
}
```

Advantages:
- ✅ Application remains available during Redis outages
- ✅ No request failures due to Redis downtime
- ✅ Automatic failover without configuration
Limitations:
- ⚠️ Rate limits are per-instance, not shared across instances
- ⚠️ With N instances, effective rate limit = N × configured limit
- ⚠️ In-memory buckets lost on application restart
For production deployments:
- Monitor Redis health and alert on fallback usage
- Consider circuit breaker pattern for Redis failures (see the sketch after this list)
- Use Redis Sentinel/Cluster for high availability
- Implement distributed coordination (e.g., ZooKeeper) if strict limits required
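For the circuit breaker suggestion, Resilience4j is one option. It is not a dependency of this project, so the sketch below is purely illustrative; the stub methods stand in for the existing Redis and in-memory paths:

```java
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;

import java.time.Duration;

public class GuardedRateLimiter {

    private final CircuitBreaker redisBreaker = CircuitBreaker.of("redis",
            CircuitBreakerConfig.custom()
                    .failureRateThreshold(50)                        // open after 50% of calls fail
                    .waitDurationInOpenState(Duration.ofSeconds(30)) // probe Redis again after 30s
                    .build());

    public boolean allowRequest(String userId) {
        try {
            // While the breaker is open, calls fail fast instead of waiting on Redis timeouts
            return redisBreaker.executeSupplier(() -> checkWithRedis(userId));
        } catch (Exception e) {
            return checkInMemory(userId);   // the existing in-memory fallback path
        }
    }

    private boolean checkWithRedis(String userId) { /* Lua script call */ return true; }

    private boolean checkInMemory(String userId) { /* per-instance TokenBucket */ return true; }
}
```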
- Unit Tests (`TokenBucketTest`):
  - Capacity enforcement
  - Token exhaustion
  - Refill rate accuracy
  - Thread safety
- Integration Tests (`RedisRateLimiterTest`):
  - Redis success scenarios
  - Redis failure fallback
  - Per-user bucket isolation
  - Key extraction logic
- Concurrency Tests (`ConcurrencyTest`):
  - 50-100 parallel requests
  - Capacity enforcement under load
  - Multiple users concurrently
  - Refill during concurrent access
Run the tests with:

```bash
mvn test
```

The key concurrency scenario:

- Concurrency: 100 parallel threads
- Capacity: 50 tokens
- Verification: Exact capacity enforcement, no race conditions
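A sketch of such a test; the project's `ConcurrencyTest` may be structured differently. It exercises the in-memory TokenBucket sketched earlier, with refill disabled so only the initial capacity is available:

```java
import org.junit.jupiter.api.Test;

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

import static org.junit.jupiter.api.Assertions.assertEquals;

class TokenBucketConcurrencyTest {

    @Test
    void allowsExactlyCapacityUnderParallelLoad() throws InterruptedException {
        int capacity = 50;
        int threads = 100;
        TokenBucket bucket = new TokenBucket(capacity, 0.0);   // no refill during the test

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(threads);
        AtomicInteger allowed = new AtomicInteger();

        for (int i = 0; i < threads; i++) {
            pool.submit(() -> {
                try {
                    start.await();                              // release all threads at once
                    if (bucket.allowRequest()) {
                        allowed.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }

        start.countDown();
        done.await();
        pool.shutdown();

        // Exactly `capacity` requests may pass, regardless of thread scheduling
        assertEquals(capacity, allowed.get());
    }
}
```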
- Per-Instance Fallback: In-memory buckets not shared across instances
- No Persistence: In-memory state lost on restart
- Fixed Configuration: Rate limits configured globally, not per-endpoint
- No Metrics: No built-in monitoring/metrics export
| Decision | Rationale | Alternative |
|---|---|---|
| Token Bucket vs Leaky Bucket | Allows bursts, more intuitive | Leaky Bucket (smoother but no bursts) |
| Redis Lua vs Multiple Commands | Atomicity, performance | Multiple Redis commands (race conditions) |
| In-Memory Fallback vs Fail-Open | Availability over strict limits | Fail-closed (strict but unavailable) |
| Filter vs Interceptor | Earlier in request lifecycle | HandlerInterceptor (later, less control) |
- Per-endpoint rate limits
- Sliding window algorithm option
- Metrics export (Prometheus/Micrometer)
- Distributed coordination for strict limits
- Rate limit headers in responses (X-RateLimit-*)
- Java 17+
- Maven 3.6+
- Redis 6.0+ (optional, falls back to in-memory)
- Clone and Build:

  ```bash
  cd rate-limiter-service
  mvn clean install
  ```

- Start Redis (optional):

  ```bash
  docker run -d -p 6379:6379 redis:7-alpine
  ```

- Configure (optional): edit `src/main/resources/application.yml`:

  ```yaml
  rate-limit:
    capacity: 100
    refill-rate-per-second: 10.0
  ```

- Run Application:

  ```bash
  mvn spring-boot:run
  ```

  The service starts on http://localhost:8080
Redis connection settings can also be supplied via environment variables:

```bash
export REDIS_HOST=localhost
export REDIS_PORT=6379
export REDIS_PASSWORD=   # Optional
```

Make a request:

```bash
curl -H "X-User-Id: user123" http://localhost:8080/api/endpoint
```

Response: 200 OK
After exceeding the limit:
curl -H "X-User-Id: user123" http://localhost:8080/api/endpointResponse: 429 Too Many Requests
```json
{
  "message": "Rate limit exceeded"
}
```

The service identifies users by:

- `X-User-Id` header (preferred)
- `X-Forwarded-For` header (if behind a proxy)
- `X-Real-IP` header (if behind a proxy)
- Remote IP address (fallback)
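A sketch of that resolution order in Java; the helper class and method name are illustrative, not the project's actual extraction code:

```java
import jakarta.servlet.http.HttpServletRequest;

final class UserIdResolver {

    // Applies the identification order listed above: header, proxy headers, then remote IP
    static String resolve(HttpServletRequest request) {
        String userId = request.getHeader("X-User-Id");
        if (userId != null && !userId.isBlank()) {
            return userId;
        }
        String forwardedFor = request.getHeader("X-Forwarded-For");
        if (forwardedFor != null && !forwardedFor.isBlank()) {
            // X-Forwarded-For can hold a chain of addresses; the first is the original client
            return forwardedFor.split(",")[0].trim();
        }
        String realIp = request.getHeader("X-Real-IP");
        if (realIp != null && !realIp.isBlank()) {
            return realIp;
        }
        return request.getRemoteAddr();   // last resort: the direct peer's address
    }
}
```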
To exercise the limit end-to-end:

```bash
# Send 101 requests (assuming capacity=100)
for i in {1..101}; do
  curl -H "X-User-Id: test-user" http://localhost:8080/api/endpoint
done
```

Expected: the first 100 succeed, the 101st returns 429.
Q: Why Token Bucket instead of a Sliding Window?

A: Token Bucket allows bursts up to capacity while maintaining an average rate. Sliding Window is smoother but doesn't handle bursts well. For APIs with variable traffic, Token Bucket provides a better user experience while still enforcing limits.
Q: How is atomicity guaranteed across distributed instances?

A: We use Redis Lua scripts that execute atomically on the Redis server. The entire operation (read, calculate, update) happens in a single atomic command, preventing race conditions across multiple application instances.
Q: What happens when Redis is down?

A: The service gracefully degrades to in-memory TokenBucket instances per user. This ensures availability but with a tradeoff: rate limits become per-instance rather than shared. Each instance maintains its own buckets, so with N instances the effective limit is N × the configured limit.
Q: How would you rate limit across multiple regions?

A: Options:
- Centralized Redis Cluster: Single Redis cluster shared across regions (latency tradeoff)
- Regional Redis + Coordination: Each region has Redis, coordinate via message queue
- Distributed Consensus: Use ZooKeeper/etcd for strict global limits
- Per-Region Limits: Accept regional limits, sum for global (simpler)
Q: How is concurrency tested?

A: Use an ExecutorService with 50-100 threads, a CountDownLatch for synchronization, and an AtomicInteger for counting. Verify that exactly `capacity` requests are allowed, regardless of thread scheduling.
Q: What is the time and space complexity?

A: O(1) for both operations:
- `allowRequest()`: constant time (simple arithmetic, hash lookup)
- Token refill: O(1) (single calculation)
Storage: O(U) where U = number of unique users.
Q: How would you extend this to per-endpoint rate limits?

A:
- Extract the endpoint from the request path
- Build a composite key: `rate-limit:{userId}:{endpoint}`
- Configure limits per endpoint in config
- Use the same TokenBucket logic with endpoint-specific capacity/rate
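A sketch of the composite key and per-endpoint lookup; the endpoint paths and the config shape here are hypothetical:

```java
import java.util.Map;

record EndpointLimit(long capacity, double refillRatePerSecond) {}

class PerEndpointLimits {

    // Hypothetical per-endpoint configuration; in practice this would be bound from application.yml
    private static final Map<String, EndpointLimit> LIMITS = Map.of(
            "/api/search", new EndpointLimit(20, 2.0),
            "/api/endpoint", new EndpointLimit(100, 10.0));

    private static final EndpointLimit DEFAULT = new EndpointLimit(100, 10.0);

    // Composite key so each user gets an independent bucket per endpoint
    static String bucketKey(String userId, String endpoint) {
        return "rate-limit:" + userId + ":" + endpoint;
    }

    static EndpointLimit limitFor(String endpoint) {
        return LIMITS.getOrDefault(endpoint, DEFAULT);
    }
}
```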
Q: What metrics would you export?

A:
- `rate_limit_allowed_total{user_id}`: counter of allowed requests
- `rate_limit_denied_total{user_id}`: counter of denied requests
- `rate_limit_fallback_active`: gauge (1 if using fallback, 0 if Redis)
- `rate_limit_redis_latency`: histogram of Redis operation time
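With Micrometer (Spring Boot's metrics facade), the counters and gauge could be wired roughly as below; the metric names mirror the list above, everything else is an assumption:

```java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;

import java.util.concurrent.atomic.AtomicInteger;

class RateLimitMetrics {

    private final MeterRegistry registry;
    private final AtomicInteger fallbackActive = new AtomicInteger(0);

    RateLimitMetrics(MeterRegistry registry) {
        this.registry = registry;
        // 1 while the in-memory fallback is active, 0 while Redis is healthy
        registry.gauge("rate_limit_fallback_active", fallbackActive);
    }

    void recordDecision(String userId, boolean allowed) {
        Counter.builder(allowed ? "rate_limit_allowed_total" : "rate_limit_denied_total")
                .tag("user_id", userId)   // note: per-user tags can be high-cardinality in Prometheus
                .register(registry)
                .increment();
    }

    void markFallback(boolean active) {
        fallbackActive.set(active ? 1 : 0);
    }
}
```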
Q: How would you keep the in-memory fallback from growing without bound?

A: Options:
- TTL-based cleanup: Remove buckets unused for X minutes
- LRU cache: Use `Caffeine` or `Guava Cache` with size/expiry limits
- Periodic cleanup: Background thread removes stale entries
- Bounded map: Limit total number of buckets
Current implementation: No cleanup (acceptable for short Redis outages).
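A sketch of the Caffeine option; Caffeine is not currently a dependency, so this is illustrative only and reuses the TokenBucket sketched earlier:

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

import java.time.Duration;

class BoundedBucketStore {

    // Buckets idle for 10 minutes are evicted, and the total entry count is capped
    private final Cache<String, TokenBucket> buckets = Caffeine.newBuilder()
            .expireAfterAccess(Duration.ofMinutes(10))
            .maximumSize(100_000)
            .build();

    boolean allowRequest(String userId, long capacity, double refillRatePerSecond) {
        TokenBucket bucket = buckets.get(userId, k -> new TokenBucket(capacity, refillRatePerSecond));
        return bucket.allowRequest();
    }
}
```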
Q: Why a servlet Filter instead of a Spring HandlerInterceptor?

A: A Filter runs earlier in the request lifecycle, before Spring MVC processing. This:
- Reduces overhead (reject before controller execution)
- Works for all endpoints automatically
- Can't be bypassed by controllers
- Better performance for high-volume rate limiting
This project is provided as-is for educational and interview purposes.