A production-ready rate limiting service implementing the Token Bucket algorithm with Redis-backed distributed rate limiting and in-memory fallback.
Rate limiting is essential for:
- API Protection: Prevent abuse and ensure fair resource usage
- Cost Control: Limit expensive operations (database queries, external API calls)
- Stability: Protect backend systems from traffic spikes
- Compliance: Enforce usage quotas per user or API key
This service provides a distributed, scalable rate limiting solution that maintains availability even during Redis outages.
┌─────────────┐
│ Client │
└──────┬──────┘
│ HTTP Request
▼
┌─────────────────────┐
│ RateLimitFilter │ ← OncePerRequestFilter (Spring)
│ - Extract User ID │
│ - Build Redis Key │
└──────┬──────────────┘
│
▼
┌─────────────────────┐
│ RedisRateLimiter │
│ - Execute Lua Script│
│ - Fallback Logic │
└──────┬──────────────┘
│
├──► Redis (Primary)
│ └── Lua Script (Atomic)
│
└──► In-Memory (Fallback)
└── TokenBucket per User
- RateLimitFilter: Servlet filter intercepting all HTTP requests
- RedisRateLimiter: Core rate limiting logic with Redis + fallback
- TokenBucket: In-memory token bucket implementation
- Lua Script: Atomic Redis operation for distributed rate limiting
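A minimal sketch of how the filter layer described above can be wired. The `allowRequest(String)` method on `RedisRateLimiter` and the `jakarta.servlet` imports (Spring Boot 3; older stacks use `javax.servlet`) are assumptions, not taken from the actual source:

```java
import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import org.springframework.stereotype.Component;
import org.springframework.web.filter.OncePerRequestFilter;

import java.io.IOException;

@Component
public class RateLimitFilter extends OncePerRequestFilter {

    private final RedisRateLimiter rateLimiter;   // allowRequest(String) signature is assumed

    public RateLimitFilter(RedisRateLimiter rateLimiter) {
        this.rateLimiter = rateLimiter;
    }

    @Override
    protected void doFilterInternal(HttpServletRequest request,
                                    HttpServletResponse response,
                                    FilterChain chain) throws ServletException, IOException {
        String userId = resolveUserId(request);
        if (rateLimiter.allowRequest(userId)) {
            chain.doFilter(request, response);          // within limit: continue down the chain
        } else {
            response.setStatus(429);                    // Too Many Requests
            response.setContentType("application/json");
            response.getWriter().write("{\"message\": \"Rate limit exceeded\"}");
        }
    }

    // Simplified; the full identification order is listed later in this README
    private String resolveUserId(HttpServletRequest request) {
        String userId = request.getHeader("X-User-Id");
        return (userId != null && !userId.isBlank()) ? userId : request.getRemoteAddr();
    }
}
```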
Token Bucket is a rate limiting algorithm that allows bursts up to a maximum capacity while maintaining a steady refill rate.
- Bucket Capacity: Maximum number of tokens (e.g., 100)
- Refill Rate: Tokens added per second (e.g., 10 tokens/sec)
- Request Processing:
- Each request consumes 1 token
- If tokens available → allow request, decrement token
- If no tokens → reject request (HTTP 429)
```
elapsedTime   = currentTime - lastRefillTime
tokensToAdd   = (elapsedTime / 1000) * refillRatePerSecond
currentTokens = min(capacity, currentTokens + tokensToAdd)
```
Example:
- Capacity: 100 tokens
- Refill Rate: 10 tokens/second
- After 1 second of no requests: +10 tokens (capped at 100)
- After 2 seconds: +20 tokens (capped at 100)
- Burst Handling: Allows short bursts up to capacity
- Smooth Rate: Maintains average rate over time
- Predictable: Easy to reason about and configure
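To make the refill formula above concrete, here is a minimal in-memory bucket in Java. It mirrors the project's TokenBucket in spirit, but field names and constructor shape are illustrative; only `allowRequest()` is taken from this README:

```java
// Minimal in-memory token bucket following the refill formula above (illustrative).
public class TokenBucket {

    private final long capacity;
    private final double refillRatePerSecond;

    private double currentTokens;
    private long lastRefillTime;   // epoch milliseconds

    public TokenBucket(long capacity, double refillRatePerSecond) {
        this.capacity = capacity;
        this.refillRatePerSecond = refillRatePerSecond;
        this.currentTokens = capacity;                  // start full so an initial burst is allowed
        this.lastRefillTime = System.currentTimeMillis();
    }

    public synchronized boolean allowRequest() {
        refill();
        if (currentTokens >= 1) {
            currentTokens -= 1;                         // each request consumes one token
            return true;
        }
        return false;                                   // caller maps this to HTTP 429
    }

    private void refill() {
        long now = System.currentTimeMillis();
        double elapsedSeconds = (now - lastRefillTime) / 1000.0;
        currentTokens = Math.min(capacity, currentTokens + elapsedSeconds * refillRatePerSecond);
        lastRefillTime = now;
    }
}
```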
Rate limiting requires atomic operations:
- Read current token count
- Calculate refill
- Check availability
- Decrement token
- Update timestamp
Without atomicity, concurrent requests could:
- Read the same token count
- Both be allowed when only one should be
- Exceed the rate limit
The Lua script (`token_bucket.lua`) executes atomically in Redis:

```lua
-- All operations happen atomically
local hashData = redis.call('HGETALL', key)
-- Calculate refill
-- Check tokens
-- Decrement if allowed
-- Update hash
return 1 or 0  -- 1 = allowed, 0 = rejected
```

- Atomicity: Single Redis command ensures consistency
- Performance: Single round-trip to Redis
- Distributed: Works across multiple application instances
- Reliability: Redis handles script execution atomically
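On the application side, Spring Data Redis can load the script once and execute it in a single round-trip. A sketch, assuming the script takes the bucket key plus capacity, refill rate, and a timestamp (the real argument layout lives in `token_bucket.lua`, and this class is not necessarily how the project's RedisRateLimiter is wired):

```java
import org.springframework.core.io.ClassPathResource;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.data.redis.core.script.DefaultRedisScript;
import org.springframework.scripting.support.ResourceScriptSource;

import java.util.List;

public class RedisTokenBucketClient {

    private final StringRedisTemplate redisTemplate;
    private final DefaultRedisScript<Long> script;

    public RedisTokenBucketClient(StringRedisTemplate redisTemplate) {
        this.redisTemplate = redisTemplate;
        this.script = new DefaultRedisScript<>();
        // Load token_bucket.lua from the classpath; Spring runs it via EVALSHA,
        // falling back to EVAL when the script is not yet cached on the server
        script.setScriptSource(new ResourceScriptSource(new ClassPathResource("token_bucket.lua")));
        script.setResultType(Long.class);
    }

    public boolean allowRequest(String key, long capacity, double refillRatePerSecond) {
        Long result = redisTemplate.execute(
                script,
                List.of(key),                                   // KEYS[1]: the per-user bucket key
                String.valueOf(capacity),                       // ARGV[1] (assumed)
                String.valueOf(refillRatePerSecond),            // ARGV[2] (assumed)
                String.valueOf(System.currentTimeMillis()));    // ARGV[3] (assumed)
        return result != null && result == 1L;                  // script returns 1 (allow) or 0 (deny)
    }
}
```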
When Redis is unavailable, the service falls back to in-memory TokenBucket instances:
```java
try {
    return redisTemplate.execute(script, ...);
} catch (Exception e) {
    // Fallback to in-memory TokenBucket
    TokenBucket bucket = inMemoryBuckets.computeIfAbsent(
        userId, k -> new TokenBucket(capacity, refillRate)
    );
    return bucket.allowRequest();
}
```

Advantages:
- ✅ Application remains available during Redis outages
- ✅ No request failures due to Redis downtime
- ✅ Automatic failover without configuration
Limitations:
- ⚠️ Rate limits are per-instance, not shared across instances
- ⚠️ With N instances, effective rate limit = N × configured limit
- ⚠️ In-memory buckets lost on application restart
For production deployments:
- Monitor Redis health and alert on fallback usage
- Consider circuit breaker pattern for Redis failures (see the sketch after this list)
- Use Redis Sentinel/Cluster for high availability
- Implement distributed coordination (e.g., ZooKeeper) if strict limits required
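For the circuit breaker suggestion, Resilience4j is one option. It is not a dependency of this project, so the sketch below is purely illustrative; the stub methods stand in for the existing Redis and in-memory paths:

```java
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;

import java.time.Duration;

public class GuardedRateLimiter {

    private final CircuitBreaker redisBreaker = CircuitBreaker.of("redis",
            CircuitBreakerConfig.custom()
                    .failureRateThreshold(50)                        // open after 50% of calls fail
                    .waitDurationInOpenState(Duration.ofSeconds(30)) // probe Redis again after 30s
                    .build());

    public boolean allowRequest(String userId) {
        try {
            // While the breaker is open, calls fail fast instead of waiting on Redis timeouts
            return redisBreaker.executeSupplier(() -> checkWithRedis(userId));
        } catch (Exception e) {
            return checkInMemory(userId);   // the existing in-memory fallback path
        }
    }

    private boolean checkWithRedis(String userId) { /* Lua script call */ return true; }

    private boolean checkInMemory(String userId) { /* per-instance TokenBucket */ return true; }
}
```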
- Unit Tests (`TokenBucketTest`):
  - Capacity enforcement
  - Token exhaustion
  - Refill rate accuracy
  - Thread safety
- Integration Tests (`RedisRateLimiterTest`):
  - Redis success scenarios
  - Redis failure fallback
  - Per-user bucket isolation
  - Key extraction logic
- Concurrency Tests (`ConcurrencyTest`):
  - 50-100 parallel requests
  - Capacity enforcement under load
  - Multiple users concurrently
  - Refill during concurrent access
Run the tests with:

```bash
mvn test
```

The key concurrency scenario:

- Concurrency: 100 parallel threads
- Capacity: 50 tokens
- Verification: Exact capacity enforcement, no race conditions
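A sketch of such a test; the project's `ConcurrencyTest` may be structured differently. It exercises the in-memory TokenBucket sketched earlier, with refill disabled so only the initial capacity is available:

```java
import org.junit.jupiter.api.Test;

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

import static org.junit.jupiter.api.Assertions.assertEquals;

class TokenBucketConcurrencyTest {

    @Test
    void allowsExactlyCapacityUnderParallelLoad() throws InterruptedException {
        int capacity = 50;
        int threads = 100;
        TokenBucket bucket = new TokenBucket(capacity, 0.0);   // no refill during the test

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(threads);
        AtomicInteger allowed = new AtomicInteger();

        for (int i = 0; i < threads; i++) {
            pool.submit(() -> {
                try {
                    start.await();                              // release all threads at once
                    if (bucket.allowRequest()) {
                        allowed.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }

        start.countDown();
        done.await();
        pool.shutdown();

        // Exactly `capacity` requests may pass, regardless of thread scheduling
        assertEquals(capacity, allowed.get());
    }
}
```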
- Per-Instance Fallback: In-memory buckets not shared across instances
- No Persistence: In-memory state lost on restart
- Fixed Configuration: Rate limits configured globally, not per-endpoint
- No Metrics: No built-in monitoring/metrics export
| Decision | Rationale | Alternative |
|---|---|---|
| Token Bucket vs Leaky Bucket | Allows bursts, more intuitive | Leaky Bucket (smoother but no bursts) |
| Redis Lua vs Multiple Commands | Atomicity, performance | Multiple Redis commands (race conditions) |
| In-Memory Fallback vs Fail-Open | Availability over strict limits | Fail-closed (strict but unavailable) |
| Filter vs Interceptor | Earlier in request lifecycle | HandlerInterceptor (later, less control) |
- Per-endpoint rate limits
- Sliding window algorithm option
- Metrics export (Prometheus/Micrometer)
- Distributed coordination for strict limits
- Rate limit headers in responses (X-RateLimit-*)
- Java 17+
- Maven 3.6+
- Redis 6.0+ (optional, falls back to in-memory)
- Clone and Build:

  ```bash
  cd rate-limiter-service
  mvn clean install
  ```

- Start Redis (optional):

  ```bash
  docker run -d -p 6379:6379 redis:7-alpine
  ```

- Configure (optional): edit `src/main/resources/application.yml`:

  ```yaml
  rate-limit:
    capacity: 100
    refill-rate-per-second: 10.0
  ```

- Run Application:

  ```bash
  mvn spring-boot:run
  ```

  The service starts on http://localhost:8080
Redis connection settings can also be supplied via environment variables:

```bash
export REDIS_HOST=localhost
export REDIS_PORT=6379
export REDIS_PASSWORD=   # Optional
```

Make a request:

```bash
curl -H "X-User-Id: user123" http://localhost:8080/api/endpoint
```

Response: 200 OK
After exceeding the limit:
curl -H "X-User-Id: user123" http://localhost:8080/api/endpointResponse: 429 Too Many Requests
```json
{
  "message": "Rate limit exceeded"
}
```

The service identifies users by:

- `X-User-Id` header (preferred)
- `X-Forwarded-For` header (if behind a proxy)
- `X-Real-IP` header (if behind a proxy)
- Remote IP address (fallback)
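A sketch of that resolution order in Java; the helper class and method name are illustrative, not the project's actual extraction code:

```java
import jakarta.servlet.http.HttpServletRequest;

final class UserIdResolver {

    // Applies the identification order listed above: header, proxy headers, then remote IP
    static String resolve(HttpServletRequest request) {
        String userId = request.getHeader("X-User-Id");
        if (userId != null && !userId.isBlank()) {
            return userId;
        }
        String forwardedFor = request.getHeader("X-Forwarded-For");
        if (forwardedFor != null && !forwardedFor.isBlank()) {
            // X-Forwarded-For can hold a chain of addresses; the first is the original client
            return forwardedFor.split(",")[0].trim();
        }
        String realIp = request.getHeader("X-Real-IP");
        if (realIp != null && !realIp.isBlank()) {
            return realIp;
        }
        return request.getRemoteAddr();   // last resort: the direct peer's address
    }
}
```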
To exercise the limit end-to-end:

```bash
# Send 101 requests (assuming capacity=100)
for i in {1..101}; do
  curl -H "X-User-Id: test-user" http://localhost:8080/api/endpoint
done
```

Expected: the first 100 succeed, the 101st returns 429.
Q: Why Token Bucket instead of a Sliding Window?

A: Token Bucket allows bursts up to capacity while maintaining an average rate. Sliding Window is smoother but doesn't handle bursts well. For APIs with variable traffic, Token Bucket provides a better user experience while still enforcing limits.
Q: How is atomicity guaranteed across distributed instances?

A: We use Redis Lua scripts that execute atomically on the Redis server. The entire operation (read, calculate, update) happens in a single atomic command, preventing race conditions across multiple application instances.
Q: What happens when Redis is down?

A: The service gracefully degrades to in-memory TokenBucket instances per user. This ensures availability but with a tradeoff: rate limits become per-instance rather than shared. Each instance maintains its own buckets, so with N instances the effective limit is N × the configured limit.
Q: How would you rate limit across multiple regions?

A: Options:
- Centralized Redis Cluster: Single Redis cluster shared across regions (latency tradeoff)
- Regional Redis + Coordination: Each region has Redis, coordinate via message queue
- Distributed Consensus: Use ZooKeeper/etcd for strict global limits
- Per-Region Limits: Accept regional limits, sum for global (simpler)
Q: How is concurrency tested?

A: Use an ExecutorService with 50-100 threads, a CountDownLatch for synchronization, and an AtomicInteger for counting. Verify that exactly `capacity` requests are allowed, regardless of thread scheduling.
Q: What is the time and space complexity?

A: O(1) for both operations:
- `allowRequest()`: constant time (simple arithmetic, hash lookup)
- Token refill: O(1) (single calculation)
Storage: O(U) where U = number of unique users.
Q: How would you extend this to per-endpoint rate limits?

A:
- Extract the endpoint from the request path
- Build a composite key: `rate-limit:{userId}:{endpoint}`
- Configure limits per endpoint in config
- Use the same TokenBucket logic with endpoint-specific capacity/rate
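A sketch of the composite key and per-endpoint lookup; the endpoint paths and the config shape here are hypothetical:

```java
import java.util.Map;

record EndpointLimit(long capacity, double refillRatePerSecond) {}

class PerEndpointLimits {

    // Hypothetical per-endpoint configuration; in practice this would be bound from application.yml
    private static final Map<String, EndpointLimit> LIMITS = Map.of(
            "/api/search", new EndpointLimit(20, 2.0),
            "/api/endpoint", new EndpointLimit(100, 10.0));

    private static final EndpointLimit DEFAULT = new EndpointLimit(100, 10.0);

    // Composite key so each user gets an independent bucket per endpoint
    static String bucketKey(String userId, String endpoint) {
        return "rate-limit:" + userId + ":" + endpoint;
    }

    static EndpointLimit limitFor(String endpoint) {
        return LIMITS.getOrDefault(endpoint, DEFAULT);
    }
}
```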
Q: What metrics would you export?

A:
- `rate_limit_allowed_total{user_id}`: counter of allowed requests
- `rate_limit_denied_total{user_id}`: counter of denied requests
- `rate_limit_fallback_active`: gauge (1 if using fallback, 0 if Redis)
- `rate_limit_redis_latency`: histogram of Redis operation time
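With Micrometer (Spring Boot's metrics facade), the counters and gauge could be wired roughly as below; the metric names mirror the list above, everything else is an assumption:

```java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;

import java.util.concurrent.atomic.AtomicInteger;

class RateLimitMetrics {

    private final MeterRegistry registry;
    private final AtomicInteger fallbackActive = new AtomicInteger(0);

    RateLimitMetrics(MeterRegistry registry) {
        this.registry = registry;
        // 1 while the in-memory fallback is active, 0 while Redis is healthy
        registry.gauge("rate_limit_fallback_active", fallbackActive);
    }

    void recordDecision(String userId, boolean allowed) {
        Counter.builder(allowed ? "rate_limit_allowed_total" : "rate_limit_denied_total")
                .tag("user_id", userId)   // note: per-user tags can be high-cardinality in Prometheus
                .register(registry)
                .increment();
    }

    void markFallback(boolean active) {
        fallbackActive.set(active ? 1 : 0);
    }
}
```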
Q: How would you keep the in-memory fallback from growing without bound?

A: Options:
- TTL-based cleanup: Remove buckets unused for X minutes
- LRU cache: Use `Caffeine` or `Guava Cache` with size/expiry limits
- Periodic cleanup: Background thread removes stale entries
- Bounded map: Limit total number of buckets
Current implementation: No cleanup (acceptable for short Redis outages).
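A sketch of the Caffeine option; Caffeine is not currently a dependency, so this is illustrative only and reuses the TokenBucket sketched earlier:

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

import java.time.Duration;

class BoundedBucketStore {

    // Buckets idle for 10 minutes are evicted, and the total entry count is capped
    private final Cache<String, TokenBucket> buckets = Caffeine.newBuilder()
            .expireAfterAccess(Duration.ofMinutes(10))
            .maximumSize(100_000)
            .build();

    boolean allowRequest(String userId, long capacity, double refillRatePerSecond) {
        TokenBucket bucket = buckets.get(userId, k -> new TokenBucket(capacity, refillRatePerSecond));
        return bucket.allowRequest();
    }
}
```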
Q: Why a servlet Filter instead of a Spring HandlerInterceptor?

A: A Filter runs earlier in the request lifecycle, before Spring MVC processing. This:
- Reduces overhead (reject before controller execution)
- Works for all endpoints automatically
- Can't be bypassed by controllers
- Better performance for high-volume rate limiting
This project is provided as-is for educational and interview purposes.