This document describes the Redis-backed rate limiting system for controlling API usage.
The rate limiting system enforces global usage limits using a Redis-backed sliding window algorithm:
- Requests per minute
- Requests per hour
```
┌─────────────────┐      ┌─────────────────────┐      ┌──────────────────┐
│   API Request   │─────▶│ RateLimitMiddleware │─────▶│ RedisRateLimiter │
│                 │      │                     │      │  (Redis + Lua)   │
└─────────────────┘      └─────────────────────┘      └──────────────────┘
```
| Component | File | Description |
|---|---|---|
| `RateLimitMiddleware` | `ratelimit/middleware.py` | FastAPI middleware for enforcement |
| `RedisRateLimiter` | `ratelimit/middleware.py` | Redis sliding window limiter using Lua scripts |
```bash
# Redis backend (required)
RATE_LIMIT_REDIS_URL=redis://localhost:6379/0
RATE_LIMIT_REDIS_TIMEOUT_MS=200
RATE_LIMIT_KEY_PREFIX=lightspeed:ratelimit

# Global rate limits
RATE_LIMIT_REQUESTS_PER_MINUTE=60
RATE_LIMIT_REQUESTS_PER_HOUR=1000
```

Only specific paths are rate-limited:
| Path | Description |
|---|---|
| `/` | A2A JSON-RPC endpoint (supports both send and streaming) |
Rate limits are evaluated across multiple principal dimensions:
- `order_id` (tenant/subscription boundary)
- `user_id` (or `client_id` if `user_id` is unavailable)
- IP fallback only when no authenticated principal is available
If both `order_id` and `user_id` are present, the request must pass both checks. If either dimension exceeds its configured limit, the request is rejected with HTTP 429.
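The "must pass both" rule can be sketched as a small predicate. This is an illustrative condensation with hypothetical names, not the actual implementation; in the real system the equivalent check runs atomically inside the Lua script.

```python
# Hypothetical sketch of the multi-dimension decision: a request is
# allowed only if every present principal dimension is under its limit.

def allow_request(counts: dict, limits: dict) -> bool:
    """counts: current in-window requests per dimension;
    limits: configured maximum per dimension."""
    return all(counts[dim] < limits[dim] for dim in counts)

# Both order_id and user_id present: both checks must pass.
print(allow_request({"order_id": 10, "user_id": 59},
                    {"order_id": 60, "user_id": 60}))  # True
print(allow_request({"order_id": 10, "user_id": 60},
                    {"order_id": 60, "user_id": 60}))  # False
```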
These paths are never rate-limited:
- `/health`, `/healthz`, `/ready` - Health checks
- `/metrics` - Prometheus metrics
- `/.well-known/agent.json` - Agent card
- `/docs`, `/openapi.json`, `/redoc` - Documentation
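A minimal sketch of the path gate, assuming a simple set-membership check (the helper name and exact matching logic in `ratelimit/middleware.py` may differ):

```python
# Hypothetical mirror of the middleware's path exemption logic.
EXEMPT_PATHS = {
    "/health", "/healthz", "/ready",      # health checks
    "/metrics",                           # Prometheus metrics
    "/.well-known/agent.json",            # agent card
    "/docs", "/openapi.json", "/redoc",   # documentation
}
RATE_LIMITED_PATHS = {"/"}                # A2A JSON-RPC endpoint

def should_rate_limit(path: str) -> bool:
    return path not in EXEMPT_PATHS and path in RATE_LIMITED_PATHS

print(should_rate_limit("/"))         # True
print(should_rate_limit("/metrics"))  # False
```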
When a request is rate-limited (429 response):
| Header | Description |
|---|---|
| `Retry-After` | Seconds until the limit resets |
| `X-RateLimit-Limit` | The limit per minute |
| `X-RateLimit-Remaining` | Remaining requests |
Example response:
```http
HTTP/1.1 429 Too Many Requests
Retry-After: 60
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
Content-Type: application/json

{
  "error": "rate_limit_exceeded",
  "message": "Rate limit exceeded (per_minute)",
  "retry_after": 60
}
```

The rate limiter uses an atomic Redis + Lua sliding window algorithm:
- For each principal dimension (for example `order_id` and `user_id`), Redis keeps two sorted sets:
  - a minute window key (`:m`)
  - an hour window key (`:h`)
- Before checking limits, old entries are removed from each set with `ZREMRANGEBYSCORE` so only in-window requests remain.
- Redis counts current in-window requests with `ZCARD` and compares them to configured limits.
- If any dimension is already at/over the limit, the script returns `429` metadata (including `Retry-After`) and does not record the new request.
- If all dimensions are under limits, the script records the new request with `ZADD` and updates key expiry with `PEXPIRE`.
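The window logic above can be illustrated with a pure-Python, in-memory stand-in, where a plain list of timestamps plays the role of the Redis sorted set. This is a sketch of the algorithm only; the real check runs atomically as a Lua script inside Redis, so no two requests can interleave between the count and the record steps.

```python
import time
from collections import defaultdict

class SlidingWindowSim:
    """In-memory illustration of the Redis/Lua sliding window.

    Each (dimension, window) pair keeps a list of request timestamps,
    standing in for the sorted-set keys (:m and :h)."""

    def __init__(self, limits):
        # limits: {window_name: (window_seconds, max_requests)}
        self.limits = limits
        self.entries = defaultdict(list)

    def check(self, dimensions, now=None):
        now = time.time() if now is None else now
        # 1) Trim old entries (ZREMRANGEBYSCORE) and count (ZCARD).
        for dim in dimensions:
            for name, (seconds, limit) in self.limits.items():
                key = (dim, name)
                self.entries[key] = [t for t in self.entries[key] if t > now - seconds]
                if len(self.entries[key]) >= limit:
                    # 2) At/over the limit: reject WITHOUT recording (no ZADD).
                    retry_after = int(self.entries[key][0] + seconds - now) + 1
                    return False, retry_after
        # 3) Under all limits: record the request (ZADD + PEXPIRE).
        for dim in dimensions:
            for name in self.limits:
                self.entries[(dim, name)].append(now)
        return True, 0

limiter = SlidingWindowSim({"per_minute": (60, 3)})
for _ in range(3):
    assert limiter.check(["order:42"], now=100.0)[0]
allowed, retry_after = limiter.check(["order:42"], now=100.0)
print(allowed, retry_after)  # False 61
```

Note that rejected requests are not recorded, so a client hammering the endpoint does not push its own reset time further into the future.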
1. Request arrives
2. Middleware checks if path should be rate-limited
3. RedisRateLimiter executes an atomic Lua script in Redis
4. If within limits:
- Record timestamp
- Allow request
5. If exceeded:
- Return 429 Too Many Requests
- Include Retry-After header
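The five steps above can be condensed into one decision function. The names below are hypothetical (the real code lives in `ratelimit/middleware.py` and talks to Redis rather than a stub):

```python
# Hypothetical condensation of the middleware request flow.

def handle(path, limiter, dimensions):
    """Return (status_code, headers) for an incoming request."""
    if path != "/":                                 # step 2: only the A2A endpoint is limited
        return 200, {}
    allowed, retry_after = limiter(dimensions)      # step 3: atomic Lua check in Redis
    if allowed:                                     # step 4: timestamp already recorded
        return 200, {}
    return 429, {"Retry-After": str(retry_after)}   # step 5: reject with retry hint

# Stub limiter that always rejects with a 60 s retry hint.
status, headers = handle("/", lambda dims: (False, 60), ["user:alice"])
print(status, headers)  # 429 {'Retry-After': '60'}
```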
```bash
# Make 70 requests quickly (default limit is 60/min)
for i in {1..70}; do
  echo -n "Request $i: "
  curl -s -o /dev/null -w "%{http_code}\n" \
    -X POST http://localhost:8000/ \
    -H "Content-Type: application/json" \
    -d '{"jsonrpc":"2.0","method":"message/send","id":'$i',"params":{"message":{"role":"user","parts":[{"type":"text","text":"test"}]}}}'
done
```

You should see 429 responses after the first 60 requests.
When the agent is deployed on Cloud Run, use your service URL and include a Bearer token (production typically requires authentication):
```bash
SERVICE_URL="https://your-service-xxxx-uc.a.run.app"  # Your Cloud Run URL
TOKEN="your-oauth-token"                              # From DCR client_credentials or SSO

for i in {1..70}; do
  echo -n "Request $i: "
  curl -s -o /dev/null -w "%{http_code}\n" \
    -X POST "$SERVICE_URL/" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $TOKEN" \
    -d '{"jsonrpc":"2.0","method":"message/send","id":'$i',"params":{"message":{"role":"user","parts":[{"type":"text","text":"test"}]}}}'
done
```

With authentication, rate limits apply per `order_id` and `user_id` (from the token) instead of per IP. Redis (Cloud Memorystore) is internal to the VPC and cannot be inspected with `redis-cli` from outside.
The agent combines HTTP-level throttling, usage metering, and an optional per-run tool budget. They are separate layers:
| Layer | What it limits | When it runs | Shared across replicas? |
|---|---|---|---|
| HTTP rate limiting | Incoming A2A POSTs per principal (minute/hour windows) | FastAPI middleware before the ADK runner | Yes, when all instances use the same Redis |
| Usage tracking (DB) | Requests, tokens, completed tool calls for billing/analytics | ADK plugin (`UsageTrackingPlugin`) | Yes, all instances write to the same database |
| Per-invocation tool budget | How many tools may start in one agent run | ADK `before_tool_callback` | No (today): in-memory per process; see Per-invocation tool budget and proposed shared counters |
Comparison in plain terms: Redis rate limits stop a client from opening too many HTTP conversations. The tool budget stops a single conversation from hammering MCP with an unbounded tool-model loop. Metering records what actually ran for reporting, counting only tools that completed (blocked tools never reach `after_tool_callback`).
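A sketch of the current in-memory, per-process tool budget, assuming a simple counter keyed by invocation (class and method names are hypothetical; the real hook is ADK's `before_tool_callback`):

```python
from collections import defaultdict

class ToolBudget:
    """Hypothetical per-process tool budget, counted per invocation."""

    def __init__(self, max_tool_calls: int):
        self.max_tool_calls = max_tool_calls
        self._started = defaultdict(int)   # invocation_id -> tools started

    def before_tool_call(self, invocation_id: str) -> bool:
        """Return True if the tool may start; only started tools count."""
        if self._started[invocation_id] >= self.max_tool_calls:
            return False   # blocked tool never runs, so it is never metered
        self._started[invocation_id] += 1
        return True

budget = ToolBudget(max_tool_calls=2)
print([budget.before_tool_call("inv-1") for _ in range(3)])  # [True, True, False]
```

Because the counter lives in process memory, two replicas serving the same invocation would each apply the budget independently.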
- Rate limits are enforced across replicas as long as they share the same Redis instance.
- The service verifies Redis connectivity at startup and fails fast when Redis is unavailable.
- Tool budgets are not distributed across replicas until a shared store (for example, Redis with TTL keyed by `invocation_id`) is implemented; see metering.md.
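The proposed shared counter could look roughly like the sketch below. This is not implemented yet (see metering.md); a tiny stand-in class mimics the two Redis commands the design would use, `INCR` and `PEXPIRE`, so the sketch is runnable without a Redis server.

```python
import time

class MiniRedis:
    """Stand-in for a Redis client exposing incr() and pexpire() only."""

    def __init__(self):
        self._data = {}   # key -> (value, expires_at)

    def incr(self, key):
        value, expires_at = self._data.get(key, (0, None))
        if expires_at is not None and time.time() >= expires_at:
            value = 0     # expired key restarts from zero, like Redis
        value += 1
        self._data[key] = (value, expires_at)
        return value

    def pexpire(self, key, ms):
        value, _ = self._data.get(key, (0, None))
        self._data[key] = (value, time.time() + ms / 1000)

def try_start_tool(r, invocation_id, max_tools, ttl_ms=300_000):
    """Atomically count a tool start; the TTL garbage-collects finished runs."""
    key = f"toolbudget:{invocation_id}"   # hypothetical key scheme
    count = r.incr(key)
    if count == 1:
        r.pexpire(key, ttl_ms)            # first tool call sets the expiry
    return count <= max_tools

r = MiniRedis()
print([try_start_tool(r, "inv-1", max_tools=2) for _ in range(3)])  # [True, True, False]
```

With a real Redis client the same two commands would make the budget consistent across replicas, since `INCR` is atomic on the server.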