
HTTP Load Shedding for Worker Protection #4711

Draft
corlettb wants to merge 5 commits into main from BC-http-load-shedding-2

Conversation

corlettb (Contributor) commented Jan 27, 2026

Trello

Note

This change has not been tested under load in an environment yet.

The change should be considered a proposal until further testing has taken place.

What

Implements selective request throttling to protect worker capacity during traffic spikes. When workers become overloaded, the system identifies and throttles high-volume services, preventing resource exhaustion while maintaining service for other users.

Why

Workers have fixed capacity (8 concurrent requests per worker, 32 total across 4 workers). When a few high-volume services consume most available slots, low-volume services experience degraded performance or failures due to queue buildup. Manual intervention is currently required to identify and mitigate these scenarios.

Solution

Contribution-based throttling with statistical safeguards:

  • Tracks per-service request volumes using 60-second sliding windows
  • When worker load exceeds configurable threshold (default: 6/8 requests = 75%), identifies services that are:
    • Contributing >=20% of total request volume, OR
    • Requesting at >=10x the median service volume
  • Returns HTTP 429 (Retry-After: 60s) to throttled services
  • Safety gates prevent throttling in edge cases (single service, low traffic scenarios)
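The decision logic above can be sketched as a single predicate. This is an illustrative reconstruction, not the PR's actual code: the function name `should_throttle` and the constant names are assumptions, with the thresholds taken from the defaults documented below.

```python
# Hypothetical sketch of the contribution-based throttle decision.
from statistics import median

CONTRIBUTION_PCT = 20        # throttle if >=20% of total volume
MEDIAN_MULTIPLE = 10         # throttle if >=10x the median service volume
MIN_SERVICES = 5             # safety gate: contribution rule needs enough services
MIN_VOLUME = 50              # safety gate: contribution rule needs enough traffic


def should_throttle(service_id: str, volumes: dict[str, int]) -> bool:
    """Decide whether a service should receive a 429, given per-service
    request counts from the current sliding window."""
    total = sum(volumes.values())
    count = volumes.get(service_id, 0)
    if count == 0:
        return False

    # Contribution rule, gated so a single busy service in an otherwise
    # quiet system is never throttled.
    if len(volumes) >= MIN_SERVICES and total >= MIN_VOLUME:
        if count * 100 >= CONTRIBUTION_PCT * total:
            return True

    # Outlier rule: well above the median service volume.
    med = median(volumes.values())
    return med > 0 and count >= MEDIAN_MULTIPLE * med
```

With five services where one sends 50 of 70 requests, only that service trips the contribution rule; the others stay well under both thresholds.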

Memory-efficient design:

  • Per-worker in-memory tracking
  • Periodic cleanup prevents unbounded growth from idle services
  • No database or shared state required

Configuration (opt-in)

LOAD_SHEDDING_ENABLED=true          # Default: false
HIGH_WATER_MARK=6                   # Overload threshold (default: 6/8 per worker)
THROTTLE_CONTRIBUTION_PCT=20        # Contribution % threshold
THROTTLE_CONTRIBUTION_MIN_SERVICES=5   # Min services before applying contribution rule
THROTTLE_CONTRIBUTION_MIN_VOLUME=50    # Min total volume before applying contribution rule
THROTTLE_VOLUME_MEDIAN_MULTIPLE=10  # Median multiplier threshold
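One plausible way to load these settings, assuming plain environment-variable lookups with the documented defaults; the `env_int` helper is illustrative, not the PR's code:

```python
# Illustrative loading of the opt-in settings with their documented defaults.
import os


def env_int(name: str, default: int) -> int:
    """Read an integer setting from the environment, falling back to default."""
    return int(os.environ.get(name, default))


LOAD_SHEDDING_ENABLED = os.environ.get("LOAD_SHEDDING_ENABLED", "false").lower() == "true"
HIGH_WATER_MARK = env_int("HIGH_WATER_MARK", 6)
THROTTLE_CONTRIBUTION_PCT = env_int("THROTTLE_CONTRIBUTION_PCT", 20)
THROTTLE_CONTRIBUTION_MIN_SERVICES = env_int("THROTTLE_CONTRIBUTION_MIN_SERVICES", 5)
THROTTLE_CONTRIBUTION_MIN_VOLUME = env_int("THROTTLE_CONTRIBUTION_MIN_VOLUME", 50)
THROTTLE_VOLUME_MEDIAN_MULTIPLE = env_int("THROTTLE_VOLUME_MEDIAN_MULTIPLE", 10)
```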

Observability

Prometheus metrics:

  • worker_load_shedding_active (Gauge): Current state per worker (1=active, 0=inactive)
  • load_shedding_activations_total (Counter): Total activation events across workers
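The gauge/counter pairing implies a small state machine: the gauge reflects current state on every check, while the counter moves only on the healthy-to-overloaded transition. A dependency-free sketch of that transition logic (in the PR these would be `prometheus_client` Gauge and Counter objects; the class here is illustrative):

```python
# Illustrative state machine behind the two metrics.
class LoadSheddingMetrics:
    def __init__(self) -> None:
        self.active_gauge = 0        # worker_load_shedding_active
        self.activations_total = 0   # load_shedding_activations_total

    def observe(self, concurrent: int, high_water_mark: int) -> None:
        overloaded = concurrent >= high_water_mark
        if overloaded and not self.active_gauge:
            # Counted once per transition, so flapping is visible even if
            # the gauge returns to 0 between Prometheus scrapes.
            self.activations_total += 1
        self.active_gauge = 1 if overloaded else 0
```

Summing the gauge across workers gives the number of workers currently shedding; the rate of the counter reveals how often workers flip into the overloaded state.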

Logging:

  • Load shedding activation/deactivation events
  • Individual throttling decisions with service details

Commits

Implements intelligent per-worker load shedding to prevent high-volume
services from overwhelming workers during traffic spikes, protecting
low-volume services during the autoscaling window (~1 minute).

Architecture:
- In-memory request tracking using 60-second sliding window with deques
- Per-worker load monitoring via concurrent request counter
- Contribution-based throttling (>=20% of volume or 10x median)
- Comprehensive metrics for observability (gauge + counter)
- No external dependencies (Redis-free)

Throttling Logic:
- Only activates when worker exceeds HIGH_WATER_MARK (80% capacity)
- Throttles services that meet either condition:
  * Contributing >=20% of total request volume (catches single spammers)
  * Volume >=10x median (catches outliers in multi-service scenarios)
- Returns 429 with Retry-After: 5 header

Components Added:
- app/load_shedding.py: ServiceVolumeTracker with deque-based tracking
- ServiceUnavailableError: Custom 429 exception for throttled requests
- ConcurrentRequestCounter: Per-worker load tracking
- Integration in validators.check_rate_limiting()
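A hypothetical sketch of the custom exception named above; the attribute names and constructor are illustrative, not the PR's exact code:

```python
# Sketch of a custom exception carrying everything a handler needs
# to emit the 429 response.
class ServiceUnavailableError(Exception):
    """Raised when a throttled service should receive HTTP 429."""

    status_code = 429

    def __init__(self, service_id: str, retry_after: int) -> None:
        super().__init__(
            f"service {service_id} throttled, retry in {retry_after}s"
        )
        self.service_id = service_id
        self.retry_after = retry_after  # value for the Retry-After header
```

An error handler in the web layer would catch this, set the status code from `status_code`, and add the `Retry-After` header from `retry_after`.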

Configuration (disabled by default):
- LOAD_SHEDDING_ENABLED: false (feature flag, opt-in)
- HIGH_WATER_MARK: 26 (80% of 32 concurrent capacity per worker)
- THROTTLE_CONTRIBUTION_PCT: 20 (throttle if >=20% of volume)
- THROTTLE_VOLUME_MEDIAN_MULTIPLE: 10 (throttle if >=10x median)

Observability Metrics:
- worker_load_shedding_active (Gauge): Current state per worker (1=active, 0=inactive)
  Sum across workers shows total workers currently load shedding
- load_shedding_activations_total (Counter): Activation events per worker
  Incremented once per transition from healthy -> overloaded
  Useful for: detecting flapping, historical analysis, missed events between scrapes
- load_shedding.throttled.{service_id} (Counter): Per-service throttle count

Logging:
- ACTIVATED/DEACTIVATED events with current vs HIGH_WATER_MARK
- Per-service throttling decisions with volume metrics
- Contribution percentages and median comparisons

Implementation Details:
- ServiceVolumeTracker: Deque-based sliding window
- Lazy cleanup of expired timestamps to keep memory bounded
- State tracking for activation/deactivation transitions
- No locks needed (Eventlet cooperative concurrency)
- TYPE_CHECKING pattern for clean type hints

Each eventlet worker can handle 8 concurrent connections, not 32.
The previous HIGH_WATER_MARK of 26 would never trigger since workers
max out at 8 concurrent requests.

Changes:
- Update HIGH_WATER_MARK default from 26 to 6 (75% of 8 connections)
- Fix fallback value in is_worker_overloaded() from 26 to 6
- Update test mocks to use realistic values (3-8 instead of 10-30)
corlettb force-pushed the BC-http-load-shedding-2 branch from 626ad24 to 6658214 on January 27, 2026 at 10:57
