Implements intelligent per-worker load shedding to prevent high-volume
services from overwhelming workers during traffic spikes, protecting
low-volume services during the autoscaling window (~1 minute).
Architecture:
- In-memory request tracking using 60-second sliding window with deques
- Per-worker load monitoring via concurrent request counter
- Contribution-based throttling (>=20% of volume or 10x median)
- Comprehensive metrics for observability (gauge + counter)
- No external dependencies (Redis-free)
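The deque-based sliding window described above might look roughly like this sketch (class and method names mirror the description, but the exact interface is an assumption):

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60  # matches the 60-second sliding window above

class ServiceVolumeTracker:
    """Tracks per-service request timestamps over a sliding window."""

    def __init__(self, window=WINDOW_SECONDS):
        self.window = window
        self._timestamps = defaultdict(deque)  # service_id -> deque of times

    def record_request(self, service_id, now=None):
        now = time.monotonic() if now is None else now
        self._prune(service_id, now)  # lazy cleanup keeps memory bounded
        self._timestamps[service_id].append(now)

    def get_counts(self, now=None):
        """Return {service_id: request_count} for the current window."""
        now = time.monotonic() if now is None else now
        counts = {}
        for service_id in list(self._timestamps):
            self._prune(service_id, now)
            if self._timestamps[service_id]:
                counts[service_id] = len(self._timestamps[service_id])
            else:
                del self._timestamps[service_id]  # drop idle services entirely
        return counts

    def _prune(self, service_id, now):
        dq = self._timestamps[service_id]
        while dq and dq[0] <= now - self.window:
            dq.popleft()
```

Because pruning happens lazily on access, memory stays proportional to requests seen in the last window rather than to all-time traffic.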
Throttling Logic:
- Only activates when worker exceeds HIGH_WATER_MARK (80% capacity)
- Throttles services that meet either condition:
* Contributing >=20% of total request volume (catches single spammers)
* Volume >=10x median (catches outliers in multi-service scenarios)
- Returns 429 with Retry-After: 5 header
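The two throttling conditions can be sketched as a single decision function (the function name and signature are illustrative; the real thresholds come from THROTTLE_CONTRIBUTION_PCT and THROTTLE_VOLUME_MEDIAN_MULTIPLE):

```python
from statistics import median

def should_throttle(service_id, counts, contribution_pct=20, median_multiple=10):
    """Decide whether a service should be throttled given window counts.

    counts: {service_id: request_count} over the sliding window.
    """
    total = sum(counts.values())
    volume = counts.get(service_id, 0)
    if total == 0 or volume == 0:
        return False
    # Condition 1: the service contributes a large share of all traffic
    # (catches a single spammer).
    if volume * 100 >= contribution_pct * total:
        return True
    # Condition 2: the service is an outlier relative to its peers
    # (catches outliers in multi-service scenarios).
    return volume >= median_multiple * median(counts.values())
```

Either condition alone is enough to throttle; a service is never throttled while the worker is below HIGH_WATER_MARK, since the check only runs once the worker is overloaded.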
Components Added:
- app/load_shedding.py: ServiceVolumeTracker with deque-based tracking
- ServiceUnavailableError: Custom 429 exception for throttled requests
- ConcurrentRequestCounter: Per-worker load tracking
- Integration in validators.check_rate_limiting()
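The ServiceUnavailableError and ConcurrentRequestCounter components might look roughly like this (the attribute names and the context-manager interface are assumptions, not the actual implementation):

```python
from contextlib import contextmanager

class ServiceUnavailableError(Exception):
    """Illustrative 429 exception; the real class likely integrates
    with the web framework's error handling."""
    status_code = 429
    headers = {"Retry-After": "5"}

class ConcurrentRequestCounter:
    """Per-worker count of in-flight requests. With Eventlet's
    cooperative scheduling, plain increments need no lock."""

    def __init__(self):
        self.current = 0

    @contextmanager
    def track(self):
        self.current += 1
        try:
            yield self.current  # current load including this request
        finally:
            self.current -= 1
```

Wrapping each request in `counter.track()` keeps `current` accurate even when a handler raises, since the decrement runs in `finally`.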
Configuration (disabled by default):
- LOAD_SHEDDING_ENABLED: false (feature flag, opt-in)
- HIGH_WATER_MARK: 26 (80% of 32 concurrent capacity per worker)
- THROTTLE_CONTRIBUTION_PCT: 20 (throttle if >=20% of volume)
- THROTTLE_VOLUME_MEDIAN_MULTIPLE: 10 (throttle if >=10x median)
Observability Metrics:
- worker_load_shedding_active (Gauge): Current state per worker (1=active, 0=inactive).
  Summing across workers shows the total number of workers currently load shedding.
- load_shedding_activations_total (Counter): Activation events per worker.
  Incremented once per healthy -> overloaded transition.
  Useful for detecting flapping, historical analysis, and events missed between scrapes.
- load_shedding.throttled.{service_id} (Counter): Per-service throttle count
Logging:
- ACTIVATED/DEACTIVATED events, logged with the current concurrent-request count vs HIGH_WATER_MARK
- Per-service throttling decisions with volume metrics
- Contribution percentages and median comparisons
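The log lines might look something like this sketch (the exact message format is an assumption):

```python
import logging

log = logging.getLogger("load_shedding")

def activation_message(current, high_water_mark):
    """Log (and return) the ACTIVATED event with current load vs threshold."""
    msg = f"load shedding ACTIVATED: concurrent={current} high_water_mark={high_water_mark}"
    log.warning(msg)
    return msg

def throttle_message(service_id, volume, total, median_volume):
    """Log (and return) a per-service throttling decision with its metrics."""
    pct = 100.0 * volume / total if total else 0.0
    msg = (f"throttling {service_id}: volume={volume} "
           f"({pct:.1f}% of {total}), median={median_volume}")
    log.info(msg)
    return msg
```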
Implementation Details:
- ServiceVolumeTracker: Deque-based sliding window
- Lazy cleanup of expired timestamps to keep memory bounded
- State tracking for activation/deactivation transitions
- No locks needed (Eventlet cooperative concurrency)
- TYPE_CHECKING pattern for clean type hints
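The activation/deactivation state tracking can be sketched as simple edge detection (class and attribute names are illustrative; `activations` stands in for the load_shedding_activations_total counter, which the description says is incremented once per healthy -> overloaded transition):

```python
class LoadSheddingState:
    """Tracks whether this worker is load shedding and counts transitions."""

    def __init__(self, high_water_mark=6):
        self.high_water_mark = high_water_mark
        self.active = False
        self.activations = 0  # stand-in for the Prometheus counter

    def update(self, concurrent_requests):
        """Re-evaluate state from the current concurrent request count."""
        overloaded = concurrent_requests > self.high_water_mark
        if overloaded and not self.active:
            self.activations += 1  # count each healthy -> overloaded edge once
        self.active = overloaded  # gauge would be set to 1 or 0 here
        return self.active
```

Counting only the transition (rather than every overloaded check) is what makes the counter useful for spotting flapping between scrapes.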
Each eventlet worker can handle 8 concurrent connections, not 32. The previous HIGH_WATER_MARK of 26 would never trigger, since workers max out at 8 concurrent requests.
Changes:
- Update HIGH_WATER_MARK default from 26 to 6 (75% of 8 connections)
- Fix fallback value in is_worker_overloaded() from 26 to 6
- Update test mocks to use realistic values (3-8 instead of 10-30)
Trello
Note
This change has not been tested under load in an environment yet.
The change should be considered a proposal until further testing has taken place.
What
Implements selective request throttling to protect worker capacity during traffic spikes. When workers become overloaded, the system identifies and throttles high-volume services, preventing resource exhaustion while maintaining service for other users.
Why
Workers have fixed capacity (8 concurrent requests per worker, 32 total across 4 workers). When a few high-volume services consume most available slots, low-volume services experience degraded performance or failures due to queue buildup. Manual intervention is currently required to identify and mitigate these scenarios.
Solution
Contribution-based throttling with statistical safeguards:
Memory-efficient design:
Configuration (opt-in)
Observability
Prometheus metrics:
- worker_load_shedding_active (Gauge): Current state per worker (1=active, 0=inactive)
- load_shedding_activations_total (Counter): Total activation events across workers
Logging: