Pulling this out of triage (and the Pipeline board in general), since it's now referenced in multiple issues in the board.
Short description of the changes
Sometimes you have a massive cluster. This cluster has hundreds of nodes. The load balancer sending in traffic is like, "I'm gonna send it all to this one node!"
Well, ALBs will respect the Retry-After header, so even if clients don't, you can throw a 503 real quick and the retry will get routed to a less stressed node in the cluster.
This may be problematic if you have a retry limit of 3 and your client doesn't respect that header: it may hit 3 nodes that are hotter than average, and then you're dropping data.
OpenTelemetry Exporters should respect the Retry-After header, but will definitely retry after a 503 response code.
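To make that concrete, here's a minimal sketch of the idea (not Refinery's actual code; isStressed and the handler path are placeholders): an HTTP middleware that sheds load with a 503 and a Retry-After header so the ALB sends the retried request to a different target.

```go
package main

import (
	"fmt"
	"net/http"
)

// isStressed is a stand-in for whatever stress signal the node exposes.
func isStressed() bool { return false }

// stressCheck rejects incoming traffic early when the node is overloaded,
// before the request ever reaches the parsers or storage.
func stressCheck(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if isStressed() {
			// Well-behaved clients (e.g. OTLP exporters) wait this long;
			// the ALB routes the retried request to another target.
			w.Header().Set("Retry-After", "5")
			http.Error(w, "shedding load", http.StatusServiceUnavailable)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/1/batch/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "accepted")
	})
	http.ListenAndServe(":8080", stressCheck(mux))
}
```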
Implementation
This is behind a configuration that won't be enabled by default.
If you have Stress Relief in monitor mode, you'll have access to the metrics and can enable it with the configuration below.
Configuration ergonomics matter here because this will need to be tuned live, since production traffic only hits production deployments. The operative configurations are listed below, with a sketch of how they fit together after the list:
InboundRejectionServer: incoming
This can be set to none, all, incoming, or peer. Peer is optimistic but probably shouldn't be used unless we fix up libhoney-go so it will respect the Retry-After header. That's certainly possible to do, but isn't in scope for this PR.
Enabling it on incoming will allow the StressCheck middleware to fire and evaluate the situation before engaging the parsers and storage.
InboundRejectionTolerance: 5
This allows your individual pods to run more lopsided to reduce 503 errors. If your cluster runs really hot and you need the load balancer on point, like 100+ nodes pushing gigabytes per second, this should be like 1 or 2. If your cluster is lower stress or scaled for spikes, you can increase it to 10 or 15.
RetryAfterSeconds: 5
This one sets the Retry-After header to a number of seconds so that any clients that respect it will wait that long and the load balancer will lay off the node for that long.
Magical behavior: If you set it to zero, it will use the TraceTimeout setting since that would logically be when a lot of the load will be gone. This is probably excessive in most cases, but for VERY LARGE and VERY LOPSIDED deployments, it may make sense.
StatusTooManyRequests: 429
This one sets the response code. If all your clients are OTLP, they should respect Retry-After and you don't actually need to fool a load balancer.
If your clients are Libhoney and/or custom things, and you want to rely on the load balancer, you can change this to 503.
Any other code will be ignored and it will default to 429.
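As a rough illustration of how these knobs could interact, here's a hedged Go sketch. The struct, the field names, and the stress comparison in shouldReject are assumptions made for illustration, not Refinery's actual config schema or logic.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"time"
)

// InboundRejectionConfig mirrors the options described above; the field
// names and types are illustrative only.
type InboundRejectionConfig struct {
	Server                string        // "none", "all", "incoming", or "peer"
	Tolerance             uint          // how much hotter than average a pod may run before rejecting
	RetryAfterSeconds     int           // 0 means fall back to TraceTimeout
	StatusTooManyRequests int           // 429 or 503; anything else defaults to 429
	TraceTimeout          time.Duration // used when RetryAfterSeconds is 0
}

// rejectionResponse returns the status code and Retry-After value a rejection
// should use, applying the fallback rules described above.
func rejectionResponse(cfg InboundRejectionConfig) (status int, retryAfter string) {
	status = cfg.StatusTooManyRequests
	if status != http.StatusTooManyRequests && status != http.StatusServiceUnavailable {
		status = http.StatusTooManyRequests // any other code defaults to 429
	}
	seconds := cfg.RetryAfterSeconds
	if seconds == 0 {
		// The "magical behavior": wait out a trace timeout before retrying.
		seconds = int(cfg.TraceTimeout / time.Second)
	}
	return status, strconv.Itoa(seconds)
}

// shouldReject is one plausible reading of Tolerance: a pod only starts
// rejecting once it runs Tolerance points hotter than the cluster average.
func shouldReject(podStress, clusterStress, tolerance uint) bool {
	return podStress > clusterStress+tolerance
}

func main() {
	cfg := InboundRejectionConfig{
		Server:                "incoming",
		Tolerance:             5,
		RetryAfterSeconds:     0, // use the TraceTimeout fallback
		StatusTooManyRequests: 503,
		TraceTimeout:          60 * time.Second,
	}
	status, retryAfter := rejectionResponse(cfg)
	fmt.Printf("reject=%v status=%d Retry-After=%s\n",
		shouldReject(80, 70, cfg.Tolerance), status, retryAfter)
}
```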
Failure types
Alternatives
I noticed that the mini-load-balancing in #1525 doesn't seem to be sufficient to offload spans and we need the load balancer to participate.