limiter: Short-circuit when no rates or connections are configured#63751
Draft
juliaogris wants to merge 3 commits intomasterfrom
Draft
limiter: Short-circuit when no rates or connections are configured#63751juliaogris wants to merge 3 commits intomasterfrom
juliaogris wants to merge 3 commits intomasterfrom
Conversation
|
Codex Review: Didn't find any major issues. Delightful! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
cceec54 to
9310bb2
Compare
9310bb2 to
e038a89
Compare
Contributor
Author
|
@codex PTAL |
|
Codex Review: Didn't find any major issues. What shall we delve into next? ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
44903da to
b0f9512
Compare
Add micro-benchmarks comparing the cost of each limiter path when no
limits are configured (noop) against a high default rate (equivalent
to the old 100M req/s fake rate). Cover all call sites:
RegisterRequest, HTTP middleware, RegisterRequestAndConnection, and
both gRPC interceptors (unary and stream).
Also add a macro-benchmark (BenchmarkHTTPServer) that measures
end-to-end HTTP throughput through a real TCP listener with concurrent
clients to show how the limiter's overhead compares to network I/O.
Before the short-circuit fix, NoRates and HighDefaultRate produce
identical results because `NewRateLimiter` injects the fake default:
BenchmarkRegisterRequest/NoRates 298 ns/op 64 B/op 3 allocs/op
BenchmarkRegisterRequest/HighDefaultRate 299 ns/op 64 B/op 3 allocs/op
BenchmarkHTTPMiddleware/NoLimits 405 ns/op 264 B/op 7 allocs/op
BenchmarkHTTPMiddleware/HighDefaultRate 406 ns/op 264 B/op 7 allocs/op
BenchmarkRegisterRequestAndConnection/NoLimits 320 ns/op 96 B/op 4 allocs/op
BenchmarkRegisterRequestAndConnection/HighDefaultRate 321 ns/op 96 B/op 4 allocs/op
BenchmarkUnaryInterceptor/NoLimits 320 ns/op 64 B/op 3 allocs/op
BenchmarkUnaryInterceptor/HighDefaultRate 321 ns/op 64 B/op 3 allocs/op
BenchmarkStreamInterceptor/NoLimits 338 ns/op 96 B/op 4 allocs/op
BenchmarkStreamInterceptor/HighDefaultRate 339 ns/op 96 B/op 4 allocs/op
BenchmarkHTTPServer/NoLimits 9770 ns/op 4752 B/op 57 allocs/op
BenchmarkHTTPServer/HighDefaultRate 9700 ns/op 4752 B/op 57 allocs/op
Apple M4 Pro, go1.26.1 darwin/arm64.
Every Teleport service creates a Limiter unconditionally, even when
no `connection_limits` are set. Previously, `NewRateLimiter` injected
a fake default rate of 100M req/s to satisfy a non-empty-rates
invariant in `TokenLimiter`. Every request still paid the full cost:
extract client IP, acquire mutex, FnCache lookup, and token bucket
consume - all for a rate that never rejects.
Remove the fake default rate and add early returns throughout the
limiter stack: `RateLimiter.RegisterRequest` returns immediately when
the effective rate set is empty, `TokenLimiter.ServeHTTP` skips IP
extraction and rate consumption, and `ConnectionsLimiter` skips
tracking when `maxConnections` is zero.
Micro-benchmarks on Apple M4 Pro show the noop path is 5-160x faster
in isolation (NoLimits = noop after fix, HighDefaultRate = old
behavior with fake 100M rate):
BenchmarkRegisterRequest/NoRates 1.85 ns/op 0 B/op 0 allocs/op
BenchmarkRegisterRequest/HighDefaultRate 306 ns/op 64 B/op 3 allocs/op
BenchmarkHTTPMiddleware/NoLimits 82.0 ns/op 208 B/op 4 allocs/op
BenchmarkHTTPMiddleware/HighDefaultRate 405 ns/op 264 B/op 7 allocs/op
BenchmarkRegisterRequestAndConnection/NoLimits 18.8 ns/op 32 B/op 1 allocs/op
BenchmarkRegisterRequestAndConnection/HighDefaultRate 323 ns/op 96 B/op 4 allocs/op
BenchmarkUnaryInterceptor/NoLimits 18.5 ns/op 0 B/op 0 allocs/op
BenchmarkUnaryInterceptor/HighDefaultRate 322 ns/op 64 B/op 3 allocs/op
BenchmarkStreamInterceptor/NoLimits 33.2 ns/op 32 B/op 1 allocs/op
BenchmarkStreamInterceptor/HighDefaultRate 337 ns/op 96 B/op 4 allocs/op
However, a macro-benchmark through a real TCP HTTP server shows no
measurable difference in end-to-end throughput. The limiter's ~300ns
is noise against the ~9500ns of TCP/HTTP overhead per request:
BenchmarkHTTPServer/NoLimits 9510 ns/op 4681 B/op 54 allocs/op
BenchmarkHTTPServer/HighDefaultRate 9600 ns/op 4752 B/op 57 allocs/op
This change is primarily about code clarity - remove the fake default
rate workaround and make the limiter honestly do nothing when no
limits are configured - rather than a meaningful throughput
improvement.
b0f9512 to
3dea77f
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Every Teleport service creates a
Limiterunconditionally, even when noconnection_limitsare set. Previously,NewRateLimiterinjected a fakedefault rate of 100M req/s to satisfy a non-empty-rates invariant in
TokenLimiter. This meant every request still paid the full cost of therate limiting machinery - extract client IP, acquire mutex, FnCache
lookup, token bucket consume - all for a rate that would never reject.
This PR removes the fake default rate and adds early returns so the
limiter honestly does nothing when no limits are configured:
RateLimiter.RegisterRequestreturns immediately when the effective rateset is empty,
TokenLimiter.ServeHTTPskips IP extraction and rateconsumption, and
ConnectionsLimiterskips tracking whenmaxConnectionsis zero.This is primarily a code clarity change. Micro-benchmarks show the noop
path is 5-160x faster in isolation, but a macro-benchmark through a real
TCP HTTP server shows no measurable difference in end-to-end throughput -
the limiter's ~300ns is noise against the ~9500ns of TCP/HTTP overhead
per request:
This is a follow-on from PR gravitational/teleport#63720,
which added connection limiter support to the app service. While working
on that PR, I kept tripping over the fact that
NewRateLimiteralwayscreates a
TokenLimiterwith an FnCache and background cleanup goroutine,even when no rates are configured - which is the common case for most
services. That motivated this cleanup.
I'm on the fence about whether this PR is worth merging. The performance
gain is not meaningful in practice, and any code change to the limiter
carries risk of introducing subtle issues. Looking for guidance on whether
to close this or proceed.