Title: Enable per-user concurrency limits by supporting negative cost in the rate limiting protocol
Description:
Add a hits_subtrahend field to the rate limiting (RLS) protocol that acts like the hits_addend field, but decreases the rate limit counter instead of increasing it. Together with the apply_on_stream_done flag this would enable the emulation of per-user concurrency limits.
Goal: Ensure that each client can only have n requests open at the same time. This ensures fair resource allocation among clients and protects against "accidental DoS", and is simpler to implement and to reason about than cost-based limits.
Approach: Use a rate limit counter to track the number of requests the user has currently open by incrementing the counter at the beginning of the request (as usual with rate limiting) and then decrementing it again when the request is complete (using apply_on_stream_done with hits_subtrahend). This would limit the number of requests the client can perform in parallel, without limiting the total number per unit time.
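The approach can be sketched as follows. This is a minimal, self-contained model of the intended counter semantics, not Envoy's actual implementation; the class name `ConcurrencyLimiter` and its method names are made up for illustration, with comments marking where the proposed hits_subtrahend and apply_on_stream_done behavior would apply.

```python
import time

class ConcurrencyLimiter:
    """Emulates a per-client concurrency limit with a fixed-window
    rate limit counter: increment on request start (hits_addend),
    decrement on request completion (the proposed hits_subtrahend,
    delivered via apply_on_stream_done)."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counters = {}                    # client -> current count
        self.window_start = time.monotonic()

    def _maybe_reset(self):
        # Fixed-window semantics: counters reset at each window boundary.
        # This is also what "forgives" counts left stuck after a crash.
        now = time.monotonic()
        if now - self.window_start >= self.window:
            self.counters.clear()
            self.window_start = now

    def on_request_start(self, client):
        """hits_addend = 1: deny if the counter is already at the limit."""
        self._maybe_reset()
        if self.counters.get(client, 0) >= self.limit:
            return False  # would surface as 429 Too Many Requests
        self.counters[client] = self.counters.get(client, 0) + 1
        return True

    def on_stream_done(self, client):
        """hits_subtrahend = 1, applied when the stream completes."""
        self._maybe_reset()
        if self.counters.get(client, 0) > 0:
            self.counters[client] -= 1
```

Note that only requests in flight count against the limit: completed requests free their slot immediately, so the total number of requests per window is unbounded as long as no more than `limit` are open at once.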
Considerations:
- If envoy (or the node it runs on) terminates unexpectedly while a client request is ongoing, the counter will fail to be decremented, leading to a situation where a client could potentially be "locked out". This is solved naturally by the fact that rate limit counters are reset (for fixed window) or decremented (for sliding window) at regular intervals. In the context of enforcing a concurrency limit, the time unit for which the limit is defined determines how often concurrency counts are "forgiven", allowing "stuck" clients to be unstuck.
- As a consequence of this mechanism it is possible for clients to exceed the configured limit at certain times by up to a factor of two (assuming the connection timeout is smaller than the time window of the limit): if the client already has n requests in flight at the end of a time window, the counter will be reset and the client will be able to make another n requests even before the first set of requests is finished. This is no worse than the fact that, with fixed window semantics, a client can exceed the request count limit (or cost limit) by a factor of two if it makes requests just before and right after the end of a time window.
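The factor-of-two overshoot described above can be demonstrated with a few lines of arithmetic on a plain fixed-window counter (the variable names are illustrative):

```python
# A client holds `limit` long-running requests across a window
# boundary; the fixed-window reset then lets it open `limit` more,
# so actual concurrency transiently reaches 2 * limit.
limit = 2
counter = 0    # per-client fixed-window counter (what the RLS sees)
in_flight = 0  # actual concurrent requests (what the RLS cannot see)

# Client opens `limit` requests near the end of window 1.
for _ in range(limit):
    if counter < limit:
        counter += 1
        in_flight += 1

# Window boundary: the counter resets while the requests stay open.
counter = 0

# Client immediately opens `limit` more requests in window 2.
for _ in range(limit):
    if counter < limit:
        counter += 1
        in_flight += 1

print(in_flight)  # 4, i.e. 2 * limit
```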
Example, with a limit of 2 per minute (per client):
- start request 1, increment counter to 1
- start request 2, increment counter to 2
- start request 3, denied, too many requests
- end request 2, decrement counter to 1
- start request 4, increment counter to 2
- ...envoy terminates unexpectedly, counter remains at 2
- start request 5, denied, too many requests
- start request 6, denied, too many requests
- ...when the end of the 1 minute time window is reached...
- counter is reset to 0
- start request 7, increment counter to 1
- end request 7, decrement counter to 0
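The walkthrough above can be replayed directly, using a plain integer as the fixed-window counter and a manual reset to model the end of the minute (the `start`/`done` helpers are illustrative, not part of any API):

```python
# Replay of the example: limit of 2 per minute, per client.
limit = 2
counter = 0
log = []

def start(name):
    global counter
    if counter < limit:
        counter += 1
        log.append(f"{name}: allowed, counter={counter}")
        return True
    log.append(f"{name}: denied")
    return False

def done(name):
    global counter
    counter = max(counter - 1, 0)  # hits_subtrahend on stream done
    log.append(f"{name}: done, counter={counter}")

start("req1")   # counter -> 1
start("req2")   # counter -> 2
start("req3")   # denied, too many requests
done("req2")    # counter -> 1
start("req4")   # counter -> 2
# ...envoy terminates unexpectedly; req1 and req4 never decrement...
start("req5")   # denied
start("req6")   # denied
counter = 0     # end of the 1-minute window: fixed-window reset
start("req7")   # counter -> 1
done("req7")    # counter -> 0
print(counter)  # 0
```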