Server-side adaptive rate limiting and memory-aware admission control #21513

@arnav-chakraborty

Description

What would you like to be added?

A built-in, configurable server-side rate limiting and admission control mechanism within etcd that allows operators to protect the server from overload and OOM — without relying on all clients to cooperate.

At a high level, this could include capabilities such as:

  • Request rate limiting — the ability to cap the rate of incoming requests (globally, per-client, or per-method) so that a burst of traffic doesn't overwhelm the server
  • Concurrent request limiting — capping the number of in-flight requests the server processes simultaneously, similar to how kube-apiserver provides --max-requests-inflight
  • Memory-aware admission control — the ability to reject or shed load when the server's memory usage approaches a dangerous threshold, rather than allowing OOM
  • Graceful degradation — when limits are hit, the server should degrade gracefully (e.g., returning proper gRPC error codes with backpressure signals) rather than crashing or becoming indefinitely unstable. Raft heartbeats and leader election traffic should never be affected.
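To make the ask concrete, here is a minimal, stdlib-only sketch of what such an admission layer could look like. All names here (admissionController, Admit, the thresholds) are illustrative, not proposed etcd API: a token bucket caps request rate, an in-flight counter caps concurrency, and a heap-size check sheds load before OOM, each failing fast with a distinct error that a gRPC interceptor could map to ResourceExhausted.

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
	"sync"
	"time"
)

// Illustrative errors; a real implementation would surface these as
// gRPC status codes (e.g. codes.ResourceExhausted) with retry hints.
var (
	errRateLimited     = errors.New("rate limit exceeded")
	errTooManyInFlight = errors.New("too many in-flight requests")
	errMemoryPressure  = errors.New("memory usage above threshold")
)

type admissionController struct {
	mu         sync.Mutex
	tokens     float64 // current token-bucket fill
	maxTokens  float64 // bucket capacity (burst)
	refillQPS  float64 // tokens added per second
	lastRefill time.Time

	inFlight    int
	maxInFlight int

	maxHeapBytes uint64 // reject new work above this heap size
}

func newAdmissionController(qps, burst float64, maxInFlight int, maxHeapBytes uint64) *admissionController {
	return &admissionController{
		tokens: burst, maxTokens: burst, refillQPS: qps,
		lastRefill:  time.Now(),
		maxInFlight: maxInFlight, maxHeapBytes: maxHeapBytes,
	}
}

// Admit decides whether a request may proceed. On success the caller
// must call Done when the request finishes.
func (a *admissionController) Admit() error {
	// Memory-aware check: shed load before the process OOMs.
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	if ms.HeapAlloc > a.maxHeapBytes {
		return errMemoryPressure
	}

	a.mu.Lock()
	defer a.mu.Unlock()

	// Refill the token bucket based on elapsed time.
	now := time.Now()
	a.tokens += now.Sub(a.lastRefill).Seconds() * a.refillQPS
	if a.tokens > a.maxTokens {
		a.tokens = a.maxTokens
	}
	a.lastRefill = now

	if a.tokens < 1 {
		return errRateLimited
	}
	if a.inFlight >= a.maxInFlight {
		return errTooManyInFlight
	}
	a.tokens--
	a.inFlight++
	return nil
}

func (a *admissionController) Done() {
	a.mu.Lock()
	a.inFlight--
	a.mu.Unlock()
}

func main() {
	// 2 QPS, burst of 2, at most 1 in-flight request, generous memory cap.
	ac := newAdmissionController(2, 2, 1, 1<<40)

	fmt.Println(ac.Admit()) // <nil>: first request admitted
	fmt.Println(ac.Admit()) // rejected: previous request still in flight
	ac.Done()
	fmt.Println(ac.Admit()) // <nil>: admitted again after Done
}
```

Crucially, this gate would sit only on the client-facing request path; Raft heartbeats and election traffic would never pass through it.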

The exact design, flag surface, and algorithm choices are open for community discussion — the core ask is that etcd should have some first-class mechanism to protect itself from client-driven overload.

Why is this needed?

etcd currently has no mechanism to protect itself from client-driven overload. When overload does occur, the consequences are catastrophic: the entire cluster goes down or OOMs, taking everything that depends on it (the Kubernetes control plane, DNS, service discovery) with it.

The existing "protections" are inadequate:

  • Storage quota (--quota-backend-bytes) only triggers after the database is already full, putting etcd into a read-only state that requires manual intervention to recover
  • The apply-commit index gap threshold (hardcoded at 5000) is a blunt, non-configurable signal that fires too late to prevent damage
  • --max-concurrent-streams defaults to unlimited and only limits gRPC streams, not actual request throughput
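For reference, the existing knobs look like this (values illustrative); note that none of them bounds request rate or memory use:

```shell
# 8 GiB backend quota: only triggers once the DB is already full,
# after which the cluster is read-only until an operator intervenes.
# The stream cap limits concurrent gRPC streams per connection, not
# actual request throughput.
etcd --quota-backend-bytes=8589934592 --max-concurrent-streams=250
```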

The current guidance is insufficient

"Clients should rate limit themselves" does not work because:

  • Operators cannot control all clients (especially in multi-tenant Kubernetes clusters)
  • A single misbehaving controller or CRD operator can take down the entire cluster
  • Kubernetes API Priority and Fairness only protects etcd indirectly via kube-apiserver; direct etcd clients (backup tools, migration scripts, monitoring agents) bypass it entirely
  • Cloud providers (AWS, Google) have resorted to replacing etcd internals entirely — this should not be necessary for the open-source community

Industry precedent

Every major database and distributed system provides server-side admission control:

  • PostgreSQL: max_connections
  • MySQL: max_connections + thread pool
  • CockroachDB: admission control system
  • TiKV: flow control

etcd is an outlier in leaving overload protection entirely to clients, despite being the most critical single point of failure in Kubernetes infrastructure.
