Hello, this is a continuation of this Slack thread (the thread is now close to 90 days old and will soon be deleted).
We are seeing memory usage spikes on edge-25.1.1, and previously on 2.14.10. “linkerd_reconnect: Failed to connect” error logs also show up, but infrequently.
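One way to gauge how frequent those errors are (assuming they come from the proxy sidecar in the destination pods; `deploy/` picks a single replica, so repeat per pod or adjust the target if they appear elsewhere):

```sh
# Count reconnect errors over the last 12 hours in one destination replica's
# proxy sidecar. Assumption: that is where these log lines originate.
kubectl logs -n linkerd deploy/linkerd-destination -c linkerd-proxy --since=12h \
  | grep -c "linkerd_reconnect: Failed to connect"
```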
We are seeing destination memory usage scale in close correlation with general workload scaling (see the screenshot showing destination memory usage on top and cluster node count at the bottom).
Usage from the linkerd-proxy and policy containers remains generally stable.
Usage from the linkerd-proxy-injector deployment also scales in a similar way, though I believe that is expected.
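For a quick spot-check of per-container usage (independent of our dashboards), something like the following should work, assuming metrics-server is installed and the standard `linkerd.io/control-plane-component` label is in place:

```sh
# Per-container memory for the destination pods; the label selector is the
# one Linkerd applies to its own control-plane components.
kubectl top pod -n linkerd -l linkerd.io/control-plane-component=destination --containers
```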
See this 12-hour sample window:
Memory usage for the 3 linkerd-destination replicas (the biggest change is observed in the destination container; the linkerd-proxy containers also scale somewhat proportionally, but with much smaller spikes, e.g. 9MiB to 9.6MiB and back down to 9MiB):
There is some churn to match; see node count and running pod count over the same time window (the majority of this churn is Jenkins agents):
Out of those pods, only about 9 are meshed (not counting the Linkerd control plane/multicluster/viz), and that number stays constant. The meshed pods are 7 single-replica Grafana deployments and 1 thanos-query deployment with 2 replicas, which is used as a Grafana data source.
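For completeness, one way to cross-check that meshed-pod count is to enumerate pods that carry a linkerd-proxy sidecar, roughly like this (a sketch; filter out the linkerd/viz/multicluster namespaces to match the number above):

```sh
# List every pod with an injected linkerd-proxy container.
kubectl get pods -A -o json \
  | jq -r '.items[]
           | select(any(.spec.containers[]; .name == "linkerd-proxy"))
           | "\(.metadata.namespace)/\(.metadata.name)"'
```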
Is this linkerd-destination memory usage pattern expected, or am I misunderstanding the scope of linkerd-destination in the stack? The architecture docs state:
The destination service is used by the data plane proxies to determine various aspects of their behavior. It is used to fetch service discovery information (i.e. where to send a particular request and the TLS identity expected on the other end); to fetch policy information about which types of requests are allowed; to fetch service profile information used to inform per-route metrics, retries, and timeouts; and more.
We would not expect traffic through those meshed pods to scale with the pod churn (at least not that consistently), and those pods should not be producing traffic to/from meshed pods, as the Jenkins agents and Prometheus are not meshed.
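For what it's worth, the Linkerd CLI's diagnostics subcommands can show what the destination controller is serving and expose its raw metrics; the Grafana authority below is just a placeholder for one of our meshed services:

```sh
# Show the discovery data the destination service hands to proxies for a
# given authority (placeholder service/namespace/port).
linkerd diagnostics endpoints grafana.monitoring.svc.cluster.local:3000

# Dump raw control-plane metrics (includes the destination controller's Go
# memory stats) to correlate heap growth with the pod churn.
linkerd diagnostics controller-metrics > controller-metrics.txt
```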
I have read through the other linkerd-destination memory discussions and issues but did not see anything applicable to us on this Linkerd version: #12924, #11129, #12104, #11315, #9947, #8270, #5939