Description
Component(s)
processor/k8sattributes
Describe the issue you're reporting
We're running otel-collector in a fairly large k8s cluster with 16.7K replica sets. We recently had to raise the memory limit from 512MB to 1GB because the collector pods were getting stuck in a crash loop, requesting 600MB+ at startup. Once they get through the first minute, they stabilize at ~180MB.
Capturing heap profiles every 500ms on startup revealed the following pattern (totals reported by pprof):
heap-1.pb.gz 191.72MB total
heap-2.pb.gz 191.72MB total
...
heap-11.pb.gz 191.72MB total
heap-12.pb.gz 191.72MB total
heap-13.pb.gz 341.29MB total
heap-14.pb.gz 341.29MB total
heap-15.pb.gz 341.29MB total
heap-16.pb.gz 341.29MB total
heap-17.pb.gz 341.29MB total
heap-18.pb.gz 341.29MB total
heap-19.pb.gz 53.83MB total
heap-20.pb.gz 53.83MB total
...
heap-51.pb.gz 51.10MB total
heap-52.pb.gz 51.10MB total
The first wave is mostly 157MB from (*runtime.Unknown).Unmarshal() in the k8s client (text, svg).
The second wave is 157MB from (*runtime.Unknown).Unmarshal() plus 176MB from (*v1.ReplicaSetList).Unmarshal() in the k8s client (text, svg).
The final stable state is 23MB in kube.removeUnnecessaryReplicaSetData() in processor/k8sattributes (text, svg).
From what I understand of the k8s attributes processor and k8s client code, the collector needs metadata about the replica sets. To get it, it lists the replica sets via the k8s client and then discards the information it doesn't need. The k8s client does this all at once, without paging, so it has to keep about 20KB per resource in memory (10KB for the temporary Unknowns and 10KB for the result), which is eventually stripped down to 1.5KB per resource.
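As a rough sanity check, the per-resource figures above can be multiplied out against the replica set count. This is only back-of-the-envelope arithmetic; the function name is made up for illustration, and the KB-to-MB conversion assumes 1024-based units.

```go
package main

import "fmt"

// estimateMiB is a hypothetical helper: it multiplies a resource count by an
// approximate per-resource size in KB and converts the result to MB
// (1024-based, which is an assumption about how the totals are reported).
func estimateMiB(count int, perResourceKB float64) float64 {
	return float64(count) * perResourceKB / 1024
}

func main() {
	const replicaSets = 16700 // 16.7K replica sets, from the report above

	// ~10KB of temporary Unknowns plus ~10KB of decoded result per resource.
	fmt.Printf("transient peak: ~%.0f MB\n", estimateMiB(replicaSets, 20))

	// ~1.5KB per resource kept after stripping.
	fmt.Printf("steady state:   ~%.0f MB\n", estimateMiB(replicaSets, 1.5))
}
```

The estimates land in the same range as the second wave (157MB + 176MB) and the final 23MB in the profiles, so the spike looks proportional to the number of replica sets.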
Claude suggests it should be possible to request just the metadata fields from k8s and deserialize them into a lighter contract, which would avoid having to provision a higher-memory container just for the spike at startup (example here). Is this something the team would be willing to consider?
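To make the "lighter contract" idea concrete, here is a minimal stdlib-only sketch, not the processor's actual code. On the server side, the Kubernetes API can return metadata-only lists (PartialObjectMetadataList) when the client asks for them via the Accept header, and client-go ships a metadata client in k8s.io/client-go/metadata. The sketch below only illustrates the client-side half: decoding list items one at a time into a small struct so the full objects are never held in memory together. The type and function names (replicaSetMeta, decodeItems, collectNames) and the sample payload are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// ownerRef and replicaSetMeta are hypothetical trimmed-down types that keep
// only the metadata fields; everything else in each item is ignored.
type ownerRef struct {
	Kind string `json:"kind"`
	Name string `json:"name"`
	UID  string `json:"uid"`
}

type replicaSetMeta struct {
	Metadata struct {
		Name            string     `json:"name"`
		Namespace       string     `json:"namespace"`
		UID             string     `json:"uid"`
		OwnerReferences []ownerRef `json:"ownerReferences"`
	} `json:"metadata"`
}

// decodeItems walks a JSON list response and decodes the "items" array one
// element at a time, calling visit for each, instead of unmarshalling the
// whole list into memory first.
func decodeItems(r io.Reader, visit func(replicaSetMeta)) error {
	dec := json.NewDecoder(r)
	if _, err := dec.Token(); err != nil { // opening '{' of the list object
		return err
	}
	for dec.More() {
		keyTok, err := dec.Token()
		if err != nil {
			return err
		}
		if key, _ := keyTok.(string); key != "items" {
			var skip json.RawMessage // discard fields other than "items"
			if err := dec.Decode(&skip); err != nil {
				return err
			}
			continue
		}
		tok, err := dec.Token() // opening '[' of the items array
		if err != nil {
			return err
		}
		if d, ok := tok.(json.Delim); !ok || d != '[' {
			return fmt.Errorf("items is not an array")
		}
		for dec.More() {
			var rs replicaSetMeta
			if err := dec.Decode(&rs); err != nil {
				return err
			}
			visit(rs)
		}
		if _, err := dec.Token(); err != nil { // closing ']'
			return err
		}
	}
	return nil
}

// collectNames returns "namespace/name" for every item in the document.
func collectNames(doc string) ([]string, error) {
	var names []string
	err := decodeItems(strings.NewReader(doc), func(rs replicaSetMeta) {
		names = append(names, rs.Metadata.Namespace+"/"+rs.Metadata.Name)
	})
	return names, err
}

// sample is an invented payload shaped like a ReplicaSetList response.
const sample = `{
  "kind": "ReplicaSetList",
  "apiVersion": "apps/v1",
  "items": [
    {"metadata": {"name": "web-5d9c", "namespace": "default", "uid": "u1",
      "ownerReferences": [{"kind": "Deployment", "name": "web", "uid": "d1"}]},
     "spec": {"replicas": 3}, "status": {"readyReplicas": 3}},
    {"metadata": {"name": "api-7f6b", "namespace": "prod", "uid": "u2"},
     "spec": {"replicas": 1}}
  ]
}`

func main() {
	names, err := collectNames(sample)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(names)
}
```

Whether this belongs in the processor, in the informer setup, or behind the k8s client's own metadata support is a design question for the maintainers; the sketch only shows that the spec and status payloads do not need to be materialized to get names and owner references.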