Commit bc233fe

feat(k8s-observability-monitoring): add sending_queue size limits to prevent OOM
Add configurable queueSize and numConsumers parameters to the OTLP exporter sending_queue. Without queue_size limits, memory can grow unbounded during destination outages, causing OOM kills. Default queueSize set to 500 batches to balance resilience with memory usage.
1 parent de8b64b commit bc233fe

3 files changed: 15 additions, 2 deletions

charts/k8s-observability-monitoring/Chart.yaml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 apiVersion: v2
 name: k8s-observability-monitoring
-version: 0.37.0
+version: 0.38.0
 description: Helm chart for k8s-observability-monitoring

 # renovate: datasource=helm depName=k8s-monitoring registryUrl=https://grafana.github.io/helm-charts

charts/k8s-observability-monitoring/templates/custom-alloy-configmap.yaml

Lines changed: 6 additions & 0 deletions
@@ -410,6 +410,12 @@ data:
   {{- if $.Values.customAlloy.sendingQueue.enabled }}
   sending_queue {
     enabled = true
+    {{- if $.Values.customAlloy.sendingQueue.queueSize }}
+    queue_size = {{ $.Values.customAlloy.sendingQueue.queueSize }}
+    {{- end }}
+    {{- if $.Values.customAlloy.sendingQueue.numConsumers }}
+    num_consumers = {{ $.Values.customAlloy.sendingQueue.numConsumers }}
+    {{- end }}
   }
   {{- end }}
 }

charts/k8s-observability-monitoring/values.yaml

Lines changed: 8 additions & 1 deletion
@@ -155,10 +155,17 @@ customAlloy:
   liveDebugging:
     # -- Enable live debugging
     enabled: true
-  # -- Sending queue configuration for resilience during destination outages
+  # -- Sending queue configuration for resilience during destination outages.
+  # The queue buffers batches when the destination is unavailable.
+  # Without queue_size limit, memory can grow unbounded during outages causing OOM.
   sendingQueue:
     # -- Enable sending queue
     enabled: true
+    # -- Maximum number of batches kept in memory (default: 1000 if unset).
+    # Lower values prevent OOM during extended outages. Recommended: 100-500.
+    queueSize: 500
+    # -- Number of parallel consumers sending batches (default: 10 if unset)
+    numConsumers: 10
   # -- Remove high-cardinality attributes to reduce storage costs
   # Matches k8s-monitoring attribute cleanup
   attributeCleanup:
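
For illustration, a values override that tightens the queue cap for a memory-constrained cluster might look like the sketch below. The file name and the chosen value of 200 are hypothetical; only the keys `customAlloy.sendingQueue.enabled`, `queueSize`, and `numConsumers` come from this commit.

```yaml
# values-override.yaml (hypothetical file name, not part of this commit)
customAlloy:
  sendingQueue:
    # Keep the queue on so batches are buffered during destination outages.
    enabled: true
    # Illustrative value inside the 100-500 range recommended in values.yaml;
    # the chart default introduced by this commit is 500.
    queueSize: 200
    # Same as the chart default.
    numConsumers: 10
```

Because the template wraps each setting in `{{- if ... }}`, a value of `0` or an omitted key is treated as unset and the corresponding line is left out of the rendered `sending_queue` block, so the exporter falls back to the defaults noted in the values.yaml comments. As a rough guide, peak queue memory should scale with `queueSize` times the average batch size, so halving `queueSize` roughly halves the worst case.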
