Commit efcaa23 (1 parent: e192349)

feat(k8s-observability-monitoring): add kubelet scraping to customAlloy

Add native kubelet and cAdvisor metrics scraping to customAlloy. This enables PVC volume stats (`kubelet_volume_stats_*`) without relying on the upstream alloy-metrics collector, which has the `service.namespace` promotion issue. Enable with `customAlloy.kubelet.enabled: true`.
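For reference, turning the feature on is a small values override. A minimal sketch, assuming the defaults shown in this commit's values.yaml diff (not itself part of the commit):

```yaml
# values-override.yaml -- enable the custom Alloy deployment and the
# new kubelet/cAdvisor scrape added by this commit
customAlloy:
  enabled: true
  kubelet:
    enabled: true
```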

4 files changed: +163 −3 lines

charts/k8s-observability-monitoring/Chart.yaml (1 addition, 1 deletion)

```diff
@@ -1,6 +1,6 @@
 apiVersion: v2
 name: k8s-observability-monitoring
-version: 0.35.2
+version: 0.36.0
 description: Helm chart for k8s-observability-monitoring

 # renovate: datasource=helm depName=k8s-monitoring registryUrl=https://grafana.github.io/helm-charts
```

charts/k8s-observability-monitoring/README.md (4 additions, 2 deletions)

```diff
@@ -1,6 +1,6 @@
 # k8s-observability-monitoring

-![Version: 0.35.2](https://img.shields.io/badge/Version-0.35.2-informational?style=flat-square) ![AppVersion: 3.8.3](https://img.shields.io/badge/AppVersion-3.8.3-informational?style=flat-square)
+![Version: 0.36.0](https://img.shields.io/badge/Version-0.36.0-informational?style=flat-square) ![AppVersion: 3.8.3](https://img.shields.io/badge/AppVersion-3.8.3-informational?style=flat-square)

 Helm chart for k8s-observability-monitoring

@@ -105,7 +105,7 @@ This creates a `PolicyException` resource that allows `k8s-monitoring-alloy-*` p
 | clusterMetrics.nodeExporter.deploy | bool | `false` | Deploy node-exporter (set to false if using existing deployment) |
 | clusterMetrics.nodeExporter.enabled | bool | `true` | Enable scraping node-exporter |
 | clusterName | string | `""` | Cluster name for telemetry labeling. Must be set to a non-empty value at install time. |
-| customAlloy | object | `{"attributeCleanup":{"enabled":true},"attributePromotion":{"enabled":false},"clustering":{"enabled":false},"enabled":false,"kubeStateMetrics":{"extraMetricProcessingRules":""},"liveDebugging":{"enabled":true},"replaceUpstreamCollector":false,"replicas":1,"resources":{"limits":{"memory":"1Gi"},"requests":{"cpu":"100m","memory":"512Mi"}},"sendingQueue":{"enabled":true}}` | Custom Alloy deployment for metrics scraping This deploys a separate Alloy instance that can scrape kube-state-metrics and optionally replace the upstream alloy-metrics collector entirely. |
+| customAlloy | object | `{"attributeCleanup":{"enabled":true},"attributePromotion":{"enabled":false},"clustering":{"enabled":false},"enabled":false,"kubeStateMetrics":{"extraMetricProcessingRules":""},"kubelet":{"enabled":false},"liveDebugging":{"enabled":true},"replaceUpstreamCollector":false,"replicas":1,"resources":{"limits":{"memory":"1Gi"},"requests":{"cpu":"100m","memory":"512Mi"}},"sendingQueue":{"enabled":true}}` | Custom Alloy deployment for metrics scraping This deploys a separate Alloy instance that can scrape kube-state-metrics and optionally replace the upstream alloy-metrics collector entirely. |
 | customAlloy.attributeCleanup | object | `{"enabled":true}` | Remove high-cardinality attributes to reduce storage costs Matches k8s-monitoring attribute cleanup |
 | customAlloy.attributeCleanup.enabled | bool | `true` | Enable attribute cleanup |
 | customAlloy.attributePromotion | object | `{"enabled":false}` | Promote useful attributes from datapoint to resource level |
@@ -115,6 +115,8 @@ This creates a `PolicyException` resource that allows `k8s-monitoring-alloy-*` p
 | customAlloy.enabled | bool | `false` | Enable custom Alloy deployment |
 | customAlloy.kubeStateMetrics | object | `{"extraMetricProcessingRules":""}` | kube-state-metrics scraping configuration |
 | customAlloy.kubeStateMetrics.extraMetricProcessingRules | string | `""` | Extra metric processing rules (Alloy relabel config syntax) |
+| customAlloy.kubelet | object | `{"enabled":false}` | Kubelet metrics scraping configuration (includes PVC volume stats) |
+| customAlloy.kubelet.enabled | bool | `false` | Enable kubelet and cAdvisor metrics scraping. Provides kubelet_volume_stats_* metrics for PVC capacity monitoring. |
 | customAlloy.liveDebugging | object | `{"enabled":true}` | Live debugging via Alloy UI (port 12345) |
 | customAlloy.liveDebugging.enabled | bool | `true` | Enable live debugging |
 | customAlloy.replaceUpstreamCollector | bool | `false` | Replace upstream alloy-metrics collector entirely. When true, disables alloy-metrics and customAlloy handles all metrics collection including ServiceMonitors, PodMonitors, and Probes (if prometheusOperatorObjects is enabled). |
```

charts/k8s-observability-monitoring/templates/custom-alloy-configmap.yaml (153 additions, 0 deletions)

```diff
@@ -123,6 +123,159 @@ data:
       forward_to = [otelcol.receiver.prometheus.default.receiver]
     }

+    {{- if .Values.customAlloy.kubelet.enabled }}
+    // Kubelet Metrics Discovery
+    discovery.kubernetes "kubelet" {
+      role = "node"
+    }
+
+    discovery.relabel "kubelet" {
+      targets = discovery.kubernetes.kubelet.targets
+
+      rule {
+        target_label = "__address__"
+        replacement  = "kubernetes.default.svc.cluster.local:443"
+      }
+
+      rule {
+        source_labels = ["__meta_kubernetes_node_name"]
+        regex         = "(.+)"
+        target_label  = "__metrics_path__"
+        replacement   = "/api/v1/nodes/$1/proxy/metrics"
+      }
+
+      rule {
+        source_labels = ["__meta_kubernetes_node_name"]
+        target_label  = "node"
+      }
+
+      rule {
+        source_labels = ["__meta_kubernetes_node_name"]
+        target_label  = "instance"
+      }
+    }
+
+    prometheus.scrape "kubelet" {
+      targets         = discovery.relabel.kubelet.output
+      job_name        = "integrations/kubernetes/kubelet"
+      scrape_interval = "60s"
+      scrape_timeout  = "10s"
+      scheme          = "https"
+
+      authorization {
+        type             = "Bearer"
+        credentials_file = "/var/run/secrets/kubernetes.io/serviceaccount/token"
+      }
+
+      tls_config {
+        ca_file              = "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
+        insecure_skip_verify = true
+      }
+
+      {{- if .Values.customAlloy.clustering.enabled }}
+      clustering {
+        enabled = true
+      }
+      {{- end }}
+
+      forward_to = [prometheus.relabel.kubelet.receiver]
+    }
+
+    // Kubelet cAdvisor Metrics
+    discovery.relabel "cadvisor" {
+      targets = discovery.kubernetes.kubelet.targets
+
+      rule {
+        target_label = "__address__"
+        replacement  = "kubernetes.default.svc.cluster.local:443"
+      }
+
+      rule {
+        source_labels = ["__meta_kubernetes_node_name"]
+        regex         = "(.+)"
+        target_label  = "__metrics_path__"
+        replacement   = "/api/v1/nodes/$1/proxy/metrics/cadvisor"
+      }
+
+      rule {
+        source_labels = ["__meta_kubernetes_node_name"]
+        target_label  = "node"
+      }
+
+      rule {
+        source_labels = ["__meta_kubernetes_node_name"]
+        target_label  = "instance"
+      }
+    }
+
+    prometheus.scrape "cadvisor" {
+      targets         = discovery.relabel.cadvisor.output
+      job_name        = "integrations/kubernetes/cadvisor"
+      scrape_interval = "60s"
+      scrape_timeout  = "10s"
+      scheme          = "https"
+
+      authorization {
+        type             = "Bearer"
+        credentials_file = "/var/run/secrets/kubernetes.io/serviceaccount/token"
+      }
+
+      tls_config {
+        ca_file              = "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
+        insecure_skip_verify = true
+      }
+
+      {{- if .Values.customAlloy.clustering.enabled }}
+      clustering {
+        enabled = true
+      }
+      {{- end }}
+
+      forward_to = [prometheus.relabel.cadvisor.receiver]
+    }
+
+    prometheus.relabel "kubelet" {
+      max_cache_size = 100000
+      forward_to     = [otelcol.receiver.prometheus.default.receiver]
+    }
+
+    prometheus.relabel "cadvisor" {
+      max_cache_size = 100000
+      // Drop high-cardinality container metrics
+      rule {
+        source_labels = ["__name__"]
+        regex         = "container_cpu_(cfs_throttled_seconds_total|load_average_10s|system_seconds_total|user_seconds_total)"
+        action        = "drop"
+      }
+      rule {
+        source_labels = ["__name__"]
+        regex         = "container_fs_(io_current|io_time_seconds_total|io_time_weighted_seconds_total|reads_merged_total|sector_reads_total|sector_writes_total|writes_merged_total)"
+        action        = "drop"
+      }
+      rule {
+        source_labels = ["__name__"]
+        regex         = "container_memory_(mapped_file|swap)"
+        action        = "drop"
+      }
+      rule {
+        source_labels = ["__name__"]
+        regex         = "container_(file_descriptors|tasks_state|threads_max)"
+        action        = "drop"
+      }
+      rule {
+        source_labels = ["__name__", "interface"]
+        regex         = "container_network_.*;(cali|cilium|cni|lxc|nodelocaldns|tunl).*"
+        action        = "drop"
+      }
+      rule {
+        source_labels = ["__name__"]
+        regex         = "container_spec.*"
+        action        = "drop"
+      }
+      forward_to = [otelcol.receiver.prometheus.default.receiver]
+    }
+    {{- end }}
+
     // OTEL Pipeline
     otelcol.receiver.prometheus "default" {
       output {
```
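Because both scrape jobs reach the kubelet through the API-server node proxy (`/api/v1/nodes/<node>/proxy/...`), the Alloy service account needs `nodes/proxy` permissions. The chart presumably already grants these; a minimal ClusterRole sketch (hypothetical name, not part of this diff) would look like:

```yaml
# Hypothetical RBAC sketch: permissions required for node discovery and
# for scraping kubelet/cAdvisor via the API-server proxy paths above.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: custom-alloy-kubelet-scrape  # assumed name, for illustration
rules:
  - apiGroups: [""]
    resources: ["nodes", "nodes/proxy", "nodes/metrics"]
    verbs: ["get", "list", "watch"]
```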

charts/k8s-observability-monitoring/values.yaml (5 additions, 0 deletions)

```diff
@@ -171,6 +171,11 @@ customAlloy:
   kubeStateMetrics:
     # -- Extra metric processing rules (Alloy relabel config syntax)
     extraMetricProcessingRules: ""
+  # -- Kubelet metrics scraping configuration (includes PVC volume stats)
+  kubelet:
+    # -- Enable kubelet and cAdvisor metrics scraping.
+    # Provides kubelet_volume_stats_* metrics for PVC capacity monitoring.
+    enabled: false
   # -- Resource requests and limits
   resources:
     requests:
```
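Once the `kubelet_volume_stats_*` series are flowing, they can drive PVC capacity alerting. A hypothetical PrometheusRule sketch (not part of this commit; assumes the prometheus-operator CRDs are available):

```yaml
# Sketch only: alert when a PVC has less than 10% of its capacity free.
# kubelet_volume_stats_available_bytes and kubelet_volume_stats_capacity_bytes
# are standard kubelet metrics, labeled by namespace and persistentvolumeclaim.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: pvc-capacity  # assumed name, for illustration
spec:
  groups:
    - name: pvc-capacity
      rules:
        - alert: PersistentVolumeFillingUp
          expr: |
            kubelet_volume_stats_available_bytes
              / kubelet_volume_stats_capacity_bytes < 0.10
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "PVC {{ $labels.namespace }}/{{ $labels.persistentvolumeclaim }} is below 10% free space"
```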
