Description
What happened?
We observed that the kubelet performs automatic image garbage collection when disk usage exceeds imageGCHighThresholdPercent (85% by default).
Reference:
https://kubernetes.io/docs/concepts/architecture/garbage-collection/#containers-images
By default:
- imageGCHighThresholdPercent = 85%
- imageGCLowThresholdPercent = 80%
However, kube-prometheus includes a prediction-based alert similar to:
predict_linear(node_filesystem_free_bytes{device=~"/.*"}[2d], 3600 * 24 * 5) < 0
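This fits a linear trend to the last 2 days of free-space samples and fires when the value extrapolated 5 days ahead drops below zero, i.e. it predicts filesystem exhaustion within 5 days based on recent growth.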
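To make the behavior of the expression concrete, here is a minimal Python sketch of the kind of least-squares extrapolation that predict_linear performs. It is an illustration only, not Prometheus code: the function name, the sample data, and the GiB units are assumptions, and it extrapolates from the newest sample rather than from the rule evaluation time.

```python
from typing import Sequence, Tuple


def predict_linear(samples: Sequence[Tuple[float, float]], horizon_s: float) -> float:
    """Fit a least-squares line to (timestamp_s, value) samples and return
    the fitted value horizon_s seconds after the newest sample.

    Simplified stand-in for PromQL's predict_linear (assumption: we
    extrapolate from the last sample, not from the evaluation time).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("timestamps must not all be equal")
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    slope = cov / var_t
    intercept = mean_v - slope * mean_t
    return intercept + slope * (samples[-1][0] + horizon_s)


# Hypothetical data: hourly samples over 2 days, free space falling
# from 30 GiB by 0.4 GiB per hour (10.8 GiB still free at the end).
samples = [(h * 3600.0, 30.0 - 0.4 * h) for h in range(49)]
predicted_free_gib = predict_linear(samples, 3600 * 24 * 5)
alert_condition = predicted_free_gib < 0
```

With these made-up numbers the node still has 10.8 GiB free, yet the extrapolated value is negative, so the alert condition holds regardless of whether anything will reclaim space in the meantime.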
In practice, we observed:
- Disk usage increased rapidly
- Alert fired predicting exhaustion
- Kubelet GC triggered automatically at ~85%
- Disk usage dropped
- Node remained healthy
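The sequence above follows from simple arithmetic: under linear growth, usage always crosses the GC threshold before it reaches 100%. The sketch below uses hypothetical numbers (70% used, growing 8 percentage points per day) and assumes the alerting filesystem is the same one the kubelet monitors for image GC.

```python
def days_until(target_percent: float, used_percent: float, growth_per_day: float) -> float:
    """Days for linearly growing usage to reach target_percent."""
    if growth_per_day <= 0:
        return float("inf")  # usage is flat or shrinking
    return max(0.0, (target_percent - used_percent) / growth_per_day)


GC_HIGH = 85.0            # kubelet default imageGCHighThresholdPercent
ALERT_HORIZON_DAYS = 5.0  # horizon used by the prediction alert

# Hypothetical node state, not measured data.
used_percent, growth_per_day = 70.0, 8.0

days_to_full = days_until(100.0, used_percent, growth_per_day)
days_to_gc = days_until(GC_HIGH, used_percent, growth_per_day)

alert_fires = days_to_full < ALERT_HORIZON_DAYS  # prediction sees exhaustion
gc_runs_first = days_to_gc < days_to_full        # GC threshold is hit earlier
```

Here the alert fires because exhaustion is predicted in under 5 days, while the GC threshold is reached in about half that time, which matches what we saw on the node.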
The result appears to be a false positive: the alert fires during normal kubelet behavior.
Question:
Should the prediction alert take kubelet's image GC threshold into account?
For example, should the alert be suppressed if usage is below imageGCHighThresholdPercent, since kubelet will automatically intervene?
Or is the intended design that operators tune this alert manually?
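As a rough illustration of the first option, the gating logic could look like the Python sketch below. The function name and parameters are hypothetical and this is not how kube-prometheus evaluates rules. It also assumes that the growth comes from reclaimable images; if logs or volumes are filling the disk, image GC will not help and this gate would only delay the alert.

```python
def should_fire(predicted_free_bytes: float,
                used_percent: float,
                gc_high_percent: float = 85.0) -> bool:
    """Fire only if exhaustion is predicted AND usage has already reached
    the kubelet image GC high threshold, i.e. GC has had a chance to run.

    gc_high_percent defaults to the kubelet default of 85 (assumption:
    the cluster has not overridden imageGCHighThresholdPercent).
    """
    return predicted_free_bytes < 0 and used_percent >= gc_high_percent


# Below the GC threshold: suppressed even though exhaustion is predicted.
suppressed = should_fire(predicted_free_bytes=-1e9, used_percent=78.0)

# At or above the threshold with exhaustion still predicted: fires.
fires = should_fire(predicted_free_bytes=-1e9, used_percent=88.0)
```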
We would appreciate guidance on how to align the disk prediction alerts with kubelet GC behavior.
Thanks.
Environment
- Kubernetes: v1.33.0
- Prometheus: v3.5.0 (quay.io/prometheus/prometheus:v3.5.0)
- Prometheus Operator: quay.io/prometheus-operator/prometheus-operator:v0.85.0