
Commit a3fbf21

Make the daemonset rollout stuck alert configurable. (#989)
* Make the daemonset rollout stuck alert configurable.

  In bigger Kubernetes clusters with higher node churn (for instance, cloud clusters with spot nodes), daemonset rollouts often stay stuck for longer than 15 minutes, so the alert can easily misfire even when the delay is legitimate.

  This PR introduces a configurable `for` value to allow customization. The default remains the original `15m`, so the only real difference is a slight change in the alert message formatting.

  Signed-off-by: Milan Plzik <[email protected]>

* Fix the tests.

  Signed-off-by: Milan Plzik <[email protected]>

---------

Signed-off-by: Milan Plzik <[email protected]>
1 parent 72a1a23 commit a3fbf21

2 files changed: +7 -6 lines changed

alerts/apps_alerts.libsonnet

Lines changed: 3 additions & 2 deletions
@@ -4,6 +4,7 @@ local utils = import '../lib/utils.libsonnet';
   _config+:: {
     kubeStateMetricsSelector: error 'must provide selector for kube-state-metrics',
     kubeJobTimeoutDuration: error 'must provide value for kubeJobTimeoutDuration',
+    kubeDaemonSetRolloutStuckFor: '15m',
     namespaceSelector: null,
     prefixedNamespaceSelector: if self.namespaceSelector != null then self.namespaceSelector + ',' else '',
   },
@@ -204,10 +205,10 @@ local utils = import '../lib/utils.libsonnet';
       severity: 'warning',
     },
     annotations: {
-      description: 'DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} has not finished or progressed for at least 15 minutes.',
+      description: 'DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} has not finished or progressed for at least %(kubeDaemonSetRolloutStuckFor)s.' % $._config,
       summary: 'DaemonSet rollout is stuck.',
     },
-    'for': '15m',
+    'for': $._config.kubeDaemonSetRolloutStuckFor,
   },
   {
     expr: |||
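
The override pattern this change enables can be sketched in Jsonnet as follows. This is a minimal, self-contained illustration, not the mixin's actual source: the local `mixin` object and its single `alert` field are hypothetical stand-ins, and only `kubeDaemonSetRolloutStuckFor`, the `for` field, and the description template are taken from the diff above.

```jsonnet
// Hypothetical stand-in for the mixin; in practice this object would be
// the imported kubernetes-mixin rather than an inline literal.
local mixin = {
  _config+:: {
    // Default introduced by this commit.
    kubeDaemonSetRolloutStuckFor: '15m',
  },
  // Simplified alert rule (assumption: the real rule has more fields).
  alert: {
    alert: 'KubeDaemonSetRolloutStuck',
    annotations: {
      // The same %(...)s substitution against $._config as in the diff.
      description: 'DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} has not finished or progressed for at least %(kubeDaemonSetRolloutStuckFor)s.' % $._config,
    },
    // 'for' is a Jsonnet keyword, so the field name must be quoted.
    'for': $._config.kubeDaemonSetRolloutStuckFor,
  },
};

// Extend the mixin and override only the new setting.
mixin {
  _config+:: {
    kubeDaemonSetRolloutStuckFor: '30m',
  },
}
```

Because `$._config` is late-bound, overriding the hidden `_config` field changes both the `for` duration and the duration shown in the description, which is why the message now reads `15m` rather than `15 minutes` by default.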

tests.yaml

Lines changed: 4 additions & 4 deletions
@@ -822,7 +822,7 @@ tests:
         severity: warning
       exp_annotations:
         summary: "DaemonSet rollout is stuck."
-        description: 'DaemonSet monitoring/node-exporter has not finished or progressed for at least 15 minutes.'
+        description: 'DaemonSet monitoring/node-exporter has not finished or progressed for at least 15m.'
         runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedaemonsetrolloutstuck
   - eval_time: 34m
     alertname: KubeDaemonSetRolloutStuck
@@ -878,7 +878,7 @@ tests:
         severity: warning
       exp_annotations:
         summary: "DaemonSet rollout is stuck."
-        description: 'DaemonSet monitoring/node-exporter has not finished or progressed for at least 15 minutes.'
+        description: 'DaemonSet monitoring/node-exporter has not finished or progressed for at least 15m.'
         runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedaemonsetrolloutstuck
   - eval_time: 34m
     alertname: KubeDaemonSetRolloutStuck
@@ -909,7 +909,7 @@ tests:
         severity: warning
       exp_annotations:
         summary: "DaemonSet rollout is stuck."
-        description: 'DaemonSet monitoring/node-exporter has not finished or progressed for at least 15 minutes.'
+        description: 'DaemonSet monitoring/node-exporter has not finished or progressed for at least 15m.'
         runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedaemonsetrolloutstuck
   - eval_time: 34m
     alertname: KubeDaemonSetRolloutStuck
@@ -940,7 +940,7 @@ tests:
         severity: warning
       exp_annotations:
         summary: "DaemonSet rollout is stuck."
-        description: 'DaemonSet monitoring/node-exporter has not finished or progressed for at least 15 minutes.'
+        description: 'DaemonSet monitoring/node-exporter has not finished or progressed for at least 15m.'
         runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedaemonsetrolloutstuck
   - eval_time: 36m
     alertname: KubeDaemonSetRolloutStuck
