feat: filter NodeReadiness alerts on uncordoned status #1012
Conversation
Signed-off-by: TheRealNoob <[email protected]>
Thanks, this looks like a reasonable change.
Please update this test:
Lines 571 to 586 in af5e898
```yaml
- interval: 1m
  input_series:
    - series: 'kube_node_status_condition{condition="Ready",endpoint="https-main",cluster="kubernetes",instance="10.0.2.15:10250",job="kube-state-metrics",namespace="monitoring",node="minikube",pod="kube-state-metrics-b894d84cc-d6htw",service="kube-state-metrics",status="true"}'
      values: '1 0 1 0 1 0 0 0 1 0 1 0 0 0 1 0 1 0 0 1'
  alert_rule_test:
    - eval_time: 18m
      alertname: KubeNodeReadinessFlapping
      exp_alerts:
        - exp_labels:
            cluster: kubernetes
            node: minikube
            severity: warning
          exp_annotations:
            summary: "Node readiness status is flapping."
            description: 'The readiness status of node minikube has changed 9 times in the last 15 minutes.'
            runbook_url: 'https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubenodereadinessflapping'
```
Extra credit if you can create a test for KubeNodeNotReady! 😄
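A KubeNodeNotReady test might look like the sketch below. This is only an outline: the label set, the expected annotations, and the alert's `for` duration all come from the actual rule definition in this mixin, so check those before committing (the values here are assumptions modeled on the flapping test above).

```yaml
# Sketch only: verify labels, annotations, and the rule's `for` duration
# against the KubeNodeNotReady definition before relying on this.
- interval: 1m
  input_series:
    - series: 'kube_node_status_condition{condition="Ready",cluster="kubernetes",job="kube-state-metrics",node="minikube",status="true"}'
      values: '0x20'  # node reports NotReady for the whole window
  alert_rule_test:
    - eval_time: 18m
      alertname: KubeNodeNotReady
      exp_alerts:
        - exp_labels:
            cluster: kubernetes
            node: minikube
            condition: Ready
            status: "true"
            job: kube-state-metrics
            severity: warning
```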
Signed-off-by: TheRealNoob <[email protected]>
I think I did this right? I'd really appreciate a second set of eyes, as it's my first time using the native Prometheus unit tests. Appreciate the advice you gave me on Slack, it was super helpful!
lgtm, I like that you used two simple node series where only one node is expected to have an alert firing.
```yaml
- series: 'kube_node_status_condition{condition="Ready",endpoint="https-main",cluster="kubernetes",instance="10.0.2.15:10250",job="kube-state-metrics",namespace="monitoring",node="minikube2",pod="kube-state-metrics-b894d84cc-f5e9f",service="kube-state-metrics"}'
  values: '0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0'
- series: 'kube_node_spec_unschedulable{endpoint="https-main",cluster="kubernetes",instance="10.0.2.15:10250",job="kube-state-metrics",namespace="monitoring",node="minikube2",pod="kube-state-metrics-b894d84cc-f5e9f",service="kube-state-metrics"}'
  values: '1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1'
```
Shorthand syntax suggestion that you might like to know, for next time:
```diff
-      values: '1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1'
+      values: '1x19'
```
`'1x19'` is shorthand for `'1+0x19'`: the series starts at 1, followed by 19 further samples incrementing by 0.
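More generally, promtool's expanding notation for `values` is `a+bxn`: start at `a`, then `n` further samples, each adding `b`. A couple of examples of how the forms expand (worth double-checking against the promtool unit-testing docs for your Prometheus version):

```yaml
# 'a+bxn': start at a, then n further samples each incremented by b.
values: '0+2x3'   # expands to: 0 2 4 6
# 'axn' is shorthand for 'a+0xn', i.e. a flat series of n+1 samples.
values: '5x4'     # expands to: 5 5 5 5 5
```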
Hello,
First time contributing to this repo and writing jsonnet so I welcome any feedback on the syntax that I've used here.
In my personal experience at #dayjob, we get many false alerts from nodes that are undergoing maintenance. I believe filtering cordoned nodes out of these alerts is something many people would benefit from. Thoughts?
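For context, the general shape of such a filter in PromQL is to drop series for nodes whose `kube_node_spec_unschedulable` metric is 1. The expression below is only a sketch of the idea, not necessarily the exact change in this PR; the `on(...)` label list in particular depends on the mixin's configured aggregation labels:

```promql
# Alert only on NotReady nodes that are NOT cordoned: exclude any node
# with a matching kube_node_spec_unschedulable == 1 series.
kube_node_status_condition{job="kube-state-metrics",condition="Ready",status="true"} == 0
unless on (cluster, node)
  kube_node_spec_unschedulable{job="kube-state-metrics"} == 1
```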
Thank you