feat: filter NodeReadiness alerts on uncordoned status #1012
Conversation
Signed-off-by: TheRealNoob <[email protected]>
Thanks, this looks like a reasonable change.
Please update this test:
Lines 571 to 586 in af5e898
```yaml
- interval: 1m
  input_series:
    - series: 'kube_node_status_condition{condition="Ready",endpoint="https-main",cluster="kubernetes",instance="10.0.2.15:10250",job="kube-state-metrics",namespace="monitoring",node="minikube",pod="kube-state-metrics-b894d84cc-d6htw",service="kube-state-metrics",status="true"}'
      values: '1 0 1 0 1 0 0 0 1 0 1 0 0 0 1 0 1 0 0 1'
  alert_rule_test:
    - eval_time: 18m
      alertname: KubeNodeReadinessFlapping
      exp_alerts:
        - exp_labels:
            cluster: kubernetes
            node: minikube
            severity: warning
          exp_annotations:
            summary: "Node readiness status is flapping."
            description: 'The readiness status of node minikube has changed 9 times in the last 15 minutes.'
            runbook_url: 'https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubenodereadinessflapping'
```
Extra credit if you can create a test for KubeNodeNotReady! 😄
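A KubeNodeNotReady test might look like the sketch below. This is only an outline: the label set, the expected annotations, and the alert's `for` duration all come from the actual rule definition in this mixin, so check those before committing (the values here are assumptions modeled on the flapping test above).

```yaml
# Sketch only: verify labels, annotations, and the rule's `for` duration
# against the KubeNodeNotReady definition before relying on this.
- interval: 1m
  input_series:
    - series: 'kube_node_status_condition{condition="Ready",cluster="kubernetes",job="kube-state-metrics",node="minikube",status="true"}'
      values: '0x20'  # node reports NotReady for the whole window
  alert_rule_test:
    - eval_time: 18m
      alertname: KubeNodeNotReady
      exp_alerts:
        - exp_labels:
            cluster: kubernetes
            node: minikube
            condition: Ready
            status: "true"
            job: kube-state-metrics
            severity: warning
```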
Signed-off-by: TheRealNoob <[email protected]>
I think I did this right? I'd really appreciate a second set of eyes, as it's my first time using the native Prometheus unit tests. Appreciate the advice you gave me on Slack, it was super helpful!
lgtm, I like that you used two simple node series where only one node is expected to have an alert firing.
```yaml
- series: 'kube_node_status_condition{condition="Ready",endpoint="https-main",cluster="kubernetes",instance="10.0.2.15:10250",job="kube-state-metrics",namespace="monitoring",node="minikube2",pod="kube-state-metrics-b894d84cc-f5e9f",service="kube-state-metrics"}'
  values: '0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0'
- series: 'kube_node_spec_unschedulable{endpoint="https-main",cluster="kubernetes",instance="10.0.2.15:10250",job="kube-state-metrics",namespace="monitoring",node="minikube2",pod="kube-state-metrics-b894d84cc-f5e9f",service="kube-state-metrics"}'
  values: '1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1'
```
Shorthand syntax suggestion that you might like to know, for next time:
```diff
-      values: '1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1'
+      values: '1x19'
```
`'1x19'` is shorthand for `'1+0x19'`: the series starts at 1, followed by 19 further samples incrementing by 0.
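More generally, promtool's expanding notation for `values` is `a+bxn`: start at `a`, then `n` further samples, each adding `b`. A couple of examples of how the forms expand (worth double-checking against the promtool unit-testing docs for your Prometheus version):

```yaml
# 'a+bxn': start at a, then n further samples each incremented by b.
values: '0+2x3'   # expands to: 0 2 4 6
# 'axn' is shorthand for 'a+0xn', i.e. a flat series of n+1 samples.
values: '5x4'     # expands to: 5 5 5 5 5
```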
Hello,
First time contributing to this repo and writing jsonnet so I welcome any feedback on the syntax that I've used here.
In my personal experience at #dayjob, we get many false alerts from nodes that are undergoing maintenance. I believe filtering cordoned nodes out of these alerts is something many people would benefit from. Thoughts?
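For context, the general shape of such a filter in PromQL is to drop series for nodes whose `kube_node_spec_unschedulable` metric is 1. The expression below is only a sketch of the idea, not necessarily the exact change in this PR; the `on(...)` label list in particular depends on the mixin's configured aggregation labels:

```promql
# Alert only on NotReady nodes that are NOT cordoned: exclude any node
# with a matching kube_node_spec_unschedulable == 1 series.
kube_node_status_condition{job="kube-state-metrics",condition="Ready",status="true"} == 0
unless on (cluster, node)
  kube_node_spec_unschedulable{job="kube-state-metrics"} == 1
```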
Thank you