Skip to content

Conversation

@TheRealNoob
Copy link
Contributor

@TheRealNoob TheRealNoob commented Jan 17, 2025

Hello,

First time contributing to this repo and writing jsonnet so I welcome any feedback on the syntax that I've used here.

In my personal experience at #dayjob we get many false alerts on nodes which are undergoing maintenance. I believe filtering out cordoned nodes from these alerts is something many people would benefit from. Thoughts?

Thank you

Copy link
Collaborator

@skl skl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks like a reasonable change.

Please update this test:

kubernetes-mixin/tests.yaml

Lines 571 to 586 in af5e898

- interval: 1m
input_series:
- series: 'kube_node_status_condition{condition="Ready",endpoint="https-main",cluster="kubernetes",instance="10.0.2.15:10250",job="kube-state-metrics",namespace="monitoring",node="minikube",pod="kube-state-metrics-b894d84cc-d6htw",service="kube-state-metrics",status="true"}'
values: '1 0 1 0 1 0 0 0 1 0 1 0 0 0 1 0 1 0 0 1'
alert_rule_test:
- eval_time: 18m
alertname: KubeNodeReadinessFlapping
exp_alerts:
- exp_labels:
cluster: kubernetes
node: minikube
severity: warning
exp_annotations:
summary: "Node readiness status is flapping."
description: 'The readiness status of node minikube has changed 9 times in the last 15 minutes.'
runbook_url: 'https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubenodereadinessflapping'

Extra credit if you can create a test for KubeNodeNotReady! 😄

Signed-off-by: TheRealNoob <[email protected]>
@TheRealNoob
Copy link
Contributor Author

I think I did this right? I'd really appreciate a second set of eyes as its my first time using the native prometheus unit tests. Appreciate the advice you gave me on Slack, it was super helpful!

Copy link
Collaborator

@skl skl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, I like that you used two simple node series where only one node is expected to have an alert firing.

- series: 'kube_node_status_condition{condition="Ready",endpoint="https-main",cluster="kubernetes",instance="10.0.2.15:10250",job="kube-state-metrics",namespace="monitoring",node="minikube2",pod="kube-state-metrics-b894d84cc-f5e9f",service="kube-state-metrics"}'
values: '0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0'
- series: 'kube_node_spec_unschedulable{endpoint="https-main",cluster="kubernetes",instance="10.0.2.15:10250",job="kube-state-metrics",namespace="monitoring",node="minikube2",pod="kube-state-metrics-b894d84cc-f5e9f",service="kube-state-metrics"}'
values: '1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shorthand syntax suggestion that you might like to know, for next time:

Suggested change
values: '1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1'
values: '1x19'

docs

Which would translate to:

shorthand for '1+0x19', series starts at 1, then 19 further samples incrementing by 0.

@skl skl merged commit 9ceec88 into kubernetes-monitoring:master Jan 23, 2025
9 checks passed
@TheRealNoob TheRealNoob deleted the filter-coroned branch January 23, 2025 17:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants