Commit dadbf9e

Add an alert to CVO that fires when cluster is not upgradeable
Added the alert ClusterNotUpgradeable, which fires when the metric cluster_operator_conditions{name="version"} reports the condition Upgradeable as false for 60 minutes or more. The alert reports the reason the cluster is not upgradeable and a console URL for more information.
1 parent fc25a6f commit dadbf9e

1 file changed: 8 additions, 0 deletions

install/0000_90_cluster-version-operator_02_servicemonitor.yaml

Lines changed: 8 additions & 0 deletions
@@ -46,6 +46,14 @@ spec:
         severity: critical
   - name: cluster-operators
     rules:
+    - alert: ClusterNotUpgradeable
+      annotations:
+        message: One or more cluster operators have been blocking minor version cluster upgrades for at least an hour for reason {{ "{{ with $cluster_operator_conditions := \"cluster_operator_conditions\" | query}}{{range $value := .}}{{if and (eq (label \"name\" $value) \"version\") (eq (label \"condition\" $value) \"Upgradeable\") (eq (label \"endpoint\" $value) \"metrics\") (eq (value $value) 0.0) (ne (len (label \"reason\" $value)) 0) }}{{label \"reason\" $value}}.{{end}}{{end}}{{end}}"}} {{ "{{ with $console_url := \"console_url\" | query }}{{ if ne (len (label \"url\" (first $console_url ) ) ) 0}} For more information refer to {{ label \"url\" (first $console_url ) }}/settings/cluster/.{{ end }}{{ end }}" }}
+      expr: |
+        max by (name, condition, endpoint) (cluster_operator_conditions{name="version", condition="Upgradeable", endpoint="metrics"} == 0)
+      for: 60m
+      labels:
+        severity: warning
     - alert: ClusterOperatorDown
       annotations:
         message: Cluster operator {{ "{{ $labels.name }}" }} has not been available for 10 mins. Operator may be down or disabled, cluster will not be kept up to date and upgrades will not be possible.
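
To illustrate how the `== 0` filter and the `for: 60m` clause in the ClusterNotUpgradeable rule interact, here is a minimal Python sketch of the pending-to-firing transition. It is a simplified model and not Prometheus's implementation: the names `Sample` and `alert_states` are hypothetical, the input is assumed to be one sample per evaluation for a single series, and staleness handling and label grouping are ignored.

```python
from dataclasses import dataclass

FOR_MINUTES = 60  # mirrors `for: 60m` in the rule


@dataclass(frozen=True)
class Sample:
    minute: int   # evaluation time, in minutes since the start of the trace
    value: float  # 1.0 = Upgradeable is true, 0.0 = Upgradeable is false


def alert_states(samples, for_minutes=FOR_MINUTES):
    """Return (minute, state) pairs; state is 'inactive', 'pending' or 'firing'."""
    states = []
    active_since = None  # minute at which the expression first matched
    for s in samples:
        if s.value == 0:
            # The `== 0` comparison in expr keeps only "not upgradeable" samples.
            if active_since is None:
                active_since = s.minute
            held = s.minute - active_since
            state = "firing" if held >= for_minutes else "pending"
        else:
            # Any sample that no longer matches resets the hold timer.
            active_since = None
            state = "inactive"
        states.append((s.minute, state))
    return states


if __name__ == "__main__":
    trace = [Sample(m, 0.0) for m in range(0, 61)]
    for minute, state in alert_states(trace)[-2:]:
        print(minute, state)
```

In this model a single evaluation where Upgradeable is true again restarts the 60 minute wait, so under these assumptions a condition that flaps would keep the alert in the pending state.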
