42 | 42 | - alert: etcdHighFsyncDurations |
43 | 43 | annotations: |
44 | 44 | description: 'etcd cluster "{{ $labels.job }}": 99th percentile fsync durations are {{ $value }}s on etcd instance {{ $labels.instance }}.' |
| 45 | + runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/cluster-etcd-operator/etcdHighFsyncDurations.md |
45 | 46 | summary: etcd cluster 99th percentile fsync durations are too high. |
46 | 47 | expr: | |
47 | 48 | histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket{job=~".*etcd.*"}[5m])) |
|
84 | 85 | - alert: etcdDatabaseQuotaLowSpace |
85 | 86 | annotations: |
86 | 87 | description: 'etcd cluster "{{ $labels.job }}": database size is 65% of the defined quota on etcd instance {{ $labels.instance }}, please defrag or increase the quota as the writes to etcd will be disabled when it is full.' |
| 88 | + runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/cluster-etcd-operator/etcdDatabaseQuotaLowSpace.md |
87 | 89 | summary: etcd cluster database is using >= 65% of the defined quota. |
88 | 90 | expr: (last_over_time(etcd_mvcc_db_total_size_in_bytes{job=~".*etcd.*"}[5m]) / last_over_time(etcd_server_quota_backend_bytes{job=~".*etcd.*"}[5m]))*100 > 65 |
89 | 91 | for: 10m |
|
92 | 94 | - alert: etcdDatabaseQuotaLowSpace |
93 | 95 | annotations: |
94 | 96 | description: 'etcd cluster "{{ $labels.job }}": database size is 75% of the defined quota on etcd instance {{ $labels.instance }}, please defrag or increase the quota as the writes to etcd will be disabled when it is full.' |
| 97 | + runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/cluster-etcd-operator/etcdDatabaseQuotaLowSpace.md |
95 | 98 | summary: etcd cluster database is using >= 75% of the defined quota. |
96 | 99 | expr: (last_over_time(etcd_mvcc_db_total_size_in_bytes{job=~".*etcd.*"}[5m]) / last_over_time(etcd_server_quota_backend_bytes{job=~".*etcd.*"}[5m]))*100 > 75 |
97 | 100 | for: 10m |
@@ -132,6 +135,7 @@ spec: |
132 | 135 | - alert: etcdHighNumberOfFailedGRPCRequests |
133 | 136 | annotations: |
134 | 137 | description: 'etcd cluster "{{ $labels.job }}": {{ $value }}% of requests for {{ $labels.grpc_method }} failed on etcd instance {{ $labels.instance }}.' |
| 138 | + runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/cluster-etcd-operator/etcdHighNumberOfFailedGRPCRequests.md |
135 | 139 | summary: etcd cluster has high number of failed grpc requests. |
136 | 140 | expr: | |
137 | 141 | (sum(rate(grpc_server_handled_total{job="etcd", grpc_code=~"Unknown|FailedPrecondition|ResourceExhausted|Internal|Unavailable|DataLoss|DeadlineExceeded"}[5m])) without (grpc_type, grpc_code) |