Commit 6ca2eb6

jovial and dougszumski authored
Adds alerts for software raid failures (#935)
* Adds alerts for software raid failures

  See: https://github.com/prometheus/node_exporter/blob/master/docs/node-mixin/alerts/alerts.libsonnet

* Fix typo in release notes

---------

Co-authored-by: Doug Szumski <[email protected]>
1 parent b4c00af commit 6ca2eb6

2 files changed, with 24 additions and 0 deletions

etc/kayobe/kolla/config/prometheus/system.rules

Lines changed: 20 additions & 0 deletions
@@ -104,4 +104,24 @@ groups:
     annotations:
       summary: Host conntrack limit (instance {{ $labels.instance }})
       description: "The number of conntrack is approaching limit"
+
+  - alert: NodeRAIDDegraded
+    expr: |
+      node_md_disks_required{job="node",device!=""} - ignoring (state) (node_md_disks{state="active",job="node",device!=""}) > 0
+    for: "15m"
+    labels:
+      severity: critical
+    annotations:
+      description: "RAID array '{{ $labels.device }}' at {{ $labels.instance }} is in degraded state due to one or more disks failures. Number of spare drives is insufficient to fix issue automatically."
+      summary: "RAID Array is degraded."
+
+  - alert: NodeRAIDDiskFailure
+    expr: |
+      node_md_disks{state="failed",job="node",device!=""} > 0
+    labels:
+      severity: warning
+    annotations:
+      description: "At least one device in RAID array at {{ $labels.instance }} failed. Array '{{ $labels.device }}' needs attention and possibly a disk swap."
+      summary: "Failed device in RAID array."
+
 {% endraw %}

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+---
+features:
+  - |
+    Adds alerts for software raid failures.
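
Reviewer note: the two alert expressions reduce to simple per-array arithmetic, which the following Python sketch tries to make explicit. It is only an illustration of the conditions, not how Prometheus evaluates them. The `MdArray` class and both function names are hypothetical, the `for: "15m"` hold period on `NodeRAIDDegraded` is not modeled, and the sketch assumes one `node_md_disks` sample per `state` label for each array.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MdArray:
    """Hypothetical snapshot of one md array as scraped from node_exporter."""
    instance: str
    device: str
    disks_required: int  # value of node_md_disks_required
    disks_by_state: Dict[str, int] = field(default_factory=dict)  # node_md_disks keyed by state


def raid_degraded(array: MdArray) -> bool:
    """Condition behind NodeRAIDDegraded: required - active > 0.

    If there is no sample with state="active", the PromQL binary operation
    has nothing to match against and yields no result, so no alert.
    """
    active = array.disks_by_state.get("active")
    if active is None:
        return False
    return array.disks_required - active > 0


def raid_disk_failure(array: MdArray) -> bool:
    """Condition behind NodeRAIDDiskFailure: failed > 0."""
    return array.disks_by_state.get("failed", 0) > 0


if __name__ == "__main__":
    healthy = MdArray("host1", "md0", 2, {"active": 2, "failed": 0, "spare": 0})
    degraded = MdArray("host1", "md1", 2, {"active": 1, "failed": 1, "spare": 0})
    for arr in (healthy, degraded):
        print(arr.device, raid_degraded(arr), raid_disk_failure(arr))
```

In this model an array with a failed member that a spare has already replaced would trip only the warning-level `NodeRAIDDiskFailure` condition, because its active count would again equal the required count.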
