
Commit 017ac8e

refactor(disk-io): modernize code
1 parent 9636c10 commit 017ac8e

4 files changed: +249 -139 lines changed


CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -65,6 +65,7 @@ Icinga Director:
 Monitoring Plugins:

 * cpu-usage: non-blocking behaviour (interval=None + manual deltas via SQLite DB) so we get both accuracy and faster runtime
+* disk-io: modernize code
 * gitlab-health: increase timeout from 3 to 8 secs
 * gitlab-liveness: increase timeout from 3 to 8 secs
 * gitlab-readiness: increase timeout from 3 to 8 secs

check-plugins/disk-io/README.md

Lines changed: 22 additions & 11 deletions
@@ -20,7 +20,7 @@ dm-1 ! 10.0MiB ! 4.7KiB ! 4.0KiB ! 2.0KiB ! 6.8KiB ! 8.7KiB
 ...
 ```

-The first line always shows the disk with the currently highest bandwidth (here `dm-0`).
+The first line always shows the disk with the currently highest bandwidth usage (here `dm-0`).

 The table columns mean:

@@ -54,9 +54,19 @@ Hints:
 usage: disk-io [-h] [-V] [--always-ok] [--count COUNT] [--critical CRIT]
 [--match MATCH] [--top TOP] [--warning WARN]

-Checks disk I/O. If the bandwidth usage of a disk is above the specified
-threshold (as a percentage of the maximum bandwidth measured) for a certain
-period of time, an alarm is triggered.
+Checks disk I/O bandwidth over time and alerts on sustained saturation, not
+short spikes. The check records per-disk read/write counters and then derives
+current (R1/W1) and period averages (R{COUNT}/W{COUNT}). It compares the
+period’s total bandwidth against the maximum ever observed for that disk
+(RWmax). WARN/CRIT trigger if the period average exceeds the configured
+percentage of RWmax for COUNT consecutive runs. Perfdata is emitted for each
+disk (busy_time, read_bytes, read_time, write_bytes, write_time) so you can
+graph trends. On Linux the check automatically focuses on “real” block devices
+with mountpoints; on Windows it uses psutil’s disk counters. Optionally, --top
+lists the processes that generated the most I/O traffic (read/write totals) to
+help identify offenders. This check is cross-platform and works on Linux,
+Windows, and all psutil-supported systems. The check stores its short trend
+state locally in an SQLite DB to evaluate sustained load across runs.

 options:
 -h, --help show this help message and exit
@@ -77,7 +87,8 @@ options:
 characters that satisfy the condition inside it, zero or
 more times. Default:
 --top TOP List x "Top processes that generated the most I/O traffic".
-Default: 5
+Use `--top=0` to disable this feature. Default: 5 on Linux,
+0 on Windows
 --warning WARN Threshold for disk bandwidth saturation (over the last
 `--count` measurements) as a percentage of the maximum
 bandwidth the disk can support. Default: >= 80
@@ -132,15 +143,15 @@ Top 5 processes that generate the most I/O traffic (r/w):

 ## Perfdata / Metrics

-Per (matched) disk, where \<disk\> is the block device name:
+Per (matched) disk, where <disk\> is the block device name:

 | Name | Type | Description |
 |----|----|----|
-| \<disk\>\_busy_time | Continous Counter | Time spent doing actual I/Os (in milliseconds). |
-| \<disk\>\_read_bytes | Continous Counter | Number of bytes read. |
-| \<disk\>\_read_time | Continous Counter | Time spent reading from disk (in milliseconds). |
-| \<disk\>\_write_bytes | Continous Counter | Number of bytes written. |
-| \<disk\>\_write_time | Continous Counter | Time spent writing to disk (in milliseconds). |
+| <disk\>\_busy_time | Continous Counter | Time spent doing actual I/Os (in milliseconds). |
+| <disk\>\_read_bytes | Continous Counter | Number of bytes read. |
+| <disk\>\_read_time | Continous Counter | Time spent reading from disk (in milliseconds). |
+| <disk\>\_write_bytes | Continous Counter | Number of bytes written. |
+| <disk\>\_write_time | Continous Counter | Time spent writing to disk (in milliseconds). |


 ## Troubleshooting
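
The help text added to the README in this commit describes alerting on sustained saturation rather than short spikes: cumulative read/write counters are turned into bandwidth per run, and WARN/CRIT only fire when the last COUNT runs all exceed a percentage of RWmax. A minimal sketch of that idea follows. It is not the plugin's code: the function names `interval_bandwidths` and `evaluate_saturation` are hypothetical, plain Python lists stand in for the psutil counters and the SQLite state, and the count and critical values in the usage note are assumptions (only the warning default of 80 appears in the help text).

```python
from typing import List, Sequence, Tuple

OK, WARN, CRIT = 0, 1, 2  # conventional monitoring-plugin states

# One snapshot per check run: (timestamp in seconds, cumulative read bytes,
# cumulative write bytes). In the real plugin these would come from the
# per-disk counters; here they are plain tuples (assumption).
Snapshot = Tuple[float, int, int]


def interval_bandwidths(snapshots: Sequence[Snapshot]) -> List[float]:
    """Convert cumulative counters into total bytes/sec between consecutive runs."""
    rates = []
    for (t0, r0, w0), (t1, r1, w1) in zip(snapshots, snapshots[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            continue  # skip duplicate or out-of-order timestamps
        rates.append(((r1 - r0) + (w1 - w0)) / elapsed)
    return rates


def evaluate_saturation(
    rates: Sequence[float],
    rw_max: float,
    count: int,
    warn_pct: float,
    crit_pct: float,
) -> int:
    """Return CRIT/WARN only if each of the last `count` rates reaches the threshold."""
    if rw_max <= 0 or count <= 0 or len(rates) < count:
        return OK  # not enough history yet to call the load "sustained"
    recent = rates[-count:]
    if all(rate >= rw_max * crit_pct / 100.0 for rate in recent):
        return CRIT
    if all(rate >= rw_max * warn_pct / 100.0 for rate in recent):
        return WARN
    return OK
```

For example, `evaluate_saturation([100.0, 10.0, 100.0], rw_max=100.0, count=3, warn_pct=80, crit_pct=90)` stays OK because one of the three runs was idle, whereas three runs at 85.0 would yield WARN. Requiring every run in the window to exceed the threshold is what filters out short spikes.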
