From 7b66411c594f6148172445e4ea9a9414b263157f Mon Sep 17 00:00:00 2001
From: Yasser Tahiri
Date: Thu, 12 Jun 2025 22:54:30 +0100
Subject: [PATCH 1/2] feat(docs): add runbook for Node Disk IO Saturation alert

---
 .../PrometheusNodeDiskIOSaturation.md         | 104 ++++++++++++++++++
 1 file changed, 104 insertions(+)
 create mode 100644 content/runbooks/prometheus/PrometheusNodeDiskIOSaturation.md

diff --git a/content/runbooks/prometheus/PrometheusNodeDiskIOSaturation.md b/content/runbooks/prometheus/PrometheusNodeDiskIOSaturation.md
new file mode 100644
index 0000000..5493a2d
--- /dev/null
+++ b/content/runbooks/prometheus/PrometheusNodeDiskIOSaturation.md
@@ -0,0 +1,104 @@
+---
+title: Node Disk IO Saturation Alert
+weight: 20
+---
+
+# NodeDiskIOSaturation
+
+## Alert Details
+
+- **Alert Name**: NodeDiskIOSaturation
+- **Severity**: Warning
+- **Component**: Node Exporter
+- **Namespace**: monitoring
+
+## Alert Description
+
+This alert fires when the disk IO queue (aqu-sz) on a specific device is high, indicating potential disk saturation. The alert triggers when the queue length has stayed above 10 for the last 30 minutes.
+
+## Alert Context
+
+The alert is based on metrics from the node-exporter pods running in the monitoring namespace, which report the disk IO queue length for all block devices on each node.
+
+## Investigation Steps
+
+1. **Verify Alert Details**
+   - Check the specific device mentioned in the alert (e.g., sdc)
+   - Note the current queue length value
+   - Identify the affected node(s)
+
+2. **Check Node Resources**
+
+   ```bash
+   # Get node status
+   kubectl describe node <node-name>
+
+   # Check node-exporter logs
+   kubectl logs -n monitoring -l app.kubernetes.io/name=prometheus-node-exporter
+   ```
+
+3. **Investigate Disk Performance**
+
+   ```bash
+   # SSH into the affected node
+   ssh <node-name>
+
+   # Check IO statistics
+   iostat -x 1
+
+   # Check disk queue length
+   cat /sys/block/<device>/queue/nr_requests
+
+   # Check IO wait
+   top
+   ```
+
+4. **Identify High IO Processes**
+
+   ```bash
+   # List processes with high IO
+   iotop
+
+   # Check IO statistics per process
+   pidstat -d 1
+   ```
+
+## Common Causes
+
+1. High disk IO from applications
+2. Insufficient disk performance for the workload
+3. Disk hardware issues
+4. Network storage performance issues
+5. Resource contention from other workloads
+
+## Resolution Steps
+
+1. **Short-term Mitigation**
+   - Identify and stop non-critical high IO processes
+   - Consider moving workloads to other nodes
+   - Increase the disk queue length (nr_requests) if appropriate
+
+2. **Long-term Solutions**
+   - Upgrade disk hardware if it consistently hits its limits
+   - Implement IO throttling for problematic workloads
+   - Consider using faster storage solutions
+   - Optimize application IO patterns
+   - Implement proper resource limits and requests
+
+3. **Preventive Measures**
+   - Monitor disk IO patterns
+   - Set up proper resource quotas
+   - Implement IO scheduling policies
+   - Run regular performance tests
+
+## Related Alerts
+
+- NodeDiskSpaceFillingUp
+- NodeDiskSpaceAlmostFull
+- NodeDiskSpaceFull
+
+## References
+
+- [Prometheus Node Exporter Documentation](https://github.com/prometheus/node_exporter)
+- [Kubernetes Node Resource Management](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/)
+- [Linux IO Scheduler Documentation](https://www.kernel.org/doc/html/latest/block/iosched.html)
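
The runbook above states the trigger condition (queue length above 10 for 30 minutes) but the patch does not include the alerting rule or the query behind it. As a rough cross-check during investigation, the sketch below asks Prometheus for the per-device weighted IO time rate, which is the usual PromQL analogue of the iostat `aqu-sz` column; the in-cluster Prometheus URL, the metric name `node_disk_io_time_weighted_seconds_total`, and the `job="node-exporter"` label are assumptions about a typical kube-prometheus deployment, not something defined in this PR.

```bash
# Sketch, not the alert rule from this PR: report the current average IO queue
# depth per device as seen by Prometheus. Adjust PROM_URL to your environment.
PROM_URL="http://prometheus-operated.monitoring.svc:9090"

curl -s "${PROM_URL}/api/v1/query" \
  --data-urlencode 'query=rate(node_disk_io_time_weighted_seconds_total{job="node-exporter"}[5m])' |
  jq -r '.data.result[] | [.metric.instance, .metric.device, .value[1]] | @tsv'
```

Devices whose value stays above 10 line up with the alert condition described in the runbook and are the ones worth inspecting with `iostat`, `iotop`, and `pidstat` on the node itself.
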
From 1f7cfea56edfb57ebd0a57251d76c40f66391118 Mon Sep 17 00:00:00 2001
From: Yasser Tahiri
Date: Thu, 12 Jun 2025 22:57:10 +0100
Subject: [PATCH 2/2] :recycle: fix the directory of the document

---
 .../NodeDiskIOSaturation.md}                  | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename content/runbooks/{prometheus/PrometheusNodeDiskIOSaturation.md => node/NodeDiskIOSaturation.md} (100%)

diff --git a/content/runbooks/prometheus/PrometheusNodeDiskIOSaturation.md b/content/runbooks/node/NodeDiskIOSaturation.md
similarity index 100%
rename from content/runbooks/prometheus/PrometheusNodeDiskIOSaturation.md
rename to content/runbooks/node/NodeDiskIOSaturation.md
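
The short-term mitigation step in the runbook suggests moving workloads to other nodes. A minimal sketch of that with plain `kubectl` follows; the node name, namespace, and pod name are placeholders, and whether a given pod can be safely evicted depends on the workload, so treat this as an outline rather than the prescribed procedure.

```bash
# Placeholder for the node named in the alert.
NODE="<node-name>"

# Keep new pods from being scheduled onto the saturated node.
kubectl cordon "${NODE}"

# List the pods currently running on that node to spot IO-heavy candidates
# (cross-reference with the iotop/pidstat output from the investigation steps).
kubectl get pods --all-namespaces --field-selector "spec.nodeName=${NODE}" -o wide

# Deleting a controller-managed pod reschedules it onto another node while the
# original node is cordoned. Namespace and pod name are placeholders.
kubectl delete pod -n <namespace> <pod-name>

# Once IO pressure is back to normal, allow scheduling again.
kubectl uncordon "${NODE}"
```

Cordoning first and deleting individual pods is gentler than a full `kubectl drain`, which would evict everything on the node at once.
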