Commit f316348

Merge pull request #275792 from bwren/ci-metrics
Alert mapping between legacy and Prometheus alerts
2 parents 20fbc12 + 9bb2e21 · commit f316348

File tree

1 file changed: +19 −0 lines changed


articles/azure-monitor/containers/kubernetes-metric-alerts.md

Lines changed: 19 additions & 0 deletions
@@ -195,6 +195,25 @@ If you already enabled these legacy alert rules, you should disable them and ena
 2. Change the status for each alert rule to **Disabled**.
 
 
+
+### Legacy alert mapping
+The following table maps each legacy Container insights metric alert to its equivalent recommended Prometheus metric alert.
+
+| Custom metric recommended alert | Equivalent Prometheus/Platform metric recommended alert | Condition |
+|:---|:---|:---|
+| Completed job count | KubeJobStale (Pod level alerts) | At least one Job instance did not complete successfully for the last 6 hours. |
+| Container CPU % | KubeContainerAverageCPUHigh (Pod level alerts) | The average CPU usage per container exceeds 95% for the last 5 minutes. |
+| Container working set memory % | KubeContainerAverageMemoryHigh (Pod level alerts) | The average memory usage per container exceeds 95% for the last 5 minutes. |
+| Failed Pod counts | KubePodFailedState (Pod level alerts) | One or more pods are in a failed state for the last 5 minutes. |
+| Node CPU % | Node cpu percentage is greater than 95% (Platform metric) | The node CPU percentage is greater than 95% for the last 5 minutes. |
+| Node Disk Usage % | N/A | Average disk usage for a node is greater than 80%. |
+| Node NotReady status | KubeNodeUnreachable (Node level alerts) | A node has been unreachable for the last 15 minutes. |
+| Node working set memory % | Node memory working set percentage is greater than 100% | The node memory working set percentage is greater than 100% for the last 5 minutes. |
+| OOM Killed Containers | KubeContainerOOMKilledCount (Cluster level alerts) | One or more containers within pods have been killed due to out-of-memory (OOM) events in the last 5 minutes. |
+| Persistent Volume Usage % | KubePVUsageHigh (Pod level alerts) | The average usage of persistent volumes (PVs) per pod exceeds 80% for the last 15 minutes. |
+| Pods ready % | KubePodReadyStateLow (Pod level alerts) | The percentage of pods in a ready state falls below 80% for any deployment or daemonset in the Kubernetes cluster for the last 5 minutes. |
+| Restarting container count | KubePodContainerRestart (Pod level alerts) | One or more containers within pods in the Kubernetes cluster have been restarted at least once within the last hour. |
+
 ## Next steps
 
 - Read about the [different alert rule types in Azure Monitor](../alerts/alerts-types.md).
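To make the mapping concrete, a Prometheus-style alert such as KubePodContainerRestart ("restarted at least once within the last hour") can be sketched as a standard Prometheus alerting rule. This is an illustrative sketch only: the rule name comes from the table in the diff above, but the expression, threshold, `for` duration, and labels are assumptions, not the rule definition Azure Monitor actually ships, and it assumes kube-state-metrics is being scraped.

```yaml
# Illustrative sketch only -- NOT the rule Azure Monitor managed Prometheus ships.
# Assumes kube-state-metrics is scraped under job="kube-state-metrics".
groups:
  - name: pod-level-alerts-sketch
    rules:
      - alert: KubePodContainerRestart
        # Fires when any container's restart counter increased at all
        # over the last hour, grouped per pod.
        expr: >-
          sum by (namespace, pod) (
            increase(kube_pod_container_status_restarts_total{job="kube-state-metrics"}[1h])
          ) > 0
        for: 15m
        labels:
          severity: warning
        annotations:
          description: >-
            A container in pod {{ $labels.namespace }}/{{ $labels.pod }}
            restarted within the last hour.
```

The `increase()` over the `kube_pod_container_status_restarts_total` counter is the usual way to express "restarted at least once in a window"; the actual managed rule may use a different window, grouping, or severity.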
