Commit 5cf8473

Add alertmanager rules for Docker
Docker has a built-in Prometheus exporter that we do not currently enable. This change adds alerts for stopped or paused containers and for failed health checks. The patch depends on Docker being configured to export the metrics and on Prometheus being configured to scrape them, so Kayobe and Kolla-Ansible should both be updated to their latest versions.
1 parent fe96cb4 commit 5cf8473
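
For readers unfamiliar with the two prerequisites named in the description: the Docker daemon serves Prometheus metrics once `metrics-addr` is set in its `daemon.json`, and Prometheus then needs a scrape job pointing at that address. In a Kayobe/Kolla-Ansible deployment both are normally generated for you, so the fragment below is only a hand-written sketch of the Prometheus side. The job name and target host are illustrative assumptions, not values taken from this commit; only port 9323 comes from the change itself.

```yaml
# Minimal sketch of a Prometheus scrape job for Docker's built-in exporter.
# Assumption: the daemon on the target host already has "metrics-addr" set
# so that it listens on port 9323.
scrape_configs:
  - job_name: docker              # hypothetical job name
    static_configs:
      - targets:
          - "192.0.2.10:9323"     # hypothetical host; the port matches docker_metrics_addr
```

With a job like this in place, series such as `engine_daemon_container_states_containers` become available for the alert expressions added in this commit.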

3 files changed: +40 -0 lines changed
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+---
+# Address for prometheus metrics endpoint
+docker_metrics_addr: "{{ internal_net_name | net_ip + ':9323'}}"
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+
+{% raw %}
+
+groups:
+- name: Docker
+  rules:
+
+  - alert: DockerContainerStopped
+    expr: 'engine_daemon_container_states_containers{state="stopped"} > 0'
+    labels:
+      severity: warning
+    annotations:
+      summary: "Containers not running (instance {{ $labels.instance }})"
+      description: "One or more containers are stopped"
+
+  - alert: DockerContainerPaused
+    expr: 'engine_daemon_container_states_containers{state="paused"} > 0'
+    labels:
+      severity: warning
+    annotations:
+      summary: "Containers paused (instance {{ $labels.instance }})"
+      description: "One or more containers are paused"
+
+  - alert: DockerContainerHealthCheckFail
+    expr: rate(engine_daemon_health_checks_failed_total[1m]) > 1
+    labels:
+      severity: warning
+    annotations:
+      summary: "Container health check failed (instance {{ $labels.instance }})"
+      description: "One or more container health checks failed"
+
+{% endraw %}
+
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+---
+features:
+  - |
+    Added new default alerting rules for containers being unhealthy or stopped.
