Commit 0818107

feat: traefik alerts (#1460)
* feat: traefik alerts

  Adds two basic Traefik alerts. One for config reloads failing and the other for TLS certificate expiry.

* docs: update README for traefik
* chore: make fmt
* ref: make everything configurable
* ref: only have warning alert fire if above crit
* docs: mention config vars
* ref: match convention of config vars
* docs: match new var names
* fix: add summary
* fix: template config reload with environment label
* fix: better grouping for template

  Co-authored-by: v-zhuravlev <[email protected]>

* fix: make all
* ref: remove config comments

  Co-authored-by: v-zhuravlev <[email protected]>

* ref: remove environment label

  Co-authored-by: v-zhuravlev <[email protected]>

* fix: make all
* fix: make fmt

---------

Co-authored-by: v-zhuravlev <[email protected]>
1 parent cd4dd9a commit 0818107

5 files changed: +179 -7 lines changed

traefik-mixin/README.md

Lines changed: 59 additions & 6 deletions
@@ -1,10 +1,63 @@
-The Traefik mixin is a set of configurable, reusable, and extensible dashboards based on the metrics exported by Traefik itself. It also creates suitable dashboard descriptions for Grafana.
+# traefik-mixin
+
+The Traefik mixin is a set of configurable, reusable, and extensible dashboards based on the metrics exported by Traefik itself. It also creates suitable dashboard descriptions for Grafana. Lastly, some alerts are also included.
 
 To use them, you need to have mixtool and jsonnetfmt installed. If you have a working Go development environment, it's easiest to run the following:
 
-$ go get github.com/monitoring-mixins/mixtool/cmd/mixtool
-$ go get github.com/google/go-jsonnet/cmd/jsonnetfmt
-You can then build the Prometheus rules files alerts.yaml and rules.yaml and a directory dashboard_out with the JSON dashboard files for Grafana:
+```shell
+go get github.com/monitoring-mixins/mixtool/cmd/mixtool
+go get github.com/google/go-jsonnet/cmd/jsonnetfmt
+```
+
+You can then build the Prometheus rules files and dashboards for Grafana:
+
+```shell
+make build
+```
+
+This will generate:
+
+- Prometheus alerts in `prometheus_rules_out/prometheus_alerts.yaml`
+- Prometheus rules in `prometheus_rules_out/prometheus_rules.yaml` (if you have rules defined)
+- Grafana dashboards in `dashboards_out/`
+
+## Included Alerts
+
+The following Prometheus alerts are included:
+
+- **TraefikConfigReloadFailuresIncreasing**: Fires if Traefik is failing to reload its config.
+- **TraefikTLSCertificatesExpiring**: Fires if Traefik is serving certificates that will expire very soon (critical, threshold configurable).
+- **TraefikTLSCertificatesExpiringSoon**: Fires if Traefik is serving certificates that will expire soon (warning, threshold configurable, only fires if the expiry is less than the warning threshold but greater than the critical threshold).
+
+## Configuration
+
+You can configure alert thresholds, selectors, and labels in `config.libsonnet`:
+
+```jsonnet
+{
+  _config+:: {
+    traefik_tls_expiry_days_critical: 7, // critical threshold (days)
+    traefik_tls_expiry_days_warning: 14, // warning threshold (days)
+    filteringSelector: '', // optional metric label selector for all alerts
+    // Example:
+    // filteringSelector: "component=\"traefik\",environment=\"production\"",
+    groupLabels: 'job, environment',
+    instanceLabels: 'instance',
+
+    alertLabels: {}, // optional alert labels
+    // Example:
+    // alertLabels: {
+    //   environment: 'production',
+    //   component: 'traefik',
+    // },
+    alertAnnotations: {}, // optional alert annotations
+    // Example:
+    // alertAnnotations: {
+    //   runbook: 'https://runbooks.example.com/traefik-tls',
+    //   grafana: 'https://grafana.example.com/d/traefik',
+    // },
+  },
+}
+```
 
-$ make build
-For more advanced uses of mixins, see https://github.com/monitoring-mixins/docs.
+For more advanced uses of mixins, see [monitoring-mixins/docs](https://github.com/monitoring-mixins/docs).
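
Note on the README change: the documented settings live in the hidden `_config` object, so they can presumably also be overridden from a consuming Jsonnet file instead of editing `config.libsonnet` in place. The following is a minimal sketch, not part of this commit. The import path assumes the mixin is vendored under a hypothetical `traefik-mixin/` directory on the Jsonnet library path, and the override values are purely illustrative.

```jsonnet
// Hypothetical consumer file; the import path is an assumption,
// not something this commit defines.
local traefik = import 'traefik-mixin/mixin.libsonnet';

traefik {
  _config+:: {
    // Illustrative values only; the commit's defaults are 7 and 14.
    traefik_tls_expiry_days_critical: 3,
    traefik_tls_expiry_days_warning: 10,
    // Restrict all three alerts to a single job.
    filteringSelector: 'job="traefik"',
  },
}
```

Because `_config+::` merges into the existing object, any key left out here (for example `groupLabels` or `alertLabels`) keeps the default from `config.libsonnet`.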

traefik-mixin/alerts/alerts.libsonnet

Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
+{
+  prometheusAlerts+:: {
+    groups+: [
+      {
+        name: 'traefik',
+        rules: [
+          {
+            alert: 'TraefikConfigReloadFailuresIncreasing',
+            expr: |||
+              sum by (%(groupLabels)s) (rate(traefik_config_reloads_failure_total{%(filteringSelector)s}[5m])) > 0
+            ||| % $._config,
+            'for': '5m',
+            labels: {
+              severity: 'critical',
+            } + std.get($._config, 'alertLabels', {}),
+            annotations: {
+              summary: 'Traefik is failing to reload its configuration.',
+              description: |||
+                Traefik is failing to reload its config in {{ $labels.%(firstGroupLabel)s }}.
+              ||| % { firstGroupLabel: std.split($._config.groupLabels, ',')[0] },
+            } + std.get($._config, 'alertAnnotations', {}),
+          },
+          {
+            alert: 'TraefikTLSCertificatesExpiring',
+            expr: |||
+              max by (%(instanceLabels)s, sans) ((last_over_time(traefik_tls_certs_not_after{%(filteringSelector)s}[5m]) - time()) / 86400) < %(traefik_tls_expiry_days_critical)s
+            ||| % $._config,
+            'for': '5m',
+            labels: {
+              severity: 'critical',
+            } + std.get($._config, 'alertLabels', {}),
+            annotations: {
+              summary: 'A Traefik-served TLS certificate will expire very soon.',
+              description: |||
+                The minimum number of days until a Traefik-served certificate expires is {{ printf "%%.0f" $value }} days on {{ $labels.sans }} which is below the critical threshold of %(traefik_tls_expiry_days_critical)s.
+              ||| % $._config,
+            } + std.get($._config, 'alertAnnotations', {}),
+          },
+          {
+            alert: 'TraefikTLSCertificatesExpiringSoon',
+            expr: |||
+              max by (%(instanceLabels)s, sans) ((last_over_time(traefik_tls_certs_not_after{%(filteringSelector)s}[5m]) - time()) / 86400) < %(traefik_tls_expiry_days_warning)s > %(traefik_tls_expiry_days_critical)s
+            ||| % $._config,
+            'for': '5m',
+            labels: {
+              severity: 'warning',
+            } + std.get($._config, 'alertLabels', {}),
+            annotations: {
+              summary: 'A Traefik-served TLS certificate will expire soon.',
+              description: |||
+                The minimum number of days until a Traefik-served certificate expires is {{ printf "%%.0f" $value }} days on {{ $labels.sans }} which is less than %(traefik_tls_expiry_days_warning)s but greater than %(traefik_tls_expiry_days_critical)s.
+              ||| % $._config,
+            } + std.get($._config, 'alertAnnotations', {}),
+          },
+        ],
+      },
+    ],
+  },
+}
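
Note on the warning rule: its expression chains two PromQL comparisons. Without the `bool` modifier each comparison acts as a filter, so the rule keeps only series whose remaining lifetime is below the warning threshold and above the critical one. The sketch below is not part of the commit. It renders that same template outside the mixin, with a local `config` object that stands in for `$._config` and carries the commit's default values.

```jsonnet
// Stand-in for $._config, using the defaults from config.libsonnet.
local config = {
  instanceLabels: 'instance',
  filteringSelector: '',
  traefik_tls_expiry_days_critical: 7,
  traefik_tls_expiry_days_warning: 14,
};

{
  // Same template text as the TraefikTLSCertificatesExpiringSoon rule.
  expr: |||
    max by (%(instanceLabels)s, sans) ((last_over_time(traefik_tls_certs_not_after{%(filteringSelector)s}[5m]) - time()) / 86400) < %(traefik_tls_expiry_days_warning)s > %(traefik_tls_expiry_days_critical)s
  ||| % config,
}
```

With these defaults the `expr` field should render to `max by (instance, sans) ((last_over_time(traefik_tls_certs_not_after{}[5m]) - time()) / 86400) < 14 > 7`, followed by a trailing newline. Both bounds are strict, so a value of exactly 7 days matches neither the warning nor the critical rule; in practice the value keeps decreasing, so this looks harmless.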

traefik-mixin/config.libsonnet

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+{
+  _config+:: {
+    // alerts thresholds
+    traefik_tls_expiry_days_critical: 7,
+    traefik_tls_expiry_days_warning: 14,
+    filteringSelector: '',
+    // Example:
+    // filteringSelector: "component=\"traefik\",environment=\"production\"",
+    // for config reload alert
+    groupLabels: 'job',
+    // for TLS alerts
+    instanceLabels: 'instance',
+    alertLabels: {},
+    // Example:
+    // alertLabels: {
+    //   environment: 'production',
+    //   component: 'traefik',
+    // },
+    alertAnnotations: {},
+    // Example:
+    // alertAnnotations: {
+    //   runbook: 'https://runbooks.example.com/traefik-tls',
+    //   grafana: 'https://grafana.example.com/d/traefik',
+    // },
+  },
+}
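
Note on `groupLabels`: the default here is `'job'`, while the README sample in this commit still shows `'job, environment'`. The config reload alert uses only the first comma-separated entry in its description template. The sketch below is not part of the commit. It isolates that step using the README value, with a local variable standing in for `$._config.groupLabels`.

```jsonnet
// README sample value; config.libsonnet itself defaults to 'job'.
local groupLabels = 'job, environment';

// Same expression the alert uses to pick the label for its description.
local firstGroupLabel = std.split(groupLabels, ',')[0];

{
  firstGroupLabel: firstGroupLabel,  // 'job'
  description: 'Traefik is failing to reload its config in {{ $labels.%(firstGroupLabel)s }}.' % { firstGroupLabel: firstGroupLabel },
}
```

Either value yields `{{ $labels.job }}` in the description, so the README/config mismatch only changes the `sum by (...)` grouping in the expression, not the annotation text.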

traefik-mixin/mixin.libsonnet

Lines changed: 2 additions & 1 deletion
@@ -2,4 +2,5 @@
   grafanaDashboards+:: {
     'traefikdash.json': (import 'dashboards/traefikdash.json'),
   },
-}
+} + (import 'alerts/alerts.libsonnet') +
+  (import 'config.libsonnet')

traefik-mixin/prometheus_rules_out/prometheus_alerts.yaml

Lines changed: 33 additions & 0 deletions
Some generated files are not rendered by default.
