
Commit 201a2cc

Add Cloudflare mixin (#1083)
* init
* first pass on docs
* readme update
* dashboards & alerts added
* rename folder
* rename folder
* dans feedback
* bomins feedback 1
* linter changes
* vitalys feedback
* vitalys feedback 2
* screenshot urls
* nit typo
1 parent 280304a commit 201a2cc

11 files changed, +2147 -0 lines changed

cloudflare-mixin/.lint

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
```yaml
exclusions:
  template-job-rule:
    reason: "The Prometheus datasource variable is now named prometheus_datasource, while the linter expects 'datasource'"
  panel-datasource-rule:
    reason: "The Loki datasource variable is now named loki_datasource, while the linter expects 'datasource'"
  template-datasource-rule:
    reason: "Based on the new convention we use the variable names prometheus_datasource and loki_datasource, whereas the linter expects 'datasource'"
  template-instance-rule:
    reason: "Based on the new convention we use the variable names prometheus_datasource and loki_datasource, whereas the linter expects 'datasource'"
  template-label-promql-rule:
    reason: "The Cloudflare GeoMap overview dashboard uses templated queries so the user can pick, with a variable selector, which query is displayed on the GeoMap."
  panel-units-rule:
    reason: "Custom units are used in these panels for a better user experience"

```

cloudflare-mixin/Makefile

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
```makefile
JSONNET_FMT := jsonnetfmt -n 2 --max-blank-lines 1 --string-style s --comment-style s

.PHONY: all
all: build dashboards_out prometheus_alerts.yaml

vendor: jsonnetfile.json
	jb install

.PHONY: build
build: vendor

.PHONY: fmt
fmt:
	find . -name 'vendor' -prune -o -name '*.libsonnet' -print -o -name '*.jsonnet' -print | \
		xargs -n 1 -- $(JSONNET_FMT) -i

.PHONY: lint
lint: build
	find . -name 'vendor' -prune -o -name '*.libsonnet' -print -o -name '*.jsonnet' -print | \
		while read f; do \
			$(JSONNET_FMT) "$$f" | diff -u "$$f" -; \
		done
	mixtool lint mixin.libsonnet

dashboards_out: mixin.libsonnet config.libsonnet $(wildcard dashboards/*)
	@mkdir -p dashboards_out
	mixtool generate dashboards mixin.libsonnet -d dashboards_out

prometheus_alerts.yaml: mixin.libsonnet alerts/*.libsonnet
	mixtool generate alerts mixin.libsonnet -a prometheus_alerts.yaml

.PHONY: clean
clean:
	rm -rf dashboards_out prometheus_alerts.yaml

```

cloudflare-mixin/README.md

Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
# Cloudflare mixin
The Cloudflare mixin is a set of configurable Grafana dashboards and alerts.

The metrics analyzed here come from [Cloudflare account and zone analytics](https://developers.cloudflare.com/analytics/account-and-zone-analytics/). Pool and worker metrics do not appear until your Cloudflare account is configured to use load balancing pools and Workers.

The Cloudflare mixin contains the following dashboards:

- Cloudflare zone overview
- Cloudflare GeoMap overview
- Cloudflare worker overview

and the following alerts:

- CloudflareHighThreatCount
- CloudflareHighRequestRate
- CloudflareHighHTTPErrorCodes
- CloudflareUnhealthyPools
- CloudflareMetricsDown

## Cloudflare zone overview
The Cloudflare zone overview dashboard provides a detailed look at the performance of the zones in your Cloudflare account. Metrics include requests, cached requests, several bandwidth measures, page views, HTTP status codes, colocations, and pool status.

![First screenshot of Cloudflare zone overview dashboard](https://storage.googleapis.com/grafanalabs-integration-assets/cloudflare/screenshots/cloudflare-zone-overview-1.png)
![Second screenshot of Cloudflare zone overview dashboard](https://storage.googleapis.com/grafanalabs-integration-assets/cloudflare/screenshots/cloudflare-zone-overview-2.png)

## Cloudflare GeoMap overview
The Cloudflare GeoMap overview dashboard uses a Grafana Geomap panel to plot metrics on a world map: requests, bandwidth, threats, non-cached requests, and edge requests. A dashboard variable selects which of these metrics the map displays.

![Screenshot of Cloudflare Geomap overview dashboard](https://storage.googleapis.com/grafanalabs-integration-assets/cloudflare/screenshots/cloudflare-geomap-overview.png)

## Cloudflare worker overview
The Cloudflare worker overview dashboard shows Cloudflare Workers performance on a per-script basis. Metrics include CPU time quantiles, script duration quantiles, requests, and errors.

![Screenshot of Cloudflare worker overview dashboard](https://storage.googleapis.com/grafanalabs-integration-assets/cloudflare/screenshots/cloudflare-worker-overview.png)

## Alerts overview

- CloudflareHighThreatCount: More than `alertsHighThreatCount` threats were detected for a zone over the last 5 minutes.
- CloudflareHighRequestRate: The request rate over the last 10 minutes exceeds `alertsHighRequestRate` percent of the preceding 50-minute baseline, which may indicate an attack or unexpected load.
- CloudflareHighHTTPErrorCodes: More than `alertsHighHTTPErrorCodeCount` responses with a given 4xx or 5xx status code occurred in a zone over the last 5 minutes.
- CloudflareUnhealthyPools: A load balancing pool in a zone reports an unhealthy status.
- CloudflareMetricsDown: Prometheus cannot scrape the Cloudflare target (`up` is 0 for the job named by `alertsMetricsDownJobName`).

Each alert fires only after its condition has held for 5 minutes.

If your scrape job uses a `job` label other than the default `integrations/cloudflare`, set `alertsMetricsDownJobName` to match it.

Default thresholds are set in `config.libsonnet`:
```jsonnet
{
  _config+:: {
    dashboardTags: ['cloudflare-mixin'],
    dashboardPeriod: 'now-30m',
    dashboardTimezone: 'default',
    dashboardRefresh: '1m',

    // CloudflareMetricsDown alert filter variable
    alertsMetricsDownJobName: 'integrations/cloudflare',

    // alerts thresholds
    alertsHighThreatCount: 3,  // count
    alertsHighRequestRate: 150,  // percentage
    alertsHighHTTPErrorCodeCount: 100,  // count
  },
}
```
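
Instead of editing `config.libsonnet` in place, the same fields can be overridden from a separate Jsonnet file layered on top of the mixin. This is a minimal sketch that assumes `mixin.libsonnet` (not shown in this diff) combines `config.libsonnet` with the alert and dashboard definitions in the usual mixin way, so that `$._config` is resolved late; the file name `overrides.libsonnet`, the job name, and the threshold values are hypothetical examples.

```jsonnet
// overrides.libsonnet (hypothetical file name)
// Extends the mixin and overrides selected _config fields. Because the alert
// rules read their thresholds through $._config, the overridden values are
// used when the rules are rendered; fields not listed keep their defaults.
(import 'mixin.libsonnet') + {
  _config+:: {
    // Example only: set this to the job label your scrape config actually uses.
    alertsMetricsDownJobName: 'cloudflare-exporter',

    // Example thresholds; tune them for your own traffic.
    alertsHighThreatCount: 10,
    alertsHighHTTPErrorCodeCount: 500,
  },
}
```

The file could then be rendered with the same `mixtool generate` commands the Makefile runs, pointed at the new file, for example `mixtool generate alerts overrides.libsonnet -a prometheus_alerts.yaml`.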

## Install tools

```bash
go install github.com/jsonnet-bundler/jsonnet-bundler/cmd/jb@latest
go install github.com/monitoring-mixins/mixtool/cmd/mixtool@latest
```

For linting and formatting, you also need `jsonnetfmt`. If you have a working Go development environment, the easiest way to install it is:

```bash
go install github.com/google/go-jsonnet/cmd/jsonnetfmt@latest
```

The files in `dashboards_out` need to be imported into your Grafana server; the exact steps depend on your environment.

`prometheus_alerts.yaml` needs to be loaded into Prometheus, for example by listing it under `rule_files` in the Prometheus configuration.

## Generate dashboards and alerts

Edit `config.libsonnet` if required, then run `make` to build the JSON dashboard files for Grafana (in `dashboards_out`) and the Prometheus alert rules (`prometheus_alerts.yaml`):

```bash
make
```

For more advanced uses of mixins, see https://github.com/monitoring-mixins/docs.

Lines changed: 96 additions & 0 deletions
@@ -0,0 +1,96 @@
```jsonnet
{
  prometheusAlerts+:: {
    groups+: [
      {
        name: 'cloudflare-alerts',
        rules: [
          {
            alert: 'CloudflareHighThreatCount',
            expr: |||
              sum without (instance) (increase(cloudflare_zone_threats_total[5m])) > %(alertsHighThreatCount)s
            ||| % $._config,
            'for': '5m',
            labels: {
              severity: 'critical',
            },
            annotations: {
              summary: 'There are detected threats targeting the zone.',
              description:
                (
                  'The number of detected threats targeting the zone {{$labels.zone}} is {{ printf "%%.0f" $value }} which is greater than the threshold of %(alertsHighThreatCount)s.'
                ) % $._config,
            },
          },
          {
            alert: 'CloudflareHighRequestRate',
            expr: |||
              sum without (instance) (100 * (rate(cloudflare_zone_requests_total[10m]) / clamp_min(rate(cloudflare_zone_requests_total[50m] offset 10m), 1))) > %(alertsHighRequestRate)s
            ||| % $._config,
            'for': '5m',
            labels: {
              severity: 'warning',
            },
            annotations: {
              summary: 'A high spike in requests is occurring which may indicate an attack or unexpected load.',
              // '%%' renders as a literal percent sign after formatting ('%%s' would render as '%s').
              description:
                (
                  'The rate of requests to {{$labels.zone}} is {{ printf "%%.0f" $value }}%% of the prior 50-minute baseline, which is above the threshold of %(alertsHighRequestRate)s%%.'
                ) % $._config,
            },
          },
          {
            alert: 'CloudflareHighHTTPErrorCodes',
            expr: |||
              sum without (instance) (increase(cloudflare_zone_requests_status{status=~"4.*|5.*"}[5m])) > %(alertsHighHTTPErrorCodeCount)s
            ||| % $._config,
            'for': '5m',
            labels: {
              severity: 'warning',
            },
            annotations: {
              summary: 'A high number of 4xx or 5xx HTTP status codes are occurring.',
              description:
                (
                  'The number of {{$labels.status}} HTTP status codes occurring in the zone {{$labels.zone}} is {{ printf "%%.0f" $value }} which is greater than the threshold of %(alertsHighHTTPErrorCodeCount)s.'
                ) % $._config,
            },
          },
          {
            alert: 'CloudflareUnhealthyPools',
            expr: |||
              sum without (instance, load_balancer_name) (cloudflare_zone_pool_health_status) == 0
            ||| % $._config,
            'for': '5m',
            labels: {
              severity: 'critical',
            },
            annotations: {
              summary: 'There are unhealthy pools.',
              description:
                (
                  'The pool {{$labels.pool_name}} in zone {{$labels.zone}} is currently down and unhealthy.'
                ) % $._config,
            },
          },
          {
            alert: 'CloudflareMetricsDown',
            expr: |||
              up{job="%(alertsMetricsDownJobName)s"} == 0
            ||| % $._config,
            'for': '5m',
            labels: {
              severity: 'critical',
            },
            annotations: {
              summary: 'Cloudflare metrics are down.',
              description:
                (
                  'Grafana is no longer receiving metrics for the Cloudflare integration from instance {{$labels.instance}}.'
                ) % $._config,
            },
          },
        ],
      },
    ],
  },
}
```
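
The `CloudflareHighRequestRate` rule above compares the per-second request rate over the most recent 10 minutes with the rate over the 50 minutes before that window (`[50m] offset 10m`), and `clamp_min(..., 1)` keeps the denominator at no less than 1 request per second. Written out for a single series, using our own notation ($r_{[a,b]}$ is the per-second rate of `cloudflare_zone_requests_total` between minute $a$ and minute $b$):

$$
R(t) = 100 \cdot \frac{r_{[t-10,\,t]}}{\max\left(r_{[t-60,\,t-10]},\ 1\right)}
$$

The alert fires once $R(t)$ stays above `alertsHighRequestRate` (150 by default) for 5 minutes. Two illustrative cases with made-up traffic figures:

$$
\begin{aligned}
R &= 100 \cdot \frac{70}{\max(40,\ 1)} = 175 > 150 &&\text{(fires)}\\
R &= 100 \cdot \frac{1.2}{\max(0.2,\ 1)} = 120 < 150 &&\text{(does not fire)}
\end{aligned}
$$

The clamp avoids dividing by zero when the baseline window has no traffic; as the second case shows, it also means that for zones averaging under 1 request per second, even a sixfold increase can stay below the threshold. Separately, `sum without (instance)` adds the percentages of series that differ only by `instance`, so a zone scraped by several instances would report a summed value.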

cloudflare-mixin/config.libsonnet

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
```jsonnet
{
  _config+:: {
    dashboardTags: ['cloudflare-mixin'],
    dashboardPeriod: 'now-30m',
    dashboardTimezone: 'default',
    dashboardRefresh: '1m',

    // CloudflareMetricsDown alert filter variable
    alertsMetricsDownJobName: 'integrations/cloudflare',

    // alerts thresholds
    alertsHighThreatCount: 3,  // count
    alertsHighRequestRate: 150,  // percentage
    alertsHighHTTPErrorCodeCount: 100,  // count
  },
}
```
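
The thresholds in `config.libsonnet` reach the alert rules through Jsonnet's `%` string-formatting operator: each `%(name)s` placeholder is filled from the `$._config` object, and `%%` produces a single literal `%`. The self-contained snippet below illustrates the mechanism with a local stand-in for `$._config`; the `config` local, the `request_ratio` name, and the shortened strings are ours, for illustration only.

```jsonnet
// Stand-in for $._config; only the field used below is included.
local config = { alertsHighRequestRate: 150 };

{
  // %(alertsHighRequestRate)s is replaced by the field's value.
  expr: 'request_ratio > %(alertsHighRequestRate)s' % config,
  // %% is an escaped percent sign, so the threshold renders as "150%".
  description: 'above the threshold of %(alertsHighRequestRate)s%%.' % config,
}
```

Evaluating it with the `jsonnet` command-line tool yields `"expr": "request_ratio > 150"` and `"description": "above the threshold of 150%."`. By the same rule, `%%s` renders as the two characters `%s`, not as a percent sign.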
