6 changes: 4 additions & 2 deletions docs/reference/alert-rules.md
@@ -45,8 +45,10 @@ This page contains a markdown version of the alert rules described in the `postg

 | Alert | Severity | Notes |
 |------|----------|-------|
-| PatroniPostgresqlDown | ![critical] | Patroni PostgreSQL instance is down.<br>Check for errors in the Loki logs. |
-| PatroniHasNoLeader | ![critical] | Patroni instance has no leader node.<br>A leader node (neither primary nor standby) cannot be found inside a cluster.<br>Check for errors in the Loki logs. |
+| `PatroniPostgresqlDown` | ![critical] | Patroni PostgreSQL instance is down.<br>Check for errors in the Loki logs. |
+| `PatroniMultipleLeaders` | ![critical] | Patroni cluster has multiple leader nodes.<br>More than one leader node (primary or standby) is detected inside a cluster.<br>This may indicate split-brain; check Patroni/Loki logs and network/quorum state. |
+| `PatroniPrimaryAndStandbyLeader` | ![critical] | Patroni cluster has both primary and standby leaders.<br>A primary leader and a standby leader are simultaneously detected inside a cluster.<br>Check for errors in the Loki logs. |
+| `PatroniHasNoLeader` | ![critical] | Patroni instance has no leader node.<br>A leader node (neither primary nor standby) cannot be found inside a cluster.<br>Check for errors in the Loki logs. |

 ## `PgbackrestExporterK8s`

28 changes: 26 additions & 2 deletions src/prometheus_alert_rules/patroni_rules.yaml
@@ -17,14 +17,38 @@ groups:
             Check for errors in the Loki logs.
             LABELS = {{ $labels }}

+      - alert: PatroniMultipleLeaders
+        expr: 'sum by (juju_model,juju_application,juju_model_uuid,scope) (patroni_master) > 1 or sum by (juju_model,juju_application,juju_model_uuid,scope) (patroni_standby_leader) > 1'
+        for: 0m
+        labels:
+          severity: critical
+        annotations:
+          summary: Patroni cluster {{ $labels.scope }} has multiple leader nodes.
+          description: |
+            More than one leader node (primary or standby) is detected inside the cluster {{ $labels.scope }}.
+            Check for errors in the Loki logs.
+            LABELS = {{ $labels }}
+
+      - alert: PatroniPrimaryAndStandbyLeader
+        expr: 'sum by (juju_model,juju_application,juju_model_uuid,scope) (patroni_master) == 1 and sum by (juju_model,juju_application,juju_model_uuid,scope) (patroni_standby_leader) == 1'
+        for: 0m
+        labels:
+          severity: critical
+        annotations:
+          summary: Patroni cluster {{ $labels.scope }} has both primary and standby leaders.
+          description: |
+            A primary leader and a standby leader are simultaneously detected inside the cluster {{ $labels.scope }}.
+            Check for errors in the Loki logs.
+            LABELS = {{ $labels }}
+
       # 2.4.1
       - alert: PatroniHasNoLeader
-        expr: '(max by (scope) (patroni_master) < 1) and (max by (scope) (patroni_standby_leader) < 1)'
+        expr: '(max by (juju_model,juju_application,juju_model_uuid,scope) (patroni_master) < 1) and (max by (juju_model,juju_application,juju_model_uuid,scope) (patroni_standby_leader) < 1)'
         for: 0m
         labels:
           severity: critical
         annotations:
-          summary: Patroni instance {{ $labels.instance }} has no leader node.
+          summary: Patroni instance {{ $labels.instance }} has no leader node.
           description: |
             A leader node (neither primary nor standby) cannot be found inside the cluster {{ $labels.scope }}.
             Check for errors in the Loki logs.
122 changes: 122 additions & 0 deletions tests/alerts/test_patroni_rules.yaml
@@ -78,3 +78,125 @@ tests:
       - alertname: PatroniHasNoLeader
         eval_time: 1m
         exp_alerts: []
+
+  - name: PatroniMultipleLeaders does not fire if master=1 and standby_leader=0
+    interval: 1m
+    input_series:
+      - series: 'patroni_master{scope="cluster1"}'
+        values: '1'
+      - series: 'patroni_standby_leader{scope="cluster1"}'
+        values: '0'
+    alert_rule_test:
+      - alertname: PatroniMultipleLeaders
+        eval_time: 1m
+        exp_alerts: []
+
+  - name: PatroniMultipleLeaders does not fire if master=0 and standby_leader=1
+    interval: 1m
+    input_series:
+      - series: 'patroni_master{scope="cluster1"}'
+        values: '0'
+      - series: 'patroni_standby_leader{scope="cluster1"}'
+        values: '1'
+    alert_rule_test:
+      - alertname: PatroniMultipleLeaders
+        eval_time: 1m
+        exp_alerts: []
+
+  - name: PatroniMultipleLeaders does not fire if master=1 and standby_leader=1
+    interval: 1m
+    input_series:
+      - series: 'patroni_master{scope="cluster1"}'
+        values: '1'
+      - series: 'patroni_standby_leader{scope="cluster1"}'
+        values: '1'
+    alert_rule_test:
+      - alertname: PatroniMultipleLeaders
+        eval_time: 1m
+        exp_alerts: []
+
+  - name: PatroniMultipleLeaders fires if two masters exist in one scope
+    interval: 1m
+    input_series:
+      - series: 'patroni_master{scope="cluster1",instance="pg1"}'
+        values: '1'
+      - series: 'patroni_master{scope="cluster1",instance="pg2"}'
+        values: '1'
+      - series: 'patroni_standby_leader{scope="cluster1",instance="pg1"}'
+        values: '0'
+      - series: 'patroni_standby_leader{scope="cluster1",instance="pg2"}'
+        values: '0'
+    alert_rule_test:
+      - alertname: PatroniMultipleLeaders
+        eval_time: 0m
+        exp_alerts:
+          - exp_labels:
+              alertname: PatroniMultipleLeaders
+              severity: critical
+              scope: cluster1
+            exp_annotations:
+              summary: Patroni cluster cluster1 has multiple leader nodes.
+              description: |
+                More than one leader node (primary or standby) is detected inside the cluster cluster1.
+                Check for errors in the Loki logs.
+                LABELS = map[scope:cluster1]
+
+  - name: PatroniMultipleLeaders fires if two standby leaders exist in one scope
+    interval: 1m
+    input_series:
+      - series: 'patroni_master{scope="cluster1",instance="pg1"}'
+        values: '0'
+      - series: 'patroni_master{scope="cluster1",instance="pg2"}'
+        values: '0'
+      - series: 'patroni_standby_leader{scope="cluster1",instance="pg1"}'
+        values: '1'
+      - series: 'patroni_standby_leader{scope="cluster1",instance="pg2"}'
+        values: '1'
+    alert_rule_test:
+      - alertname: PatroniMultipleLeaders
+        eval_time: 0m
+        exp_alerts:
+          - exp_labels:
+              alertname: PatroniMultipleLeaders
+              severity: critical
+              scope: cluster1
+            exp_annotations:
+              summary: Patroni cluster cluster1 has multiple leader nodes.
+              description: |
+                More than one leader node (primary or standby) is detected inside the cluster cluster1.
+                Check for errors in the Loki logs.
+                LABELS = map[scope:cluster1]
+
+  - name: PatroniPrimaryAndStandbyLeader does not fire if master=1 and standby_leader=0
+    interval: 1m
+    input_series:
+      - series: 'patroni_master{scope="cluster1"}'
+        values: '1'
+      - series: 'patroni_standby_leader{scope="cluster1"}'
+        values: '0'
+    alert_rule_test:
+      - alertname: PatroniPrimaryAndStandbyLeader
+        eval_time: 1m
+        exp_alerts: []
+
+  - name: PatroniPrimaryAndStandbyLeader fires if master=1 and standby_leader=1
+    interval: 1m
+    input_series:
+      - series: 'patroni_master{scope="cluster1"}'
+        values: '1'
+      - series: 'patroni_standby_leader{scope="cluster1"}'
+        values: '1'
+    alert_rule_test:
+      - alertname: PatroniPrimaryAndStandbyLeader
+        eval_time: 0m
+        exp_alerts:
+          - exp_labels:
+              alertname: PatroniPrimaryAndStandbyLeader
+              severity: critical
+              scope: cluster1
+            exp_annotations:
+              summary: Patroni cluster cluster1 has both primary and standby leaders.
+              description: |
+                A primary leader and a standby leader are simultaneously detected inside the cluster cluster1.
+                Check for errors in the Loki logs.
+                LABELS = map[scope:cluster1]
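
A possible follow-up, sketched under assumptions rather than taken from this PR: the cases added in this hunk all use a single `scope`, so they do not exercise the per-cluster grouping of `sum by (..., scope)`, and `PatroniPrimaryAndStandbyLeader` has no case for `master=0, standby_leader=1`. The two hypothetical cases below follow the same promtool unit-test layout as the existing file; the test names, `cluster2`, and the instance labels are illustrative only.

```yaml
# Hypothetical extra cases, not part of this PR.
# Intended to be appended under the existing `tests:` key of
# tests/alerts/test_patroni_rules.yaml.
tests:
  # Two clusters with one primary each: the per-scope sums are both 1,
  # so the `> 1` comparison should match nothing.
  - name: PatroniMultipleLeaders does not fire if two scopes each have one master
    interval: 1m
    input_series:
      - series: 'patroni_master{scope="cluster1",instance="pg1"}'
        values: '1'
      - series: 'patroni_master{scope="cluster2",instance="pg2"}'
        values: '1'
      - series: 'patroni_standby_leader{scope="cluster1",instance="pg1"}'
        values: '0'
      - series: 'patroni_standby_leader{scope="cluster2",instance="pg2"}'
        values: '0'
    alert_rule_test:
      - alertname: PatroniMultipleLeaders
        eval_time: 1m
        exp_alerts: []

  # Mirror of the existing master=1, standby_leader=0 case:
  # the left-hand side of `and` is empty, so no alert is expected.
  - name: PatroniPrimaryAndStandbyLeader does not fire if master=0 and standby_leader=1
    interval: 1m
    input_series:
      - series: 'patroni_master{scope="cluster1"}'
        values: '0'
      - series: 'patroni_standby_leader{scope="cluster1"}'
        values: '1'
    alert_rule_test:
      - alertname: PatroniPrimaryAndStandbyLeader
        eval_time: 1m
        exp_alerts: []
```

The first case guards against a regression where the `scope` label is dropped from the `by (...)` clause, which would sum primaries across unrelated clusters and raise a false split-brain alert.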