Add basic cluster health alerts #631

38 changes: 38 additions & 0 deletions src/prometheus_alert_rules/metrics_alert_rules.yaml
@@ -89,3 +89,41 @@ groups:
MySQL restarted less than one minute ago.
If the restart was unplanned or frequent, check Loki logs (e.g. `error.log`).
LABELS = {{ $labels }}.

# Basic Cluster Health
- alert: MySQLClusterUnitOffline
expr: mysql_perf_schema_replication_group_member_info{member_state="OFFLINE"} > 0
for: 5m
labels:
severity: Warning
annotations:
summary: MySQL cluster reports one node as offline.
description: |
The MySQL member is marked offline in the cluster, although the process might still be running.
If this is unexpected, please check the logs.
LABELS = {{ $labels }}.

- alert: MySQLClusterNoPrimary
expr: absent(mysql_perf_schema_replication_group_member_info{member_role="PRIMARY"}) or mysql_perf_schema_replication_group_member_info{member_role="PRIMARY"} == 0
for: 0m
labels:
severity: Critical
annotations:
summary: MySQL cluster reports no primaries.
description: |
MySQL has no primaries. The cluster will likely be in a Read-Only state.
Please check the cluster health and the logs to investigate.
LABELS = {{ $labels }}.

# Alert only after 15 minutes, as a change of primary can temporarily result in this metric reporting two primaries
- alert: MySQLClusterTooManyPrimaries
expr: mysql_perf_schema_replication_group_member_info{member_role="PRIMARY"} > 1
for: 15m
labels:
severity: Critical
annotations:
summary: MySQL cluster reports more than one primary.
description: |
MySQL reports more than one primary. This can indicate a split-brain situation.
Please refer to the troubleshooting docs.
LABELS = {{ $labels }}.
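
Review note on the primary-count expressions: the sketch below assumes that `mysql_perf_schema_replication_group_member_info` is an info-style metric, meaning one series per group member with a constant sample value of 1 and the state and role carried in labels. That assumption should be checked against the deployed exporter. If it holds, a comparison such as `> 1` on the sample value would never match, and the `== 0` branch of `MySQLClusterNoPrimary` would also never match, leaving `absent(...)` to do the work. A count-based variant might look like the following. The `instance` grouping label is an assumption; the label that actually identifies a scrape target in this deployment may differ.

```yaml
# Sketch only, not part of this PR. Assumes an info-style metric (value always 1).
- alert: MySQLClusterTooManyPrimaries
  # Count the PRIMARY series seen by each scrape target instead of
  # comparing the sample value. `instance` is an assumed grouping label.
  expr: count by (instance) (mysql_perf_schema_replication_group_member_info{member_role="PRIMARY"}) > 1
  for: 15m
  labels:
    severity: Critical
```

Grouping by scrape target keeps the count from adding up the same primary as reported by several members, since each member is expected to report the whole group.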