Commit 56c4f9d

feat: gp alerts
1 parent 29437d3 commit 56c4f9d

5 files changed

+502
-26
lines changed

.github/workflows/ci.yaml

Lines changed: 13 additions & 0 deletions
@@ -42,6 +42,19 @@ jobs:
       - name: Upload Coverage to Codecov
         uses: codecov/codecov-action@v5

+  alert-test:
+    name: Test Prometheus Alert Rules
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout repo
+        uses: actions/checkout@v3
+      - name: Install prometheus snap
+        run: sudo snap install prometheus
+      - name: Check validity of prometheus alert rules
+        run: promtool check rules src/prometheus_alert_rules/*.yaml
+      - name: Run unit tests for prometheus alert rules
+        run: promtool test rules tests/alerts/*.yaml
+
   build:
     name: Build charm
     uses: canonical/data-platform-workflows/.github/workflows/[email protected]
Original file line numberDiff line numberDiff line change
@@ -1,20 +1,30 @@
 groups:
-  - name: MySQLExporterK8s
-
+  - name: MySQL General Alert Rules
     rules:
-      # 2.1.1
       - alert: MySQLDown
-        expr: "mysql_up == 0"
+        expr: mysql_up == 0
         for: 0m
         labels:
           severity: critical
         annotations:
-          summary: MySQL instance {{ $labels.instance }} is down.
+          summary: MySQL instance {{ $labels.instance }} is down.
           description: |
+            The MySQL instance is not reachable.
+            Please check if the MySQL process is running and the network connectivity.
+            LABELS = {{ $labels }}.
+
+      - alert: MySQLMetricsScrapeError
+        expr: increase(mysql_exporter_last_scrape_error[5m]) > 1
+        for: 0m
+        labels:
+          severity: warning
+        annotations:
+          summary: MySQL instance {{ $labels.instance }} has a metrics scrape error.
+          description: |
+            The MySQL Exporter encountered an error while scraping metrics.
+            Check the MySQL Exporter logs for more details.
             LABELS = {{ $labels }}.

-      # 2.1.2
-      # customized: 80% -> 90%
       - alert: MySQLTooManyConnections(>90%)
         expr: max_over_time(mysql_global_status_threads_connected[1m]) / mysql_global_variables_max_connections * 100 > 90
         for: 2m
@@ -24,10 +34,8 @@ groups:
           summary: MySQL instance {{ $labels.instance }} is using > 90% of `max_connections`.
           description: |
             Consider checking the client application responsible for generating those additional connections.
-            LABELS = {{ $labels }}.
+            LABELS = {{ $labels }}.

-      # 2.1.4
-      # customized: 60% -> 80%
       - alert: MySQLHighThreadsRunning
         expr: max_over_time(mysql_global_status_threads_running[1m]) / mysql_global_variables_max_connections * 100 > 80
         for: 2m
@@ -36,10 +44,9 @@ groups:
         annotations:
           summary: MySQL instance {{ $labels.instance }} is actively using > 80% of `max_connections`.
           description: |
-            Consider reviewing the value of the `max-connections` config parameter or allocate more resources to your database server.
-            LABELS = {{ $labels }}.
+            Consider reviewing the value of the `max-connections` config parameter or allocate more resources to your database server.
+            LABELS = {{ $labels }}.

-      # 2.1.3
       - alert: MySQLHighPreparedStatementsUtilization(>80%)
         expr: max_over_time(mysql_global_status_prepared_stmt_count[1m]) / mysql_global_variables_max_prepared_stmt_count * 100 > 80
         for: 2m
@@ -48,36 +55,32 @@ groups:
         annotations:
           summary: MySQL instance {{ $labels.instance }} is using > 80% of `max_prepared_stmt_count`.
           description: |
-            Too many prepared statements might consume a lot of memory.
-            LABELS = {{ $labels }}.
+            Too many prepared statements might consume a lot of memory.
+            LABELS = {{ $labels }}.

-      # 2.1.8
-      # customized: warning -> info
       - alert: MySQLSlowQueries
         expr: increase(mysql_global_status_slow_queries[1m]) > 0
         for: 2m
         labels:
           severity: info
         annotations:
-          summary: MySQL instance {{ $labels.instance }} has a slow query.
+          summary: MySQL instance {{ $labels.instance }} has slow queries.
           description: |
-            Consider optimizing the query by reviewing its execution plan, then rewrite the query and add any relevant indexes.
+            Consider optimizing the query by reviewing its execution plan, then rewrite the query and add any relevant indexes.
             LABELS = {{ $labels }}.

-      # 2.1.9
       - alert: MySQLInnoDBLogWaits
         expr: rate(mysql_global_status_innodb_log_waits[15m]) > 10
         for: 0m
         labels:
           severity: warning
         annotations:
-          summary: MySQL instance {{ $labels.instance }} has long InnoDB log waits.
+          summary: MySQL instance {{ $labels.instance }} has long InnoDB log waits.
           description: |
-            MySQL InnoDB log writes might be stalling.
-            Check I/O activity on your nodes to find the responsible process or query. Consider using iotop and the performance_schema.
+            MySQL InnoDB log writes might be stalling.
+            Check I/O activity on your nodes to find the responsible process or query. Consider using iotop and the performance_schema.
             LABELS = {{ $labels }}.

-      # 2.1.10
       - alert: MySQLRestarted
         expr: mysql_global_status_uptime < 60
         for: 0m
@@ -86,6 +89,18 @@ groups:
         annotations:
           summary: MySQL instance {{ $labels.instance }} restarted.
           description: |
-            MySQL restarted less than one minute ago.
-            If the restart was unplanned or frequent, check Loki logs (e.g. `error.log`).
+            MySQL restarted less than one minute ago.
+            If the restart was unplanned or frequent, check Loki logs (e.g. `error.log`).
+            LABELS = {{ $labels }}.
+
+      - alert: MySQLConnectionErrors
+        expr: increase(mysql_global_status_connection_errors_total[5m]) > 10
+        for: 0m
+        labels:
+          severity: warning
+        annotations:
+          summary: MySQL instance {{ $labels.instance }} has connection errors.
+          description: |
+            Connection errors might indicate network issues, authentication problems, or resource limitations.
+            Check the MySQL logs for more details.
             LABELS = {{ $labels }}.
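The CI job above runs `promtool test rules tests/alerts/*.yaml`, but the test files themselves do not appear in this view. As a rough sketch of what a unit test for `MySQLDown` could look like: the rule file path, the test file name, and the instance names `mysql-0` and `mysql-1` are all assumptions made for illustration, not taken from the commit.

```yaml
# Hypothetical test file, e.g. tests/alerts/general_test.yaml
rule_files:
  # Hypothetical path; point this at the rule file that defines MySQLDown.
  - ../../src/prometheus_alert_rules/general.yaml

evaluation_interval: 1m

tests:
  # Case 1: one instance is down, one is up.
  # The alert expression should return only the instance reporting 0.
  - interval: 1m
    input_series:
      - series: 'mysql_up{instance="mysql-0"}'
        values: '0 0 0'
      - series: 'mysql_up{instance="mysql-1"}'
        values: '1 1 1'
    promql_expr_test:
      - expr: mysql_up == 0
        eval_time: 1m
        exp_samples:
          - labels: 'mysql_up{instance="mysql-0"}'
            value: 0

  # Case 2: the only instance is healthy, so MySQLDown must not fire.
  - interval: 1m
    input_series:
      - series: 'mysql_up{instance="mysql-0"}'
        values: '1 1 1'
    alert_rule_test:
      - eval_time: 1m
        alertname: MySQLDown
        exp_alerts: []
```

A positive alert case would additionally need to list the expected labels and annotations under `exp_alerts`, written as they are rendered from the rule's templates.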
Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
+groups:
+  - name: MySQL Replication Alert Rules
+    rules:
+      - alert: MySQLClusterUnitOffline
+        expr: mysql_perf_schema_replication_group_member_info{member_state="OFFLINE"} == 1
+        for: 5m
+        labels:
+          severity: warning
+        annotations:
+          summary: MySQL cluster member {{ $labels.instance }} is offline
+          description: |
+            The MySQL member is marked offline in the cluster, although the process might still be running.
+            If this is unexpected, please check the logs.
+            LABELS = {{ $labels }}.
+
+      - alert: MySQLClusterNoPrimary
+        expr: absent(mysql_perf_schema_replication_group_member_info{member_role="PRIMARY"})
+        for: 0m
+        labels:
+          severity: critical
+        annotations:
+          summary: MySQL cluster reports no primary
+          description: |
+            MySQL has no primaries. The cluster will likely be in a Read-Only state.
+            Please check the cluster health, the logs and investigate.
+            LABELS = {{ $labels }}.
+
+      - alert: MySQLClusterTooManyPrimaries
+        expr: count(mysql_perf_schema_replication_group_member_info{member_role="PRIMARY"}) > 1
+        for: 15m
+        labels:
+          severity: critical
+        annotations:
+          summary: MySQL cluster reports more than one primary.
+          description: |
+            MySQL reports more than one primary. This can indicate a split-brain situation.
+            Please refer to the troubleshooting docs.
+            LABELS = {{ $labels }}.
+
+      - alert: MySQLNoReplication
+        expr: absent(mysql_perf_schema_replication_group_member_info{member_role="SECONDARY"})
+        for: 15m
+        labels:
+          severity: warning
+        annotations:
+          summary: MySQL cluster has no secondaries.
+          description: |
+            The MySQL cluster has no secondaries. This means that the cluster is not redundant and a failure of the primary will lead to downtime.
+            Please check the cluster health, the logs and investigate.
+            LABELS = {{ $labels }}.
+
+      - alert: MySQLGroupReplicationReduced
+        expr: |
+          count(mysql_perf_schema_replication_group_member_info{member_state="ONLINE"} == 1)
+          <
+          max_over_time(
+            count(mysql_perf_schema_replication_group_member_info{member_state="ONLINE"} == 1)[6h:]
+          )
+        for: 15m
+        labels:
+          severity: warning
+        annotations:
+          summary: MySQL cluster's Group Replication size reduced
+          description: |
+            The number of ONLINE members in the MySQL Group Replication cluster has reduced compared to the maximum observed in the last 6 hours.
+            Please check the cluster health, the logs and investigate.
+            LABELS = {{ $labels }}.
+
+      - alert: MySQLGroupReplicationConflicts
+        expr: rate(mysql_perf_schema_conflicts_detected[5m]) > 0
+        for: 5m
+        labels:
+          severity: warning
+        annotations:
+          summary: MySQL cluster reports Group Replication conflicts
+          description: |
+            Conflicts indicate concurrent writes to the same rows/keys across members.
+            Please check the cluster health, the logs and investigate.
+            LABELS = {{ $labels }}.
+
+      - alert: MySQLGroupReplicationQueueSizeHigh
+        expr: mysql_perf_schema_transactions_in_queue > 100
+        for: 5m
+        labels:
+          severity: warning
+        annotations:
+          summary: MySQL cluster reports high Group Replication queue size
+          description: |
+            A high number of transactions in the Group Replication queue might indicate network issues or overloaded nodes.
+            Please check the cluster health, the logs and investigate.
+            LABELS = {{ $labels }}.
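Several of the replication rules depend on label matchers and a `for:` window, and both can be exercised with `promtool test rules`. The following is only a sketch for `MySQLClusterUnitOffline`: the rule file path, the test file name, and the series labels (`instance`, `member_role` values) are illustrative assumptions, not taken from the commit or from real exporter output.

```yaml
# Hypothetical test file, e.g. tests/alerts/replication_test.yaml
rule_files:
  # Hypothetical path; point this at the replication rule file.
  - ../../src/prometheus_alert_rules/replication.yaml

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Illustrative label sets: one ONLINE primary, one OFFLINE secondary.
      - series: 'mysql_perf_schema_replication_group_member_info{instance="mysql-0", member_state="ONLINE", member_role="PRIMARY"}'
        values: '1 1 1 1 1 1'
      - series: 'mysql_perf_schema_replication_group_member_info{instance="mysql-1", member_state="OFFLINE", member_role="SECONDARY"}'
        values: '1 1 1 1 1 1'
    promql_expr_test:
      # The matcher should select only the OFFLINE member.
      - expr: mysql_perf_schema_replication_group_member_info{member_state="OFFLINE"} == 1
        eval_time: 5m
        exp_samples:
          - labels: 'mysql_perf_schema_replication_group_member_info{instance="mysql-1", member_state="OFFLINE", member_role="SECONDARY"}'
            value: 1
    alert_rule_test:
      # With `for: 5m`, the alert is still pending at 1m, so no firing alert is expected yet.
      - eval_time: 1m
        alertname: MySQLClusterUnitOffline
        exp_alerts: []
```

Checking the expression separately from the alert keeps the test independent of the exact annotation text, which a firing-alert expectation would have to reproduce.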
