Commit 4c23b40

Add more job info to Prometheus alerts, fix grammar
1 parent bc622fb commit 4c23b40

2 files changed, 40 additions and 35 deletions

example/prometheus/alerts.yml

Lines changed: 18 additions & 16 deletions
@@ -8,7 +8,7 @@ groups:
 labels:
   severity: page
 annotations:
-  summary: "Instance {{ $labels.instance }} down"
+  summary: "Instance {{ $labels.instance }} ({{ $labels.job }}) down"
   description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than a minute."
@@ -21,7 +21,7 @@ groups:
 labels:
   severity: warning
 annotations:
-  summary: "Instance {{ $labels.alias }} Lua runtime warning"
+  summary: "Instance {{ $labels.alias }} ({{ $labels.job }}) Lua runtime warning"
   description: "{{ $labels.alias }} instance of job {{ $labels.job }} uses too much Lua memory
     and may hit threshold soon."
@@ -32,7 +32,7 @@ groups:
 labels:
   severity: page
 annotations:
-  summary: "Instance {{ $labels.alias }} Lua runtime alert"
+  summary: "Instance {{ $labels.alias }} ({{ $labels.job }}) Lua runtime alert"
   description: "{{ $labels.alias }} instance of job {{ $labels.job }} uses too much Lua memory
     and likely to hit threshold soon."
@@ -43,7 +43,7 @@ groups:
 labels:
   severity: warning
 annotations:
-  summary: "Instance {{ $labels.alias }} low arena memory remaining"
+  summary: "Instance {{ $labels.alias }} ({{ $labels.job }}) low arena memory remaining"
   description: "Low arena memory (tuples and indexes) remaining for {{ $labels.alias }} instance of job {{ $labels.job }}.
     Consider increasing memtx_memory or number of storages in case of sharded data."
@@ -54,7 +54,7 @@ groups:
 labels:
   severity: page
 annotations:
-  summary: "Instance {{ $labels.alias }} low arena memory remaining"
+  summary: "Instance {{ $labels.alias }} ({{ $labels.job }}) low arena memory remaining"
   description: "Low arena memory (tuples and indexes) remaining for {{ $labels.alias }} instance of job {{ $labels.job }}.
     You are likely to hit limit soon.
     It is strongly recommended to increase memtx_memory or number of storages in case of sharded data."
@@ -66,7 +66,7 @@ groups:
 labels:
   severity: warning
 annotations:
-  summary: "Instance {{ $labels.alias }} low items memory remaining"
+  summary: "Instance {{ $labels.alias }} ({{ $labels.job }}) low items memory remaining"
   description: "Low items memory (tuples) remaining for {{ $labels.alias }} instance of job {{ $labels.job }}.
     Consider increasing memtx_memory or number of storages in case of sharded data."
@@ -77,7 +77,7 @@ groups:
 labels:
   severity: page
 annotations:
-  summary: "Instance {{ $labels.alias }} low items memory remaining"
+  summary: "Instance {{ $labels.alias }} ({{ $labels.job }}) low items memory remaining"
   description: "Low items memory (tuples) remaining for {{ $labels.alias }} instance of job {{ $labels.job }}.
     You are likely to hit limit soon.
     It is strongly recommended to increase memtx_memory or number of storages in case of sharded data."
@@ -89,8 +89,9 @@ groups:
 labels:
   severity: warning
 annotations:
-  summary: "Instance {{ $labels.alias }} have 'warning'-level Cartridge issues"
-  description: "Possible reasons: high replication lag, replication long idle,
+  summary: "Instance {{ $labels.alias }} ({{ $labels.job }}) has 'warning'-level Cartridge issues"
+  description: "Instance {{ $labels.alias }} of job {{ $labels.job }} has 'warning'-level Cartridge issues.
+    Possible reasons: high replication lag, replication long idle,
     failover or switchover issues, clock issues, memory fragmentation,
     configuration issues, alien members."
@@ -101,8 +102,9 @@ groups:
 labels:
   severity: page
 annotations:
-  summary: "Instance {{ $labels.alias }} have 'critical'-level Cartridge issues"
-  description: "Possible reasons: replication process critical fail,
+  summary: "Instance {{ $labels.alias }} ({{ $labels.job }}) has 'critical'-level Cartridge issues"
+  description: "Instance {{ $labels.alias }} of job {{ $labels.job }} has 'critical'-level Cartridge issues.
+    Possible reasons: replication process critical fail,
     running out of available memory."
@@ -112,7 +114,7 @@ groups:
 labels:
   severity: warning
 annotations:
-  summary: "Instance {{ $labels.alias }} have high replication lag"
+  summary: "Instance {{ $labels.alias }} ({{ $labels.job }}) have high replication lag"
   description: "Instance {{ $labels.alias }} of job {{ $labels.job }} have high replication lag,
     check up your network and cluster state."
@@ -128,7 +130,7 @@ groups:
 labels:
   severity: warning
 annotations:
-  summary: "Instance {{ $labels.alias }} high HTTP latency"
+  summary: "Instance {{ $labels.alias }} ({{ $labels.job }}) high HTTP latency"
   description: "Some {{ $labels.method }} requests to {{ $labels.path }} path with {{ $labels.status }} response status
     on {{ $labels.alias }} instance of job {{ $labels.job }} are processed too long."
@@ -142,7 +144,7 @@ groups:
 labels:
   severity: warning
 annotations:
-  summary: "Instance {{ $labels.alias }} high rate of client error responses"
+  summary: "Instance {{ $labels.alias }} ({{ $labels.job }}) high rate of client error responses"
   description: "Too many {{ $labels.method }} requests to {{ $labels.path }} path
     on {{ $labels.alias }} instance of job {{ $labels.job }} get client error (4xx) responses."
@@ -170,7 +172,7 @@ groups:
 labels:
   severity: warning
 annotations:
-  summary: "Instance {{ $labels.alias }} server error responses"
+  summary: "Instance {{ $labels.alias }} ({{ $labels.job }}) server error responses"
   description: "Some {{ $labels.method }} requests to {{ $labels.path }} path
     on {{ $labels.alias }} instance of job {{ $labels.job }} get server error (5xx) responses."
@@ -184,6 +186,6 @@ groups:
 labels:
   severity: warning
 annotations:
-  summary: "Router {{ $labels.alias }} low activity"
+  summary: "Router {{ $labels.alias }} ({{ $labels.job }}) low activity"
   description: Router {{ $labels.alias }} instance of job {{ $labels.job }} gets too little requests.
     Please, check up your balancer middleware."

example/prometheus/test_alerts.yml

Lines changed: 22 additions & 19 deletions
@@ -22,7 +22,7 @@ tests:
         instance: app:8082
         job: tarantool_app
       exp_annotations:
-        summary: "Instance app:8082 down"
+        summary: "Instance app:8082 (tarantool_app) down"
         description: "app:8082 of job tarantool_app has been down for more than a minute."
@@ -40,7 +40,7 @@ tests:
         alias: tnt_router
         job: tarantool_app
       exp_annotations:
-        summary: "Instance tnt_router Lua runtime warning"
+        summary: "Instance tnt_router (tarantool_app) Lua runtime warning"
         description: "tnt_router instance of job tarantool_app uses too much Lua memory
           and may hit threshold soon."
 - eval_time: 2m
@@ -62,7 +62,7 @@ tests:
         alias: tnt_router
         job: tarantool_app
       exp_annotations:
-        summary: "Instance tnt_router Lua runtime warning"
+        summary: "Instance tnt_router (tarantool_app) Lua runtime warning"
         description: "tnt_router instance of job tarantool_app uses too much Lua memory
           and may hit threshold soon."
 - eval_time: 2m
@@ -74,7 +74,7 @@ tests:
         alias: tnt_router
         job: tarantool_app
       exp_annotations:
-        summary: "Instance tnt_router Lua runtime alert"
+        summary: "Instance tnt_router (tarantool_app) Lua runtime alert"
         description: "tnt_router instance of job tarantool_app uses too much Lua memory
           and likely to hit threshold soon."
@@ -110,7 +110,7 @@ tests:
         alias: tnt_router
         job: tarantool_app
       exp_annotations:
-        summary: "Instance tnt_router low arena memory remaining"
+        summary: "Instance tnt_router (tarantool_app) low arena memory remaining"
         description: "Low arena memory (tuples and indexes) remaining for tnt_router instance of job tarantool_app.
           Consider increasing memtx_memory or number of storages in case of sharded data."
 - eval_time: 2m
@@ -134,7 +134,7 @@ tests:
         alias: tnt_router
         job: tarantool_app
       exp_annotations:
-        summary: "Instance tnt_router low arena memory remaining"
+        summary: "Instance tnt_router (tarantool_app) low arena memory remaining"
         description: "Low arena memory (tuples and indexes) remaining for tnt_router instance of job tarantool_app.
           Consider increasing memtx_memory or number of storages in case of sharded data."
 - eval_time: 2m
@@ -146,7 +146,7 @@ tests:
         alias: tnt_router
         job: tarantool_app
       exp_annotations:
-        summary: "Instance tnt_router low arena memory remaining"
+        summary: "Instance tnt_router (tarantool_app) low arena memory remaining"
         description: "Low arena memory (tuples and indexes) remaining for tnt_router instance of job tarantool_app.
           You are likely to hit limit soon.
           It is strongly recommended to increase memtx_memory or number of storages in case of sharded data."
@@ -183,7 +183,7 @@ tests:
         alias: tnt_router
         job: tarantool_app
       exp_annotations:
-        summary: "Instance tnt_router low items memory remaining"
+        summary: "Instance tnt_router (tarantool_app) low items memory remaining"
         description: "Low items memory (tuples) remaining for tnt_router instance of job tarantool_app.
           Consider increasing memtx_memory or number of storages in case of sharded data."
 - eval_time: 2m
@@ -208,8 +208,9 @@ tests:
         alias: tnt_router
         job: tarantool_app
       exp_annotations:
-        summary: "Instance tnt_router have 'warning'-level Cartridge issues"
-        description: "Possible reasons: high replication lag, replication long idle,
+        summary: "Instance tnt_router (tarantool_app) has 'warning'-level Cartridge issues"
+        description: "Instance tnt_router of job tarantool_app has 'warning'-level Cartridge issues.
+          Possible reasons: high replication lag, replication long idle,
           failover or switchover issues, clock issues, memory fragmentation,
           configuration issues, alien members."
 - eval_time: 2m
@@ -234,8 +235,9 @@ tests:
         alias: tnt_router
         job: tarantool_app
       exp_annotations:
-        summary: "Instance tnt_router have 'warning'-level Cartridge issues"
-        description: "Possible reasons: high replication lag, replication long idle,
+        summary: "Instance tnt_router (tarantool_app) has 'warning'-level Cartridge issues"
+        description: "Instance tnt_router of job tarantool_app has 'warning'-level Cartridge issues.
+          Possible reasons: high replication lag, replication long idle,
           failover or switchover issues, clock issues, memory fragmentation,
           configuration issues, alien members."
 - eval_time: 2m
@@ -248,8 +250,9 @@ tests:
         alias: tnt_router
         job: tarantool_app
       exp_annotations:
-        summary: "Instance tnt_router have 'critical'-level Cartridge issues"
-        description: "Possible reasons: replication process critical fail,
+        summary: "Instance tnt_router (tarantool_app) has 'critical'-level Cartridge issues"
+        description: "Instance tnt_router of job tarantool_app has 'critical'-level Cartridge issues.
+          Possible reasons: replication process critical fail,
           running out of available memory."
@@ -269,7 +272,7 @@ tests:
         alias: tnt_storage_replica
         job: tarantool_app
       exp_annotations:
-        summary: "Instance tnt_storage_replica have high replication lag"
+        summary: "Instance tnt_storage_replica (tarantool_app) have high replication lag"
         description: "Instance tnt_storage_replica of job tarantool_app have high replication lag,
           check up your network and cluster state."
@@ -300,7 +303,7 @@ tests:
         status: '200'
         quantile: '0.99'
       exp_annotations:
-        summary: "Instance tnt_router high HTTP latency"
+        summary: "Instance tnt_router (tarantool_app) high HTTP latency"
         description: "Some GET requests to /hello path with 200 response status
           on tnt_router instance of job tarantool_app are processed too long."
@@ -329,7 +332,7 @@ tests:
         path: /hell0
         method: GET
       exp_annotations:
-        summary: "Instance tnt_router high rate of client error responses"
+        summary: "Instance tnt_router (tarantool_app) high rate of client error responses"
         description: "Too many GET requests to /hell0 path
           on tnt_router instance of job tarantool_app get client error (4xx) responses."
@@ -407,7 +410,7 @@ tests:
         path: /goodbye
         method: POST
       exp_annotations:
-        summary: "Instance tnt_router server error responses"
+        summary: "Instance tnt_router (tarantool_app) server error responses"
         description: "Some POST requests to /goodbye path
           on tnt_router instance of job tarantool_app get server error (5xx) responses."
@@ -433,6 +436,6 @@ tests:
         alias: tnt_router
         job: tarantool_app
       exp_annotations:
-        summary: "Router tnt_router low activity"
+        summary: "Router tnt_router (tarantool_app) low activity"
         description: Router tnt_router instance of job tarantool_app gets too little requests.
           Please, check up your balancer middleware."
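
The two files change in lockstep: each `summary` template in alerts.yml must expand to exactly the string listed under `exp_annotations` in test_alerts.yml. The sketch below illustrates that expansion with Go's standard `text/template` package. It is not Prometheus code: the `render` helper and the preamble that binds `$labels` are assumptions made for illustration, loosely mirroring how Prometheus exposes alert labels to annotation templates.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// render expands an annotation template against a set of alert labels.
// Hypothetical helper: the preamble binding "$labels" is an assumption
// that imitates the variable available in Prometheus annotation templates.
func render(text string, labels map[string]string) (string, error) {
	tmpl, err := template.New("annotation").Parse("{{ $labels := .Labels }}" + text)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	data := struct{ Labels map[string]string }{Labels: labels}
	if err := tmpl.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// Label values taken from the first test case in test_alerts.yml.
	labels := map[string]string{"instance": "app:8082", "job": "tarantool_app"}

	before, err := render(`Instance {{ $labels.instance }} down`, labels)
	if err != nil {
		panic(err)
	}
	after, err := render(`Instance {{ $labels.instance }} ({{ $labels.job }}) down`, labels)
	if err != nil {
		panic(err)
	}

	fmt.Println(before) // Instance app:8082 down
	fmt.Println(after)  // Instance app:8082 (tarantool_app) down
}
```

Rule unit tests of this form are run with `promtool test rules <file>`, which compares the rendered annotations against the expected strings.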
