@@ -270,9 +270,121 @@ Configuring Monasca Alerts
Generating Metrics from Specific Log Messages
+++++++++++++++++++++++++++++++++++++++++++++
+ If you wish to generate alerts for specific log messages, you must first
+ generate metrics from those log messages. Metrics are generated from the
+ transformed logs queue in Kafka. The Monasca log metrics service reads log
+ messages from this queue, transforms them into metrics and then writes them to
+ the metrics queue.
+
+ The rules which govern this transformation are defined in the Logstash config
+ file. This file can be configured via Kayobe. To do this, edit
+ ``etc/kayobe/kolla/config/monasca/log-metrics.conf``, for example:
+
+ .. code-block:: text
+
+    # Create events from specific log signatures
+    filter {
+        if "Another thread already created a resource provider" in [log][message] {
+            mutate {
+                add_field => { "[log][dimensions][event]" => "hat" }
+            }
+        } else if "My string here" in [log][message] {
+            mutate {
+                add_field => { "[log][dimensions][event]" => "my_new_alert" }
+            }
+        }
+    }
+
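The conditional chain above can be read as a first-match lookup on substrings of the log message. The following Python sketch is illustrative only and is not part of Monasca or Logstash; ``RULES`` and ``event_for_message`` are hypothetical names introduced here. It mirrors the ``if`` / ``else if`` ordering, so a message containing both substrings receives only the first event:

```python
from typing import Optional, Sequence, Tuple

# Ordered (substring, event) pairs mirroring the if / else if chain in the
# example filter. The names here are hypothetical, for illustration only.
RULES: Sequence[Tuple[str, str]] = (
    ("Another thread already created a resource provider", "hat"),
    ("My string here", "my_new_alert"),
)


def event_for_message(
    message: str, rules: Sequence[Tuple[str, str]] = RULES
) -> Optional[str]:
    """Return the event of the first rule whose substring occurs in message."""
    for needle, event in rules:
        if needle in message:
            return event
    # No rule matched: the filter would leave the event dimension unset.
    return None
```

Checking candidate log lines against such a table before editing ``log-metrics.conf`` can help confirm that each signature is specific enough and that rule order gives the intended result.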
+ Reconfigure Monasca:
+
+ .. code-block:: text
+
+    kayobe# kayobe overcloud service reconfigure --kolla-tags monasca
+
+ Verify that Logstash does not report any errors for your modification. On each
+ node running the ``monasca-log-metrics`` service, the logs can be inspected in
+ the Kolla logs directory, under the ``logstash`` folder:
+ ``/var/log/kolla/logstash``.
+
+ Metrics will now be generated from the configured log messages. To generate
+ alerts/notifications from your new metric, follow the next section.
+
Generating Monasca Alerts from Metrics
++++++++++++++++++++++++++++++++++++++
+ First, configure alarms and notifications. This should be done via the
+ Monasca client. More detailed documentation is available in the `Monasca
+ API specification
+ <https://github.com/openstack/monasca-api/blob/master/docs/monasca-api-spec.md#alarm-definitions-and-alarms>`__.
+ This document provides an overview of common use-cases.
+
+ To create a Slack notification, first obtain the URL for the notification hook
+ from Slack, and configure the notification as follows:
+
+ .. code-block:: console
+
+    monasca# monasca notification-create stackhpc_slack SLACK https://hooks.slack.com/services/UUID
+
+ You can view notifications at any time by invoking:
+
+ .. code-block:: console
+
+    monasca# monasca notification-list
+
+ To create an alarm with an associated notification, where
+ ``$NOTIFICATION_ID`` is the ID of a notification created above:
+
+ .. code-block:: console
+
+    monasca# monasca alarm-definition-create multiple_nova_compute \
+      '(count(log.event.multiple_nova_compute{}, deterministic)>0)' \
+      --description "Multiple nova compute instances detected" \
+      --severity HIGH --alarm-actions $NOTIFICATION_ID
+
+ By default, one alarm will be created for all hosts. This is typically useful
+ when you are looking at the overall state of some hosts. For example, in the
+ screenshot below the ``db_mon_log_high_mem_usage`` alarm has previously
+ triggered on a number of hosts, but is currently below threshold.
+
+ If you wish to have an alarm created per host you can use the ``--match-by``
+ option and specify the hostname dimension. For example:
+
+ .. code-block:: console
+
+    monasca# monasca alarm-definition-create multiple_nova_compute \
+      '(count(log.event.multiple_nova_compute{}, deterministic)>0)' \
+      --description "Multiple nova compute instances detected" \
+      --severity HIGH --alarm-actions $NOTIFICATION_ID \
+      --match-by hostname
+
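When several similar alarm definitions are needed, the command line can be assembled programmatically. The following Python sketch is a hypothetical helper, not part of the Monasca client: ``alarm_definition_args`` and its parameters are names introduced here. It only reproduces the expression format and options shown in the examples in this section, and repeating ``--alarm-actions`` on creation is an assumption based on the patch example:

```python
import shlex
from typing import List, Optional, Sequence


def alarm_definition_args(
    name: str,
    metric: str,
    description: str,
    notification_ids: Sequence[str],
    function: str = "count",
    severity: str = "HIGH",
    match_by: Optional[str] = None,
) -> List[str]:
    """Build an argument list for ``monasca alarm-definition-create``."""
    # Same expression shape as the documented examples, for instance
    # (count(log.event.multiple_nova_compute{}, deterministic)>0)
    expression = f"({function}({metric}{{}}, deterministic)>0)"
    args = [
        "monasca", "alarm-definition-create", name, expression,
        "--description", description,
        "--severity", severity,
    ]
    # Assumption: one --alarm-actions option per notification ID.
    for notification_id in notification_ids:
        args += ["--alarm-actions", notification_id]
    if match_by:
        # Create one alarm per distinct value of this dimension.
        args += ["--match-by", match_by]
    return args
```

Passing the returned list to ``shlex.join`` yields a correctly quoted command for review before it is run by hand.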
+ Creating an alarm per host can be useful when alerting on one-off events such
+ as log messages which need to be actioned individually. Once the issue has been
+ investigated and fixed, the alarm can be deleted on a per-host basis.
+
+ For example, one might define a metric from the system logs to alert on XFS
+ file system corruption or ECC memory errors. These metrics may only be
+ generated once, but it is important that they are not ignored. Therefore, in
+ the example below, the ``last`` operator is used so that the alarm is
+ evaluated against the last metric associated with the log message. Since for
+ log metrics the value of this metric is always greater than 0, this alarm can
+ only be reset by deleting it (which can be accomplished by clicking on the
+ dustbin icon in Monasca Grafana). By ensuring that the alarm has to be manually
+ deleted and will not reset to the OK status, important errors can be tracked.
+
+ .. code-block:: console
+
+    monasca# monasca alarm-definition-create xfs_errors \
+      '(last(log.event.xfs_errors_detected{}, deterministic)>0)' \
+      --description "XFS errors detected on host" \
+      --severity HIGH --alarm-actions $NOTIFICATION_ID \
+      --match-by hostname
+
+ It is also possible to update existing alarms. For example, to change the
+ notifications on an alarm, or to attach multiple notifications to it:
+
+ .. code-block:: console
+
+    monasca# monasca alarm-definition-patch $ALARM_ID --alarm-actions $NOTIFICATION_ID --alarm-actions $NOTIFICATION_ID_2
+
Control Plane Shutdown Procedure
================================