Commit 3582417

Add section about managing Monasca alerts
1 parent 8cf19cd commit 3582417

3 files changed: 119 additions and 1 deletion

source/introduction.rst

Lines changed: 6 additions & 0 deletions
@@ -59,6 +59,12 @@ A command that must be run within the Bifrost service container, hosted on the s

 A command that can be run (as superuser) from a running compute instance.

+``monasca#``
+
+A command that must be run with OpenStack control plane admin credentials
+loaded, and the Monasca client and supporting modules available (whether in a
+virtualenv or installed in the OS libraries).
+
 Glossary of Terms
 -----------------

source/operations_and_monitoring.rst

Lines changed: 112 additions & 0 deletions
@@ -270,9 +270,121 @@ Configuring Monasca Alerts
 Generating Metrics from Specific Log Messages
 +++++++++++++++++++++++++++++++++++++++++++++

+If you wish to generate alerts for specific log messages, you must first
+generate metrics from those log messages. Metrics are generated from the
+transformed logs queue in Kafka. The Monasca log metrics service reads log
+messages from this queue, transforms them into metrics and then writes them to
+the metrics queue.
+
+The rules which govern this transformation are defined in the Logstash config
+file. This file can be configured via Kayobe. To do this, edit
+``etc/kayobe/kolla/config/monasca/log-metrics.conf``, for example:
+
+.. code-block:: text
+
+   # Create events from specific log signatures
+   filter {
+     if "Another thread already created a resource provider" in [log][message] {
+       mutate {
+         add_field => { "[log][dimensions][event]" => "hat" }
+       }
+     } else if "My string here" in [log][message] {
+       mutate {
+         add_field => { "[log][dimensions][event]" => "my_new_alert" }
+       }
+     }
+   }
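
The substring rules in the filter above can be prototyped offline before reconfiguring the service. The following is a minimal Python sketch, not part of Monasca or Logstash: it assumes that the ``in`` conditional performs a substring match on the message and that the first matching branch wins. The ``RULES`` table, the ``event_for`` helper and the sample log line are hypothetical and exist only for illustration.

```python
# Hypothetical offline check of the substring rules shown in the filter.
# This does not run Logstash; it only mimics the if / else-if chain.

# (substring, event name) pairs copied from the example filter, in order.
RULES = [
    ("Another thread already created a resource provider", "hat"),
    ("My string here", "my_new_alert"),
]


def event_for(message):
    """Return the event name of the first rule whose substring occurs in
    the message, or None if no rule matches."""
    for needle, event in RULES:
        if needle in message:
            return event
    return None


if __name__ == "__main__":
    # Made-up log line, used only to exercise the first rule.
    sample = "ERROR: Another thread already created a resource provider"
    print(event_for(sample))
```

Running candidate log lines through a check like this can help confirm that a signature is specific enough before the real filter is deployed.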
+
+Reconfigure Monasca:
+
+.. code-block:: text
+
+   kayobe# kayobe overcloud service reconfigure --kolla-tags monasca
+
+Verify that Logstash does not report errors for your modification. On each node
+running the ``monasca-log-metrics`` service, the logs can be inspected in the
+Kolla logs directory, under the ``logstash`` folder:
+``/var/log/kolla/logstash``.
+
+Metrics will now be generated from the configured log messages. To generate
+alerts/notifications from your new metric, follow the next section.
+
 Generating Monasca Alerts from Metrics
 ++++++++++++++++++++++++++++++++++++++

+Alarms and notifications should be configured via the Monasca client. More
+detailed documentation is available in the `Monasca API specification
+<https://github.com/openstack/monasca-api/blob/master/docs/monasca-api-spec.md#alarm-definitions-and-alarms>`__.
+This section provides an overview of common use cases.
+
+To create a Slack notification, first obtain the URL for the notification hook
+from Slack, and configure the notification as follows:
+
+.. code-block:: console
+
+   monasca# monasca notification-create stackhpc_slack SLACK https://hooks.slack.com/services/UUID
+
+You can view notifications at any time by invoking:
+
+.. code-block:: console
+
+   monasca# monasca notification-list
+
+To create an alarm with an associated notification, where ``$NOTIFICATION_ID``
+is the ID of a notification as shown by ``monasca notification-list``:
+
+.. code-block:: console
+
+   monasca# monasca alarm-definition-create multiple_nova_compute \
+     '(count(log.event.multiple_nova_compute{}, deterministic)>0)' \
+     --description "Multiple nova compute instances detected" \
+     --severity HIGH --alarm-actions $NOTIFICATION_ID
+
+By default, one alarm will be created for all hosts. This is typically useful
+when you are looking at the overall state of some hosts. For example, in the
+screenshot below the ``db_mon_log_high_mem_usage`` alarm has previously
+triggered on a number of hosts, but is currently below threshold.
+
+If you wish to have an alarm created per host, you can use the ``--match-by``
+option and specify the ``hostname`` dimension. For example:
+
+.. code-block:: console
+
+   monasca# monasca alarm-definition-create multiple_nova_compute \
+     '(count(log.event.multiple_nova_compute{}, deterministic)>0)' \
+     --description "Multiple nova compute instances detected" \
+     --severity HIGH --alarm-actions $NOTIFICATION_ID \
+     --match-by hostname
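
The difference between a single alarm for all hosts and one alarm per host can be pictured as a grouping step. The sketch below is a toy model that follows the description in this section; it is not Monasca's implementation. The ``group_alarms`` helper and the host names ``cmp0`` and ``cmp1`` are hypothetical.

```python
from collections import defaultdict


def group_alarms(metric_dimensions, match_by=None):
    """Toy model: group metrics into alarms.

    metric_dimensions is a list of dimension dicts, one per metric.
    Without match_by, every metric lands in a single alarm; with
    match_by, one alarm is kept per distinct value of that dimension.
    """
    alarms = defaultdict(list)
    for dims in metric_dimensions:
        key = dims.get(match_by) if match_by else "all"
        alarms[key].append(dims)
    return dict(alarms)


if __name__ == "__main__":
    # Hypothetical metrics reported by two compute hosts.
    metrics = [
        {"hostname": "cmp0"},
        {"hostname": "cmp1"},
        {"hostname": "cmp0"},
    ]
    print(len(group_alarms(metrics)))                       # single alarm
    print(sorted(group_alarms(metrics, match_by="hostname")))  # one per host
```

In this model, deleting the entry for one host leaves the alarms for the other hosts untouched, which is the property the per-host setup relies on.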
+
+Creating an alarm per host can be useful when alerting on one-off events, such
+as log messages which need to be actioned individually. Once the issue has been
+investigated and fixed, the alarm can be deleted on a per-host basis.
+
+For example, to monitor for hardware or file system faults, one might define a
+metric from the system logs that reports XFS file system corruption or ECC
+memory errors. Such a metric may only be generated once, but it is important
+that it is not ignored. Therefore, in the example below, the ``last`` operator
+is used so that the alarm is evaluated against the last metric associated with
+the log message. Since for log metrics the value of this metric is always
+greater than 0, this alarm can only be reset by deleting it (which can be
+accomplished by clicking on the dustbin icon in Monasca Grafana). By ensuring
+that the alarm has to be manually deleted and will not reset to the OK status,
+important errors can be tracked.
+
+.. code-block:: console
+
+   monasca# monasca alarm-definition-create xfs_errors \
+     '(last(log.event.xfs_errors_detected{}, deterministic)>0)' \
+     --description "XFS errors detected on host" \
+     --severity HIGH --alarm-actions $NOTIFICATION_ID \
+     --match-by hostname
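
The reasoning behind choosing ``last`` over ``count`` for one-off events can be illustrated with a toy model. The sketch below is not Monasca's threshold engine; it only encodes the behaviour described in this section, and it assumes that a deterministic ``count`` expression evaluates to OK for a period with no measurements. The ``simulate`` helper and its input are hypothetical.

```python
def simulate(periods):
    """Toy model of two alarm expressions evaluated once per period.

    periods is a list of lists; each inner list holds the metric values
    received during one evaluation period.
    Returns (count_states, last_states).
    """
    seen = []  # every measurement received so far
    count_states, last_states = [], []
    for period in periods:
        seen.extend(period)
        # count(...) > 0: looks only at measurements in the current period.
        count_states.append("ALARM" if len(period) > 0 else "OK")
        # last(...) > 0: looks at the most recent measurement ever received.
        last_states.append("ALARM" if seen and seen[-1] > 0 else "OK")
    return count_states, last_states


if __name__ == "__main__":
    # One log event in the first period, then silence.
    count_states, last_states = simulate([[1.0], [], []])
    print(count_states)
    print(last_states)
```

In this model the ``count`` alarm clears as soon as a period passes without the log message, whereas the ``last`` alarm stays raised until it is deleted.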
+
+It is also possible to update existing alarms. For example, to update the
+notifications on an alarm, or to attach multiple notifications to it:
+
+.. code-block:: console
+
+   monasca# monasca alarm-definition-patch $ALARM_ID --alarm-actions $NOTIFICATION_ID --alarm-actions $NOTIFICATION_ID_2
+
 Control Plane Shutdown Procedure
 ================================

source/vars.rst

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@
 .. |keystone_public_url| replace:: https://openstack.acme.example:5000
 .. |kibana_url| replace:: https://openstack.acme.example:5601
 .. |kolla_passwords| replace:: https://github.com/acme-openstack/kayobe-config/blob/acme/train/etc/kayobe/kolla/passwords.yml
-.. |monitoring_host| replace:: mon0
+.. |monitoring_host| replace:: ``mon0``
 .. |network_name| replace:: admin-vxlan
 .. |nova_rbd_pool| replace:: acme-vms
 .. |project_config_source_url| replace:: https://github.com/acme-openstack/acme-config.git
