
Commit 348e154

Merge pull request #21 from stackhpc/monitoring
Update monitoring documentation
2 parents: 4b0776f + 77b01ee

File tree

3 files changed: +48 additions, -155 deletions

source/introduction.rst

Lines changed: 0 additions & 12 deletions
@@ -60,12 +60,6 @@ A command that must be run within the Bifrost service container, hosted on the s
 
     A command that can be run (as superuser) from a running compute instance.
 
-``monasca#``
-
-    A command that must be run with OpenStack control plane admin credentials
-    loaded, and the Monasca client and supporting modules available (whether in a
-    virtualenv or installed in the OS libraries).
-
 Glossary of Terms
 -----------------
 
@@ -130,12 +124,6 @@ Glossary of Terms
     Multi-Chassis Link Aggregate - a method of providing multi-pathing and
     multi-switch redundancy in layer-2 networks.
 
-Monasca
-    OpenStack’s monitoring service (“Monitoring as a Service at Scale”).
-    Logging, telemetry and events from the infrastructure, control plane and
-    user projects can be submitted and processed by Monasca.
-    https://docs.openstack.org/monasca-api/latest/
-
 Neutron
     OpenStack’s networking service.
     https://docs.openstack.org/neutron/latest/

source/operations_and_monitoring.rst

Lines changed: 46 additions & 143 deletions
@@ -7,12 +7,12 @@ Operations and Monitoring
 Access to Kibana
 ================
 
-OpenStack control plane logs are aggregated from all servers by Monasca and
+OpenStack control plane logs are aggregated from all servers by Fluentd and
 stored in ElasticSearch. The control plane logs can be accessed from
 ElasticSearch using Kibana, which is available at the following URL:
 |kibana_url|
 
-To login, use the ``kibana`` user. The password is auto-generated by
+To log in, use the ``kibana`` user. The password is auto-generated by
 Kolla-Ansible and can be extracted from the encrypted passwords file
 (|kolla_passwords|):
 
@@ -24,19 +24,32 @@ Kolla-Ansible and can be extracted from the encrypted passwords file
 Access to Grafana
 =================
 
-Monasca metrics can be visualised in Grafana dashboards. Monasca Grafana can be
+Control plane metrics can be visualised in Grafana dashboards. Grafana can be
 found at the following address: |grafana_url|
 
-Grafana uses Keystone authentication. To login, use valid OpenStack user
-credentials.
+To log in, use the |grafana_username| user. The password is auto-generated by
+Kolla-Ansible and can be extracted from the encrypted passwords file
+(|kolla_passwords|):
+
+.. code-block:: console
+   :substitutions:
+
+   kayobe# ansible-vault view ${KAYOBE_CONFIG_PATH}/kolla/passwords.yml --vault-password-file |vault_password_file_path| | grep ^grafana_admin_password
+
+Access to Prometheus Alertmanager
+=================================
 
-To visualise control plane metrics, you will need one of the following roles in
-the ``monasca_control_plane`` project:
+Control plane alerts can be visualised and managed in Alertmanager, which can
+be found at the following address: |alertmanager_url|
 
-* ``admin``
-* ``monasca-user``
-* ``monasca-read-only-user``
-* ``monasca-editor``
+To log in, use the ``admin`` user. The password is auto-generated by
+Kolla-Ansible and can be extracted from the encrypted passwords file
+(|kolla_passwords|):
+
+.. code-block:: console
+   :substitutions:
+
+   kayobe# ansible-vault view ${KAYOBE_CONFIG_PATH}/kolla/passwords.yml --vault-password-file |vault_password_file_path| | grep ^prometheus_alertmanager_password
 
 Migrating virtual machines
 ==========================
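Both new passwords are recovered the same way: decrypt the passwords file and select a single top-level YAML key with an anchored grep. As a hedged sketch of what those pipelines filter, using an invented stand-in for the decrypted file (real keys and values come from Kolla-Ansible's generated ``passwords.yml``):

```shell
# Hypothetical stand-in for the output of `ansible-vault view ... passwords.yml`;
# the real file is encrypted and holds many more keys.
cat > /tmp/passwords-decrypted.yml <<'EOF'
database_password: example-db-secret
grafana_admin_password: example-grafana-secret
prometheus_alertmanager_password: example-alertmanager-secret
EOF

# The ^ anchor ensures only the intended top-level key matches,
# not a substring appearing elsewhere in the file.
grep '^grafana_admin_password' /tmp/passwords-decrypted.yml
grep '^prometheus_alertmanager_password' /tmp/passwords-decrypted.yml
```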
@@ -246,6 +259,7 @@ Monitoring
 
 * `Back up InfluxDB <https://docs.influxdata.com/influxdb/v1.8/administration/backup_and_restore/>`__
 * `Back up ElasticSearch <https://www.elastic.co/guide/en/elasticsearch/reference/current/backup-cluster-data.html>`__
+* `Back up Prometheus <https://prometheus.io/docs/prometheus/latest/querying/api/#snapshot>`__
 
 Seed
 ----
@@ -260,137 +274,21 @@ Ansible control host
 Control Plane Monitoring
 ========================
 
-Monasca has been configured to collect logs and metrics across the control
-plane. It provides a single point where control plane monitoring and telemetry
-data can be analysed and correlated.
-
-Metrics are collected per server via the `Monasca Agent
-<https://opendev.org/openstack/monasca-agent>`__. The Monasca Agent is deployed
-and configured by Kolla Ansible.
-
-Logging to Monasca is done via a `Fluentd output plugin
-<https://github.com/monasca/fluentd-monasca>`__.
-
-Configuring Monasca Alerts
---------------------------
-
-Generating Metrics from Specific Log Messages
-+++++++++++++++++++++++++++++++++++++++++++++
-
-If you wish to generate alerts for specific log messages, you must first
-generate metrics from those log messages. Metrics are generated from the
-transformed logs queue in Kafka. The Monasca log metrics service reads log
-messages from this queue, transforms them into metrics and then writes them to
-the metrics queue.
-
-The rules which govern this transformation are defined in the logstash config
-file. This file can be configured via kayobe. To do this, edit
-``etc/kayobe/kolla/config/monasca/log-metrics.conf``, for example:
-
-.. code-block:: text
-
-   # Create events from specific log signatures
-   filter {
-     if "Another thread already created a resource provider" in [log][message] {
-       mutate {
-         add_field => { "[log][dimensions][event]" => "hat" }
-       }
-     } else if "My string here" in [log][message] {
-       mutate {
-         add_field => { "[log][dimensions][event]" => "my_new_alert" }
-       }
-     }
-   }
-
-Reconfigure Monasca:
-
-.. code-block:: text
-
-   kayobe# kayobe overcloud service reconfigure --kolla-tags monasca
-
-Verify that logstash doesn't complain about your modification. On each node
-running the ``monasca-log-metrics`` service, the logs can be inspected in the
-Kolla logs directory, under the ``logstash`` folder:
-``/var/log/kolla/logstash``.
-
-Metrics will now be generated from the configured log messages. To generate
-alerts/notifications from your new metric, follow the next section.
-
-Generating Monasca Alerts from Metrics
-++++++++++++++++++++++++++++++++++++++
+The control plane has been configured to collect logs centrally using the EFK
+stack (Elasticsearch, Fluentd and Kibana).
 
-Firstly, we will configure alarms and notifications. This should be done via
-the Monasca client. More detailed documentation is available in the `Monasca
-API specification
-<https://github.com/openstack/monasca-api/blob/master/docs/monasca-api-spec.md#alarm-definitions-and-alarms>`__.
-This document provides an overview of common use-cases.
+Telemetry monitoring of the control plane is performed by Prometheus. Metrics
+are collected by Prometheus exporters, which are either running on all hosts
+(e.g. node exporter), on specific hosts (e.g. controllers for the memcached
+exporter or monitoring hosts for the OpenStack exporter). These exporters are
+scraped by the Prometheus server.
 
-To create a Slack notification, first obtain the URL for the notification hook
-from Slack, and configure the notification as follows:
+Configuring Prometheus Alerts
+-----------------------------
 
-.. code-block:: console
-
-   monasca# monasca notification-create stackhpc_slack SLACK https://hooks.slack.com/services/UUID
-
-You can view notifications at any time by invoking:
-
-.. code-block:: console
-
-   monasca# monasca notification-list
-
-To create an alarm with an associated notification:
-
-.. code-block:: console
-
-   monasca# monasca alarm-definition-create multiple_nova_compute \
-       '(count(log.event.multiple_nova_compute{}, deterministic)>0)' \
-       --description "Multiple nova compute instances detected" \
-       --severity HIGH --alarm-actions $NOTIFICATION_ID
-
-By default one alarm will be created for all hosts. This is typically useful
-when you are looking at the overall state of some hosts. For example in the
-screenshot below the ``db_mon_log_high_mem_usage`` alarm has previously
-triggered on a number of hosts, but is currently below threshold.
-
-If you wish to have an alarm created per host you can use the ``--match-by``
-option and specify the hostname dimension. For example:
-
-.. code-block:: console
-
-   monasca# monasca alarm-definition-create multiple_nova_compute \
-       '(count(log.event.multiple_nova_compute{}, deterministic)>0)' \
-       --description "Multiple nova compute instances detected" \
-       --severity HIGH --alarm-actions $NOTIFICATION_ID \
-       --match-by hostname
-
-Creating an alarm per host can be useful when alerting on one off events such
-as log messages which need to be actioned individually. Once the issue has been
-investigated and fixed, the alarm can be deleted on a per host basis.
-
-For example, in the case of monitoring for file system corruption one might
-define a metric from the system logs alerting on XFS file system corruption, or
-ECC memory errors. These metrics may only be generated once, but it is
-important that they are not ignored. Therefore, in the example below, the last
-operator is used so that the alarm is evaluated against the last metric
-associated with the log message. Since for log metrics the value of this metric
-is always greater than 0, this alarm can only be reset by deleting it (which
-can be accomplished by clicking on the dustbin icon in Monasca Grafana). By
-ensuring that the alarm has to be manually deleted and will not reset to the OK
-status, important errors can be tracked.
-
-.. code-block:: console
-
-   monasca# monasca alarm-definition-create xfs_errors \
-       '(last(log.event.xfs_errors_detected{}, deterministic)>0)' \
-       --description "XFS errors detected on host" \
-       --severity HIGH --alarm-actions $NOTIFICATION_ID \
-       --match-by hostname
-
-It is also possible to update existing alarms. For example, to update, or add
-multiple notifications to an alarm:
-
-.. code-block:: console
-
-   monasca# monasca alarm-definition-patch $ALARM_ID --alarm-actions $NOTIFICATION_ID --alarm-actions $NOTIFICATION_ID_2
+Alerts are defined in code and stored in Kayobe configuration. See ``*.rules``
+files in ``${KAYOBE_CONFIG_PATH}/kolla/config/prometheus`` as a model to add
+custom rules.
 
 Control Plane Shutdown Procedure
 ================================
@@ -683,21 +581,26 @@ perform the following cleanup procedure regularly:
 
 Elasticsearch indexes retention
 ===============================
-To enable and alter default rotation values for Elasticsearch Curator edit ``${KAYOBE_CONFIG_PATH}/kolla/globals.yml`` - This applies both to Monasca and Central Logging configurations.
+
+To enable and alter default rotation values for Elasticsearch Curator, edit
+``${KAYOBE_CONFIG_PATH}/kolla/globals.yml``:
 
 .. code-block:: console
 
    # Allow Elasticsearch Curator to apply a retention policy to logs
    enable_elasticsearch_curator: true
+
    # Duration after which index is closed
    elasticsearch_curator_soft_retention_period_days: 90
+
    # Duration after which index is deleted
    elasticsearch_curator_hard_retention_period_days: 180
 
-Reconfigure elasticsearch with new values:
+Reconfigure Elasticsearch with new values:
 
 .. code-block:: console
 
-   kayobe overcloud service reconfigure --kolla-tags elasticsearch --kolla-skip-tags common --skip-precheck
+   kayobe overcloud service reconfigure --kolla-tags elasticsearch
 
-For more information see `upstream documentation <https://docs.openstack.org/kolla-ansible/ussuri/reference/logging-and-monitoring/central-logging-guide.html#curator>`__
+For more information see the `upstream documentation
+<https://docs.openstack.org/kolla-ansible/latest/reference/logging-and-monitoring/central-logging-guide.html#curator>`__.
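The soft and hard retention periods express a two-stage policy: an index is closed once it is older than the soft period and deleted once it is older than the hard period. A minimal Python sketch of that policy, using the example values from the config above (illustrative logic only, not Curator's implementation; the exact boundary handling is an assumption):

```python
from datetime import date

# Values mirror the globals.yml example above.
SOFT_DAYS = 90   # index closed after this many days
HARD_DAYS = 180  # index deleted after this many days

def index_action(index_date: date, today: date) -> str:
    """Decide what a two-stage retention policy does with a daily index."""
    age = (today - index_date).days
    if age > HARD_DAYS:
        return "delete"
    if age > SOFT_DAYS:
        return "close"
    return "keep"

today = date(2021, 7, 1)
print(index_action(date(2021, 6, 1), today))   # 30 days old: keep
print(index_action(date(2021, 3, 1), today))   # 122 days old: close
print(index_action(date(2020, 12, 1), today))  # 212 days old: delete
```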

source/vars.rst

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,4 @@
+.. |alertmanager_url| replace:: https://openstack.acme.example:9093
 .. |base_path| replace:: ~/kayobe-env
 .. |chat_system| replace:: Slack
 .. |control_host_access| replace:: |control_host| is used as the Ansible control host. Each operator uses their own account on this host, but with a shared SSH key stored as ``~/.ssh/id_rsa``.
@@ -9,6 +10,7 @@
 .. |flavor_name| replace:: m1.tiny
 .. |floating_ip_access| replace:: from acme-seed-hypervisor and the rest of the Acme network
 .. |grafana_url| replace:: https://openstack.acme.example:3000
+.. |grafana_username| replace:: ``grafana_local_admin``
 .. |horizon_access| replace:: via the Internet.
 .. |horizon_theme_clone_url| replace:: https://github.com/acme-openstack/horizon-theme.git
 .. |horizon_theme_name| replace:: acme
