Commit b9cf798

Document how to silence Prometheus alerts
1 parent 348e154 commit b9cf798

1 file changed, 28 additions and 0 deletions
source/operations_and_monitoring.rst

Lines changed: 28 additions & 0 deletions
@@ -36,6 +36,8 @@ Kolla-Ansible and can be extracted from the encrypted passwords file
 
 kayobe# ansible-vault view ${KAYOBE_CONFIG_PATH}/kolla/passwords.yml --vault-password-file |vault_password_file_path| | grep ^grafana_admin_password
 
+.. _prometheus-alertmanager:
+
 Access to Prometheus Alertmanager
 =================================

@@ -290,6 +292,32 @@ Alerts are defined in code and stored in Kayobe configuration. See ``*.rules``
 files in ``${KAYOBE_CONFIG_PATH}/kolla/config/prometheus`` as a model to add
 custom rules.
 
+Silencing Prometheus Alerts
+---------------------------
+
+Sometimes alerts must be silenced because the root cause cannot be resolved
+right away, such as when hardware is faulty. For example, an unreachable
+hypervisor will produce several alerts:
+
+* ``InstanceDown`` from Node Exporter
+* ``OpenStackServiceDown`` from the OpenStack exporter, which reports the
+  status of the ``nova-compute`` agent on the host
+* ``PrometheusTargetMissing`` from several Prometheus exporters
+
Rather than silencing each alert one by one for a specific host, a silence can
308+
apply to multiple alerts using a reduced list of labels. :ref:`Log into
309+
Alertmanager <prometheus-alertmanager>`, click on the ``Silence`` button next
310+
to an alert and adjust the matcher list to keep only ``instance=<hostname>``
311+
label. Then, create another silence to match ``hostname=<hostname>`` (this is
312+
required because, for the OpenStack exporter, the instance is the host running
313+
the monitoring service rather than the host being monitored).
314+
+.. note::
+
+   After creating the silence, you may get redirected to a 404 page. This is a
+   `known issue <https://github.com/prometheus/alertmanager/issues/1377>`__
+   when running several Alertmanager instances behind HAProxy.
+
 Control Plane Shutdown Procedure
 ================================
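
The two silences described in the added section (one matching ``instance=<hostname>``, one matching ``hostname=<hostname>``) could also be prepared as request bodies for the Alertmanager v2 API rather than through the web UI. The following is a minimal sketch under stated assumptions, not part of the commit: the function names, the example hostname, the default duration, author and comment are all hypothetical, and the sketch only builds the JSON bodies without sending them.

```python
import json
from datetime import datetime, timedelta, timezone


def build_silence(label, hostname, hours=24, author="operator",
                  comment="Host under maintenance"):
    """Build one Alertmanager v2 silence body with a single exact matcher.

    The defaults (24 hours, "operator", the comment text) are illustrative
    assumptions, not values taken from the documentation being changed.
    """
    start = datetime.now(timezone.utc)
    end = start + timedelta(hours=hours)
    return {
        # A single matcher, so every alert carrying this label value is muted.
        "matchers": [{"name": label, "value": hostname, "isRegex": False}],
        # Timezone-aware ISO 8601 timestamps are valid RFC 3339 date-times.
        "startsAt": start.isoformat(),
        "endsAt": end.isoformat(),
        "createdBy": author,
        "comment": comment,
    }


def build_host_silences(hostname, **kwargs):
    """Build the two silences described above: ``instance`` and ``hostname``.

    The second one is needed because the OpenStack exporter reports the
    monitored host in the ``hostname`` label rather than in ``instance``.
    """
    return [build_silence(label, hostname, **kwargs)
            for label in ("instance", "hostname")]


if __name__ == "__main__":
    # "hv01.example.com" is a placeholder hypervisor name.
    for silence in build_host_silences("hv01.example.com", hours=48):
        print(json.dumps(silence, indent=2))
```

Each body would then be sent as JSON in a ``POST`` request to the ``/api/v2/silences`` endpoint of the Alertmanager instance. The base URL and any credentials depend on the deployment and are deliberately left out of the sketch.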
