Commit 6c43dfd

Merge pull request #80582 from eromanova97/OBSDOCS-271
OBSDOCS-271: Highlight that some monitoring components are HA in the …
2 parents bc8bf65 + 9ee24dd commit 6c43dfd

4 files changed: 53 additions and 2 deletions

modules/monitoring-configuring-persistent-storage.adoc

Lines changed: 5 additions & 0 deletions
@@ -13,6 +13,11 @@ Run cluster monitoring with persistent storage to gain the following benefits:
 
 For production environments, it is highly recommended to configure persistent storage.
 
+[IMPORTANT]
+====
+In multi-node clusters, you must configure persistent storage for Prometheus, Alertmanager, and Thanos Ruler to ensure high availability.
+====
+
 [id="persistent-storage-prerequisites_{context}"]
 == Persistent storage prerequisites
 
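
For orientation, the persistent storage that the new admonition requires is typically set through the `cluster-monitoring-config` config map. The following is a minimal sketch, not part of this commit: the storage class name and the requested sizes are placeholders chosen for illustration, and they must be replaced with values that match your cluster.

```yaml
# Sketch only: storage class and sizes below are hypothetical placeholders.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      volumeClaimTemplate:
        spec:
          storageClassName: example-storage-class  # assumption: replace with a real storage class
          resources:
            requests:
              storage: 40Gi  # illustrative size
    alertmanagerMain:
      volumeClaimTemplate:
        spec:
          storageClassName: example-storage-class  # assumption: replace with a real storage class
          resources:
            requests:
              storage: 10Gi  # illustrative size
```

Each replica gets its own persistent volume claim from the template, which is why the claim is described as a template and not as a single volume.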

modules/monitoring-understanding-monitoring-stack-in-ha-clusters.adoc

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+// Module included in the following assembly:
+//
+// * observability/monitoring/monitoring-overview.adoc
+
+:_mod-docs-content-type: CONCEPT
+[id="understanding-monitoring-stack-in-ha-clusters_{context}"]
+= Understanding the monitoring stack in high-availability clusters
+
+By default, in multi-node clusters, the following components run in high-availability (HA) mode to prevent data loss and service interruption:
+
+* Prometheus
+* Alertmanager
+* Thanos Ruler
+ifndef::openshift-dedicated,openshift-rosa[]
+* Thanos Querier
+* Metrics Server
+* Monitoring plugin
+endif::openshift-dedicated,openshift-rosa[]
+
+The component is replicated across two pods, each running on a separate node. This means that the monitoring stack can tolerate the loss of one pod.
+
+Prometheus in HA mode::
+
+* Both replicas independently scrape the same targets and evaluate the same rules.
+* The replicas do not communicate with each other. Therefore, data might differ between the pods.
+
+Alertmanager in HA mode::
+
+* The two replicas synchronize notification and silence states with each other. This ensures that each notification is sent at least once.
+* If the replicas fail to communicate or if there is an issue on the receiving side, notifications are still sent, but they might be duplicated.
+
+[IMPORTANT]
+====
+Prometheus, Alertmanager, and Thanos Ruler are stateful components. To ensure high availability, you must configure them with persistent storage.
+====
+
+
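
The module names Thanos Ruler as a stateful component. Thanos Ruler belongs to monitoring for user-defined projects, so its storage is usually configured in a separate config map from the one used for Prometheus and Alertmanager. The following is a hedged sketch that is not part of this commit: it assumes that monitoring for user-defined projects is enabled, and the storage class name and size are placeholders.

```yaml
# Sketch only: assumes user workload monitoring is enabled.
# Storage class and size are hypothetical placeholders.
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    thanosRuler:
      volumeClaimTemplate:
        spec:
          storageClassName: example-storage-class  # assumption: replace with a real storage class
          resources:
            requests:
              storage: 10Gi  # illustrative size
```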

observability/monitoring/common-monitoring-configuration-scenarios.adoc

Lines changed: 4 additions & 2 deletions
@@ -32,9 +32,11 @@ Any other configuration options listed here are optional.
 * For shorter term data retention, xref:../../observability/monitoring/configuring-the-monitoring-stack.adoc#configuring-persistent-storage_configuring-the-monitoring-stack[configure persistent storage] for Prometheus and Alertmanager to store metrics and alert data.
 Specify the metrics data retention parameters for Prometheus and Thanos Ruler.
 +
-[NOTE]
+[IMPORTANT]
 ====
-By default, in a newly installed {product-title} system, the monitoring `ClusterOperator` resource reports a `PrometheusDataPersistenceNotConfigured` status message to remind you that storage is not configured.
+* In multi-node clusters, you must configure persistent storage for Prometheus, Alertmanager, and Thanos Ruler to ensure high availability.
+
+* By default, in a newly installed {product-title} system, the monitoring `ClusterOperator` resource reports a `PrometheusDataPersistenceNotConfigured` status message to remind you that storage is not configured.
 ====
 +
 * For longer term data retention, xref:../../observability/monitoring/configuring-the-monitoring-stack.adoc#configuring_remote_write_storage_configuring-the-monitoring-stack[configure the remote write feature] to enable Prometheus to send ingested metrics to remote systems for storage.
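
The two retention options in this hunk, local retention parameters and remote write, can sit side by side in the same Prometheus configuration. The following sketch is not part of this commit: the retention values are illustrative and the remote write URL is a made-up example endpoint.

```yaml
# Sketch only: retention values and the endpoint URL are hypothetical.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retention: 24h        # shorter-term local retention by time (illustrative)
      retentionSize: 10GB   # shorter-term local retention by size (illustrative)
      remoteWrite:
      - url: "https://remote-write-endpoint.example.com"  # assumption: example endpoint for longer-term storage
```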

observability/monitoring/monitoring-overview.adoc

Lines changed: 7 additions & 0 deletions
@@ -39,6 +39,13 @@ include::modules/monitoring-default-monitoring-targets.adoc[leveloffset=+2]
 
 include::modules/monitoring-components-for-monitoring-user-defined-projects.adoc[leveloffset=+2]
 include::modules/monitoring-targets-for-user-defined-projects.adoc[leveloffset=+2]
+include::modules/monitoring-understanding-monitoring-stack-in-ha-clusters.adoc[leveloffset=+2]
+[role="_additional-resources"]
+.Additional resources
+* xref:../../operators/operator_sdk/osdk-ha-sno.adoc#osdk-ha-sno[High-availability or single-node cluster detection and support]
+* xref:../../observability/monitoring/configuring-the-monitoring-stack.adoc#configuring-persistent-storage_configuring-the-monitoring-stack[Configuring persistent storage]
+* xref:../../observability/monitoring/configuring-the-monitoring-stack.adoc#configuring-the-monitoring-stack_configuring-the-monitoring-stack[Configuring the monitoring stack]
+
 include::modules/monitoring-common-terms.adoc[leveloffset=+1]
 
 ifndef::openshift-dedicated,openshift-rosa[]
