Merge pull request #80582 from eromanova97/OBSDOCS-271

michaelryanpeter · web-flow · commit 6c43dfd7026c · 2024-08-26T12:20:20.000-04:00
OBSDOCS-271: Highlight that some monitoring components are HA in the …
diff --git a/modules/monitoring-configuring-persistent-storage.adoc b/modules/monitoring-configuring-persistent-storage.adoc
@@ -13,6 +13,11 @@ Run cluster monitoring with persistent storage to gain the following benefits:
 
 For production environments, it is highly recommended to configure persistent storage. 
 
+[IMPORTANT]
+====
+In multi-node clusters, you must configure persistent storage for Prometheus, Alertmanager, and Thanos Ruler to ensure high availability.
+====
+
 [id="persistent-storage-prerequisites_{context}"]
 == Persistent storage prerequisites
 
diff --git a/modules/monitoring-understanding-monitoring-stack-in-ha-clusters.adoc b/modules/monitoring-understanding-monitoring-stack-in-ha-clusters.adoc
@@ -0,0 +1,37 @@
+// Module included in the following assembly:
+//
+// * observability/monitoring/monitoring-overview.adoc
+
+:_mod-docs-content-type: CONCEPT
+[id="understanding-monitoring-stack-in-ha-clusters_{context}"]
+= Understanding the monitoring stack in high-availability clusters
+
+By default, in multi-node clusters, the following components run in high-availability (HA) mode to prevent data loss and service interruption:
+
+* Prometheus
+* Alertmanager
+* Thanos Ruler
+ifndef::openshift-dedicated,openshift-rosa[]
+* Thanos Querier
+* Metrics Server
+* Monitoring plugin
+endif::openshift-dedicated,openshift-rosa[]
+
+Each component is replicated across two pods that run on separate nodes, so the monitoring stack can tolerate the loss of one pod of each component.
+
+Prometheus in HA mode::
+
+* Both replicas independently scrape the same targets and evaluate the same rules.
+* The replicas do not communicate with each other. Therefore, data might differ between the pods. 
+
+Alertmanager in HA mode::
+
+* The two replicas synchronize notification and silence states with each other. This ensures that each notification is sent at least once.
+* If the replicas fail to communicate or if there is an issue on the receiving side, notifications are still sent, but they might be duplicated.
+
+[IMPORTANT]
+====
+Prometheus, Alertmanager, and Thanos Ruler are stateful components. To ensure high availability, you must configure them with persistent storage.
+====
+
+
diff --git a/observability/monitoring/common-monitoring-configuration-scenarios.adoc b/observability/monitoring/common-monitoring-configuration-scenarios.adoc
@@ -32,9 +32,11 @@ Any other configuration options listed here are optional.
 * For shorter term data retention, xref:../../observability/monitoring/configuring-the-monitoring-stack.adoc#configuring-persistent-storage_configuring-the-monitoring-stack[configure persistent storage] for Prometheus and Alertmanager to store metrics and alert data.
 Specify the metrics data retention parameters for Prometheus and Thanos Ruler.
 +
-[NOTE]
+[IMPORTANT]
 ====
-By default, in a newly installed {product-title} system, the monitoring `ClusterOperator` resource reports a `PrometheusDataPersistenceNotConfigured` status message to remind you that storage is not configured.
+* In multi-node clusters, you must configure persistent storage for Prometheus, Alertmanager, and Thanos Ruler to ensure high availability.
+
+* By default, in a newly installed {product-title} system, the monitoring `ClusterOperator` resource reports a `PrometheusDataPersistenceNotConfigured` status message to remind you that storage is not configured.
 ====
 +
 * For longer term data retention, xref:../../observability/monitoring/configuring-the-monitoring-stack.adoc#configuring_remote_write_storage_configuring-the-monitoring-stack[configure the remote write feature] to enable Prometheus to send ingested metrics to remote systems for storage.
diff --git a/observability/monitoring/monitoring-overview.adoc b/observability/monitoring/monitoring-overview.adoc
@@ -39,6 +39,13 @@ include::modules/monitoring-default-monitoring-targets.adoc[leveloffset=+2]
 
 include::modules/monitoring-components-for-monitoring-user-defined-projects.adoc[leveloffset=+2]
 include::modules/monitoring-targets-for-user-defined-projects.adoc[leveloffset=+2]
+include::modules/monitoring-understanding-monitoring-stack-in-ha-clusters.adoc[leveloffset=+2]
+[role="_additional-resources"]
+.Additional resources
+* xref:../../operators/operator_sdk/osdk-ha-sno.adoc#osdk-ha-sno[High-availability or single-node cluster detection and support]
+* xref:../../observability/monitoring/configuring-the-monitoring-stack.adoc#configuring-persistent-storage_configuring-the-monitoring-stack[Configuring persistent storage]
+* xref:../../observability/monitoring/configuring-the-monitoring-stack.adoc#configuring-the-monitoring-stack_configuring-the-monitoring-stack[Configuring the monitoring stack]
+
 include::modules/monitoring-common-terms.adoc[leveloffset=+1]
 
 ifndef::openshift-dedicated,openshift-rosa[]
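
The diff requires persistent storage for Prometheus, Alertmanager, and Thanos Ruler in multi-node clusters, but it does not show what that configuration looks like. A minimal sketch follows. It assumes a cluster where you can edit the platform monitoring configuration, with the `cluster-monitoring-config` config map in the `openshift-monitoring` namespace covering Prometheus and Alertmanager, and the `user-workload-monitoring-config` config map in the `openshift-user-workload-monitoring` namespace covering Thanos Ruler. The storage class name and the volume sizes are placeholders, not sizing recommendations.

```yaml
# Sketch only: <storage_class_name> and the storage sizes are placeholders.
# Platform monitoring stack: Prometheus and Alertmanager.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      volumeClaimTemplate:
        spec:
          storageClassName: <storage_class_name>
          resources:
            requests:
              storage: 40Gi
    alertmanagerMain:
      volumeClaimTemplate:
        spec:
          storageClassName: <storage_class_name>
          resources:
            requests:
              storage: 10Gi
---
# Monitoring for user-defined projects: Thanos Ruler.
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    thanosRuler:
      volumeClaimTemplate:
        spec:
          storageClassName: <storage_class_name>
          resources:
            requests:
              storage: 10Gi
```

Because each `config.yaml` key holds the complete configuration for its monitoring stack, merge these `volumeClaimTemplate` entries into any existing configuration rather than replacing it.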
