Merge pull request #29668 from lbarbeevargas/OSDOCS-1818-upgrades-PVC-backed-Prometheus

lbarbeevargas · web-flow · commit 62ce8280e596 · 2021-03-08T18:30:35.000Z
OSDOCS-1818 for MON-1520 Resource usage of PVC backed Prometheus during upgrades
diff --git a/monitoring/configuring-the-monitoring-stack.adoc b/monitoring/configuring-the-monitoring-stack.adoc
@@ -71,6 +71,11 @@ Running cluster monitoring with persistent storage means that your metrics are s
 
 [IMPORTANT]
 ====
+If you are running cluster monitoring with an attached persistent volume claim (PVC) for Prometheus, you might experience OOM kills during a cluster upgrade. When persistent storage is in use for Prometheus, Prometheus memory usage doubles during the cluster upgrade and for several hours after the upgrade is complete. To avoid OOM kills, provide worker nodes with double the amount of memory that was available before the upgrade. For example, if you are running monitoring on the minimum recommended nodes, which have 2 cores and 8 GB of RAM, increase the memory to 16 GB. For more information, see link:https://bugzilla.redhat.com/show_bug.cgi?id=1925061[BZ#1925061].
+====
+
+[NOTE]
+====
 See xref:../scalability_and_performance/optimizing-storage.adoc#recommended-configurable-storage-technology_persistent-storage[Recommended configurable storage technology].
 ====
 
diff --git a/scalability_and_performance/scaling-cluster-monitoring-operator.adoc b/scalability_and_performance/scaling-cluster-monitoring-operator.adoc
@@ -5,10 +5,12 @@ include::modules/common-attributes.adoc[]
 
 toc::[]
 
-{product-title} exposes metrics that the Cluster Monitoring Operator collects
-and stores in the Prometheus-based monitoring stack. As an 
-administrator, you can view system resources, containers and components metrics
-in one dashboard interface, Grafana.
+{product-title} exposes metrics that the Cluster Monitoring Operator collects and stores in the Prometheus-based monitoring stack. As an administrator, you can view metrics for system resources, containers, and components in one dashboard interface, Grafana.
+
+[IMPORTANT]
+====
+If you are running cluster monitoring with an attached persistent volume claim (PVC) for Prometheus, you might experience OOM kills during a cluster upgrade. When persistent storage is in use for Prometheus, Prometheus memory usage doubles during the cluster upgrade and for several hours after the upgrade is complete. To avoid OOM kills, provide worker nodes with double the amount of memory that was available before the upgrade. For example, if you are running monitoring on the minimum recommended nodes, which have 2 cores and 8 GB of RAM, increase the memory to 16 GB. For more information, see link:https://bugzilla.redhat.com/show_bug.cgi?id=1925061[BZ#1925061].
+====
 
 include::modules/prometheus-database-storage-requirements.adoc[leveloffset=+1]
 
diff --git a/updating/updating-cluster-between-minor.adoc b/updating/updating-cluster-between-minor.adoc
@@ -25,6 +25,11 @@ See xref:../authentication/using-rbac.adoc[Using RBAC to define and apply permis
 Using the `unsupportedConfigOverrides` section to modify the configuration of an Operator is unsupported and might block cluster upgrades. You must remove this setting before you can upgrade your cluster.
 ====
 
+[IMPORTANT]
+====
+If you are running cluster monitoring with an attached persistent volume claim (PVC) for Prometheus, you might experience OOM kills during a cluster upgrade. When persistent storage is in use for Prometheus, Prometheus memory usage doubles during the cluster upgrade and for several hours after the upgrade is complete. To avoid OOM kills, provide worker nodes with double the amount of memory that was available before the upgrade. For example, if you are running monitoring on the minimum recommended nodes, which have 2 cores and 8 GB of RAM, increase the memory to 16 GB. For more information, see link:https://bugzilla.redhat.com/show_bug.cgi?id=1925061[BZ#1925061].
+====
+
 include::modules/update-service-overview.adoc[leveloffset=+1]
 .Additional resources
 
diff --git a/updating/updating-cluster-cli.adoc b/updating/updating-cluster-cli.adoc
@@ -20,6 +20,11 @@ See xref:../authentication/using-rbac.adoc[Using RBAC to define and apply permis
 Using the `unsupportedConfigOverrides` section to modify the configuration of an Operator is unsupported and might block cluster upgrades. You must remove this setting before you can upgrade your cluster.
 ====
 
+[IMPORTANT]
+====
+If you are running cluster monitoring with an attached persistent volume claim (PVC) for Prometheus, you might experience OOM kills during a cluster upgrade. When persistent storage is in use for Prometheus, Prometheus memory usage doubles during the cluster upgrade and for several hours after the upgrade is complete. To avoid OOM kills, provide worker nodes with double the amount of memory that was available before the upgrade. For example, if you are running monitoring on the minimum recommended nodes, which have 2 cores and 8 GB of RAM, increase the memory to 16 GB. For more information, see link:https://bugzilla.redhat.com/show_bug.cgi?id=1925061[BZ#1925061].
+====
+
 include::modules/update-service-overview.adoc[leveloffset=+1]
 .Additional resources
 
diff --git a/updating/updating-cluster-rhel-compute.adoc b/updating/updating-cluster-rhel-compute.adoc
@@ -16,6 +16,11 @@ See xref:../authentication/using-rbac.adoc[Using RBAC to define and apply permis
 * Have a recent xref:../backup_and_restore/backing-up-etcd.adoc#backup-etcd[etcd backup] in case your upgrade fails and you must xref:../backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.adoc#dr-restoring-cluster-state[restore your cluster to a previous state].
 * If your cluster uses manually maintained credentials, ensure that the Cloud Credential Operator (CCO) is in an upgradeable state. For more information, see _Upgrading clusters with manually maintained credentials_ for xref:../installing/installing_aws/manually-creating-iam.adoc#manually-maintained-credentials-upgrade_manually-creating-iam-aws[AWS], xref:../installing/installing_azure/manually-creating-iam-azure.adoc#manually-maintained-credentials-upgrade_manually-creating-iam-azure[Azure], or xref:../installing/installing_gcp/manually-creating-iam-gcp.adoc#manually-maintained-credentials-upgrade_manually-creating-iam-gcp[GCP].
 
+[IMPORTANT]
+====
+If you are running cluster monitoring with an attached persistent volume claim (PVC) for Prometheus, you might experience OOM kills during a cluster upgrade. When persistent storage is in use for Prometheus, Prometheus memory usage doubles during the cluster upgrade and for several hours after the upgrade is complete. To avoid OOM kills, provide worker nodes with double the amount of memory that was available before the upgrade. For example, if you are running monitoring on the minimum recommended nodes, which have 2 cores and 8 GB of RAM, increase the memory to 16 GB. For more information, see link:https://bugzilla.redhat.com/show_bug.cgi?id=1925061[BZ#1925061].
+====
+
 include::modules/update-service-overview.adoc[leveloffset=+1]
 .Additional resources
 
diff --git a/updating/updating-cluster.adoc b/updating/updating-cluster.adoc
@@ -14,6 +14,11 @@ See xref:../authentication/using-rbac.adoc[Using RBAC to define and apply permis
 * Have a recent xref:../backup_and_restore/backing-up-etcd.adoc#backup-etcd[etcd backup] in case your upgrade fails and you must xref:../backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.adoc#dr-restoring-cluster-state[restore your cluster to a previous state].
 * If your cluster uses manually maintained credentials, ensure that the Cloud Credential Operator (CCO) is in an upgradeable state. For more information, see _Upgrading clusters with manually maintained credentials_ for xref:../installing/installing_aws/manually-creating-iam.adoc#manually-maintained-credentials-upgrade_manually-creating-iam-aws[AWS], xref:../installing/installing_azure/manually-creating-iam-azure.adoc#manually-maintained-credentials-upgrade_manually-creating-iam-azure[Azure], or xref:../installing/installing_gcp/manually-creating-iam-gcp.adoc#manually-maintained-credentials-upgrade_manually-creating-iam-gcp[GCP].
 
+[IMPORTANT]
+====
+If you are running cluster monitoring with an attached persistent volume claim (PVC) for Prometheus, you might experience OOM kills during a cluster upgrade. When persistent storage is in use for Prometheus, Prometheus memory usage doubles during the cluster upgrade and for several hours after the upgrade is complete. To avoid OOM kills, provide worker nodes with double the amount of memory that was available before the upgrade. For example, if you are running monitoring on the minimum recommended nodes, which have 2 cores and 8 GB of RAM, increase the memory to 16 GB. For more information, see link:https://bugzilla.redhat.com/show_bug.cgi?id=1925061[BZ#1925061].
+====
+
 include::modules/update-service-overview.adoc[leveloffset=+1]
 .Additional resources
 
diff --git a/updating/updating-restricted-network-cluster.adoc b/updating/updating-restricted-network-cluster.adoc
@@ -21,6 +21,11 @@ See xref:../authentication/using-rbac.adoc[Using RBAC to define and apply permis
 * Have a recent xref:../backup_and_restore/backing-up-etcd.adoc#backup-etcd[etcd backup] in case your upgrade fails and you must xref:../backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.adoc#dr-restoring-cluster-state[restore your cluster to a previous state].
 * If your cluster uses manually maintained credentials, ensure that the Cloud Credential Operator (CCO) is in an upgradeable state. For more information, see _Upgrading clusters with manually maintained credentials_ for xref:../installing/installing_aws/manually-creating-iam.adoc#manually-maintained-credentials-upgrade_manually-creating-iam-aws[AWS], xref:../installing/installing_azure/manually-creating-iam-azure.adoc#manually-maintained-credentials-upgrade_manually-creating-iam-azure[Azure], or xref:../installing/installing_gcp/manually-creating-iam-gcp.adoc#manually-maintained-credentials-upgrade_manually-creating-iam-gcp[GCP].
 
+[IMPORTANT]
+====
+If you are running cluster monitoring with an attached persistent volume claim (PVC) for Prometheus, you might experience OOM kills during a cluster upgrade. When persistent storage is in use for Prometheus, Prometheus memory usage doubles during the cluster upgrade and for several hours after the upgrade is complete. To avoid OOM kills, provide worker nodes with double the amount of memory that was available before the upgrade. For example, if you are running monitoring on the minimum recommended nodes, which have 2 cores and 8 GB of RAM, increase the memory to 16 GB. For more information, see link:https://bugzilla.redhat.com/show_bug.cgi?id=1925061[BZ#1925061].
+====
+
 [id="updating-restricted-network-mirror-host"]
 == Preparing your mirror host