
Commit 2d58a56

Merge pull request #41348 from jeana-redhat/OSDOCS-3141_cluster_autoscaler_scale_down_threshold
OSDOCS-3141: Cluster autoscaler node utilization threshold in 4.10
2 parents: 108fa0f + 55a42e8

3 files changed (+13, -9 lines)

machine_management/applying-autoscaling.adoc

Lines changed: 2 additions & 2 deletions
@@ -15,8 +15,6 @@ You can configure the cluster autoscaler only in clusters where the machine API
 
 include::modules/cluster-autoscaler-about.adoc[leveloffset=+1]
 
-include::modules/machine-autoscaler-about.adoc[leveloffset=+1]
-
 [id="configuring-clusterautoscaler"]
 == Configuring the cluster autoscaler
 
@@ -37,6 +35,8 @@ include::modules/deploying-resource.adoc[leveloffset=+2]
 
 * After you configure the cluster autoscaler, you must configure at least one machine autoscaler.
 
+include::modules/machine-autoscaler-about.adoc[leveloffset=+1]
+
 [id="configuring-machineautoscaler"]
 == Configuring the machine autoscalers
 
modules/cluster-autoscaler-about.adoc

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ Ensure that the `maxNodesTotal` value in the `ClusterAutoscaler` resource defini
 
 Every 10 seconds, the cluster autoscaler checks which nodes are unnecessary in the cluster and removes them. The cluster autoscaler considers a node for removal if the following conditions apply:
 
-* The sum of CPU and memory requests of all pods running on the node is less than 50% of the allocated resources on the node.
+* The node utilization is less than the _node utilization level_ threshold for the cluster. The node utilization level is the sum of the requested resources divided by the allocated resources for the node. If you do not specify a value in the `ClusterAutoscaler` custom resource, the cluster autoscaler uses a default value of `0.5`, which corresponds to 50% utilization.
* The cluster autoscaler can move all pods running on the node to the other nodes.
 * The cluster autoscaler does not have scale down disabled annotation.
 
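The reworded bullet defines the _node utilization level_ only in prose, and a TODO comment added in the next file notes that a visual formula is still pending. As an illustrative sketch only, not part of this commit, the condition could be written as:

----
% Node utilization level, as described in the bullet above (sketch):
\[
  U_{\mathrm{node}}
    = \frac{\sum_{p \,\in\, \mathrm{pods\ on\ node}} \mathrm{requests}(p)}
           {\mathrm{allocatable}(\mathrm{node})}
\]
% The node is a scale-down candidate when
% U_node < utilizationThreshold (default 0.5).
----

For example, if a node has 8 allocatable cores and the pods on it request a total of 3 cores, its utilization level is 3/8 = 0.375, which falls below the default `0.5` threshold, so the node is a removal candidate when the other two conditions also hold.
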
modules/cluster-autoscaler-cr.adoc

Lines changed: 10 additions & 6 deletions
@@ -3,6 +3,7 @@
 // * machine_management/applying-autoscaling.adoc
 // * post_installation_configuration/cluster-tasks.adoc
 
+:_content-type: REFERENCE
 [id="cluster-autoscaler-cr_{context}"]
 = ClusterAutoscaler resource definition
 
@@ -38,26 +39,29 @@ spec:
     delayAfterDelete: 5m <13>
     delayAfterFailure: 30s <14>
     unneededTime: 5m <15>
+    utilizationThreshold: 0.4 <16>
 ----
 <1> Specify the priority that a pod must exceed to cause the cluster autoscaler to deploy additional nodes. Enter a 32-bit integer value. The `podPriorityThreshold` value is compared to the value of the `PriorityClass` that you assign to each pod.
 <2> Specify the maximum number of nodes to deploy. This value is the total number of machines that are deployed in your cluster, not just the ones that the autoscaler controls. Ensure that this value is large enough to account for all of your control plane and compute machines and the total number of replicas that you specify in your `MachineAutoscaler` resources.
 <3> Specify the minimum number of cores to deploy in the cluster.
 <4> Specify the maximum number of cores to deploy in the cluster.
 <5> Specify the minimum amount of memory, in GiB, in the cluster.
 <6> Specify the maximum amount of memory, in GiB, in the cluster.
-<7> Optionally, specify the type of GPU node to deploy. Only `nvidia.com/gpu` and `amd.com/gpu` are valid types.
+<7> Optional: Specify the type of GPU node to deploy. Only `nvidia.com/gpu` and `amd.com/gpu` are valid types.
 <8> Specify the minimum number of GPUs to deploy in the cluster.
 <9> Specify the maximum number of GPUs to deploy in the cluster.
 <10> In this section, you can specify the period to wait for each action by using any valid link:https://golang.org/pkg/time/#ParseDuration[ParseDuration] interval, including `ns`, `us`, `ms`, `s`, `m`, and `h`.
 <11> Specify whether the cluster autoscaler can remove unnecessary nodes.
-<12> Optionally, specify the period to wait before deleting a node after a node has recently been _added_. If you do not specify a value, the default value of `10m` is used.
-<13> Specify the period to wait before deleting a node after a node has recently been _deleted_. If you do not specify a value, the default value of `10s` is used.
-<14> Specify the period to wait before deleting a node after a scale down failure occurred. If you do not specify a value, the default value of `3m` is used.
-<15> Specify the period before an unnecessary node is eligible for deletion. If you do not specify a value, the default value of `10m` is used.
+<12> Optional: Specify the period to wait before deleting a node after a node has recently been _added_. If you do not specify a value, the default value of `10m` is used.
+<13> Optional: Specify the period to wait before deleting a node after a node has recently been _deleted_. If you do not specify a value, the default value of `0s` is used.
+<14> Optional: Specify the period to wait before deleting a node after a scale down failure occurred. If you do not specify a value, the default value of `3m` is used.
+<15> Optional: Specify the period before an unnecessary node is eligible for deletion. If you do not specify a value, the default value of `10m` is used.
+<16> Optional: Specify the _node utilization level_ below which an unnecessary node is eligible for deletion. The node utilization level is the sum of the requested resources divided by the allocated resources for the node, and must be a value greater than `0` but less than `1`. If you do not specify a value, the cluster autoscaler uses a default value of `0.5`, which corresponds to 50% utilization.
+// Might be able to add a formula to show this visually, but need to look into asciidoc math formatting and what our tooling supports.
 
 [NOTE]
 ====
 When performing a scaling operation, the cluster autoscaler remains within the ranges set in the `ClusterAutoscaler` resource definition, such as the minimum and maximum number of cores to deploy or the amount of memory in the cluster. However, the cluster autoscaler does not correct the current values in your cluster to be within those ranges.
 
-The minimum and maximum CPUs, memory, and GPU values are determined by calculating those resources on all nodes in the cluster, even if the cluster autoscaler does not manage the nodes. For example, the control plane nodes are considered in the total memory in the cluster, even though the cluster autoscaler does not manage the control plane nodes.
+The minimum and maximum CPUs, memory, and GPU values are determined by calculating those resources on all nodes in the cluster, even if the cluster autoscaler does not manage the nodes. For example, the control plane nodes are considered in the total memory in the cluster, even though the cluster autoscaler does not manage the control plane nodes.
 ====