
Commit 8b0bdda

Merge pull request #40704 from Amrita42/HPA-2688
OSDOCS-2688: Conceptual information about HPA
2 parents bc9dce0 + c59bd96 commit 8b0bdda

5 files changed, +88 −2 lines changed

images/HPAflow.png (36.3 KB)
modules/nodes-pods-autoscaling-best-practices-hpa.adoc

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
// Module included in the following assemblies:
//
// * nodes/nodes-pods-autoscaling-about.adoc

:_content-type: CONCEPT
[id="nodes-pods-autoscaling-best-practices-hpa_{context}"]
= Best practices

.All pods must have resource requests configured
The HPA makes a scaling decision based on the observed CPU or memory utilization values of pods in an {product-title} cluster. Utilization values are calculated as a percentage of the resource requests of each pod.
Missing resource request values prevent the HPA from calculating utilization accurately, which can lead to suboptimal scaling decisions.
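For illustration, a minimal pod spec that sets the CPU and memory requests the HPA needs. The pod name, container name, and values are placeholders, not from the source:

[source,yaml]
----
apiVersion: v1
kind: Pod
metadata:
  name: example-pod # hypothetical name
spec:
  containers:
  - name: app # hypothetical container
    image: quay.io/example/app:latest # placeholder image
    resources:
      requests:
        cpu: 500m     # utilization is reported relative to this value
        memory: 256Mi
----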
.Configure the cool down period
During horizontal pod autoscaling, scaling events might occur in rapid succession without a time gap between them. Configure the cool down period to prevent frequent replica fluctuations.
You can specify a cool down period by configuring the `stabilizationWindowSeconds` field. The stabilization window is used to restrict the fluctuation of the replica count when the metrics used for scaling keep fluctuating.
The autoscaling algorithm uses this window to infer a previous desired state and avoid unwanted changes to workload scale.
For example, a stabilization window is specified for the `scaleDown` field:

[source,yaml]
----
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300
----
In the above example, all desired states for the past 5 minutes are considered. This approximates a rolling maximum, and avoids having the scaling algorithm frequently remove pods only to trigger recreating an equivalent pod just moments later.
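Putting these pieces together, a hypothetical `autoscaling/v2` `HorizontalPodAutoscaler` object could combine a CPU utilization target with the `scaleDown` stabilization window. The object name and scale target are placeholders, not from the source:

[source,yaml]
----
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-hpa # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-app # hypothetical target workload
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
----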
modules/nodes-pods-autoscaling-requests-and-limits-hpa.adoc

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
// Module included in the following assemblies:
//
// * nodes/nodes-pods-autoscaling-about.adoc

:_content-type: CONCEPT
[id="nodes-pods-autoscaling-requests-and-limits-hpa_{context}"]
= About requests and limits

The scheduler uses the resource requests that you specify for containers in a pod to decide which node to place the pod on. The kubelet enforces the resource limit that you specify for a container to ensure that the container does not use more than the specified limit.
The kubelet also reserves the requested amount of that system resource specifically for that container to use.
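As an illustration, a hypothetical container spec with both values set. The scheduler places the pod based on `requests`, and the kubelet enforces `limits`; all names and values are placeholders:

[source,yaml]
----
spec:
  containers:
  - name: app # hypothetical container
    image: quay.io/example/app:latest # placeholder image
    resources:
      requests:
        cpu: 250m     # used for scheduling and reserved for the container
        memory: 128Mi
      limits:
        cpu: 500m     # kubelet prevents usage above these values
        memory: 256Mi
----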
.Using resource metrics

In the pod specifications, you must specify the resource requests, such as CPU and memory. The HPA uses this specification to determine the resource utilization and then scales the target up or down.

For example, the HPA object uses the following metric source:

[source,yaml]
----
type: Resource
resource:
  name: cpu
  target:
    type: Utilization
    averageUtilization: 60
----
In this example, the HPA keeps the average utilization of the pods in the scaling target at 60%. Utilization is the ratio of the pod's current resource usage to its resource request.
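The utilization calculation can be sketched as a quick check. The numbers below are hypothetical; the real controller averages live metrics across all targeted pods:

```python
# Utilization is current usage divided by the container's resource request,
# expressed as a percentage. All values here are hypothetical.

def utilization_percent(current_usage: float, request: float) -> float:
    """Return resource utilization as a percentage of the request."""
    return 100 * current_usage / request

# A pod requesting 500m CPU that currently uses 300m is at 60% utilization,
# which matches the averageUtilization: 60 target above.
print(utilization_percent(300, 500))
```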
modules/nodes-pods-autoscaling-workflow-hpa.adoc

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
// Module included in the following assemblies:
//
// * nodes/nodes-pods-autoscaling-about.adoc

:_content-type: CONCEPT
[id="nodes-pods-autoscaling-workflow-hpa_{context}"]
= How does the HPA work?

The horizontal pod autoscaler (HPA) extends the concept of pod auto-scaling. The HPA lets you create and manage a group of load-balanced pods. The HPA automatically increases or decreases the number of pods when a given CPU or memory threshold is crossed.

.High level workflow of the HPA
image::HPAflow.png[workflow]
The HPA is an API resource in the Kubernetes autoscaling API group. The autoscaler works as a control loop with a default sync period of 15 seconds. During this period, the controller manager queries the CPU utilization, the memory utilization, or both, against what is defined in the YAML file for the HPA.
The controller manager obtains the utilization metrics from the resource metrics API for per-pod resource metrics, such as CPU or memory, for each pod that is targeted by the HPA.

If a utilization value target is set, the controller calculates the utilization value as a percentage of the equivalent resource request on the containers in each pod. The controller then takes the average of utilization across all targeted pods and produces a ratio that is used to scale the number of desired replicas.
The HPA is configured to fetch metrics from `metrics.k8s.io`, which is provided by the metrics server. Because of the dynamic nature of metrics evaluation, the number of replicas can fluctuate during scaling for a group of replicas.
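The replica calculation described above follows the scaling formula documented for the Kubernetes HPA. A small sketch with hypothetical numbers:

```python
import math

# Scaling rule documented for the Kubernetes HPA:
# desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)
def desired_replicas(current_replicas: int, current_value: float, desired_value: float) -> int:
    """Return the replica count the HPA would aim for."""
    return math.ceil(current_replicas * current_value / desired_value)

# Four replicas averaging 90% CPU utilization against a 60% target
# scale out to six replicas; 4 * 90 / 60 = 6.
print(desired_replicas(4, 90, 60))
```

In practice the controller also applies tolerances, min/max replica bounds, and the stabilization window before acting on this value.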
[NOTE]
====
To implement the HPA, all targeted pods must have a resource request set on their containers.
====

nodes/pods/nodes-pods-autoscaling.adoc

Lines changed: 10 additions & 2 deletions
@@ -10,7 +10,7 @@ As a developer, you can use a horizontal pod autoscaler (HPA) to
 specify how {product-title} should automatically increase or decrease the scale of
 a replication controller or deployment configuration, based on metrics collected
 from the pods that belong to that replication controller or deployment
-configuration. You can create an HPA for any `Deployment`, `DeploymentConfig`,
+configuration. You can create an HPA for any `Deployment`, `DeploymentConfig`,
 `ReplicaSet`, `ReplicationController`, or `StatefulSet` object.

 [NOTE]
@@ -26,6 +26,12 @@ these objects, see xref:../../applications/deployments/what-deployments-are.adoc

 include::modules/nodes-pods-autoscaling-about.adoc[leveloffset=+1]

+include::modules/nodes-pods-autoscaling-workflow-hpa.adoc[leveloffset=+1]
+
+include::modules/nodes-pods-autoscaling-requests-and-limits-hpa.adoc[leveloffset=+1]
+
+include::modules/nodes-pods-autoscaling-best-practices-hpa.adoc[leveloffset=+1]
+
 include::modules/nodes-pods-autoscaling-policies.adoc[leveloffset=+2]

 include::modules/nodes-pods-autoscaling-creating-web-console.adoc[leveloffset=+1]
@@ -41,5 +47,7 @@ include::modules/nodes-pods-autoscaling-status-viewing.adoc[leveloffset=+2]

 == Additional resources

-For more information on replication controllers and deployment controllers,
+* For more information on replication controllers and deployment controllers,
 see xref:../../applications/deployments/what-deployments-are.adoc#what-deployments-are[Understanding deployments and deployment configs].
+
+* For an example on the usage of HPA, see https://cloud.redhat.com/blog/horizontal-pod-autoscaling-of-quarkus-application-based-on-memory-utilization[Horizontal Pod Autoscaling of Quarkus Application Based on Memory Utilization].
