Merge pull request #38800 from abrennan89/targetutil

abrennan89 · web-flow · commit 053ccd50e7ea · 2021-12-01T09:07:50.000-06:00
SRVKS-573: Add target utilization docs
diff --git a/modules/serverless-autoscaling-maxscale-kn.adoc b/modules/serverless-autoscaling-maxscale-kn.adoc
@@ -1,11 +1,15 @@
+// Module is included in the following assemblies:
+//
+// * serverless/autoscaling/serverless-autoscaling-scale-bounds.adoc
+
 [id="serverless-autoscaling-maxscale-kn_{context}"]
 = Setting the maxScale annotation by using the Knative CLI
 
 You can use the `kn service` command with the `--max-scale` flag to create or modify the `maxScale` annotation value for a service.
 
 .Procedure
 
-* Set the maximum number of pods for the service by using the `--max-scale` flag:
+* Set the maximum number of replicas for the service by using the `--max-scale` flag:
 +
 [source,terminal]
 ----
diff --git a/modules/serverless-autoscaling-minscale-kn.adoc b/modules/serverless-autoscaling-minscale-kn.adoc
@@ -1,11 +1,15 @@
+// Module is included in the following assemblies:
+//
+// * serverless/autoscaling/serverless-autoscaling-scale-bounds.adoc
+
 [id="serverless-autoscaling-minscale_{context}"]
 = Setting the minScale annotation by using the Knative CLI
 
 You can use the `kn service` command with the `--min-scale` flag to create or modify the `minScale` annotation value for a service.
 
 .Procedure
 
-* Set the maximum number of pods for the service by using the `--min-scale` flag:
+* Set the minimum number of replicas for the service by using the `--min-scale` flag:
 +
 .Examples
 [source,terminal]
diff --git a/modules/serverless-concurrency-limits-configure-hard.adoc b/modules/serverless-concurrency-limits-configure-hard.adoc
@@ -1,3 +1,7 @@
+// Module included in the following assemblies:
+//
+// * serverless/autoscaling/serverless-autoscaling-concurrency.adoc
+
 [id="serverless-concurrency-limits-configure-hard_{context}"]
 = Configuring a hard concurrency limit
 
@@ -24,9 +28,9 @@ spec:
       containerConcurrency: 50
 ----
 +
-The default value is `0`, which means that there is no limit on the number of requests that are permitted to flow into one pod of the service at a time.
+The default value is `0`, which means that there is no limit on the number of simultaneous requests that are permitted to flow into one replica of the service.
 +
-A value greater than `0` specifies the exact number of requests that are permitted to flow into one pod of the service at a time. This example would enable a hard concurrency limit of 50 requests at a time.
+A value greater than `0` specifies the maximum number of simultaneous requests that are permitted to flow into one replica of the service. This example enables a hard concurrency limit of 50 requests per replica.
 
 * Optional: Use the `kn service` command to specify the `--concurrency-limit` flag:
 +
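The hard-limit rule described in this module (`0` means no limit, a value greater than `0` caps simultaneous requests per replica) can be illustrated with a small Python model. This is a hypothetical sketch, not Knative Serving code: the `ReplicaAdmission` class and its methods are illustrative names, and the model only counts in-flight requests for a single replica.

```python
class ReplicaAdmission:
    """Hypothetical model of a hard per-replica concurrency limit.

    Not Knative's implementation: it only illustrates that
    containerConcurrency == 0 means "no limit" and that a value greater
    than 0 caps how many requests one replica handles at the same time.
    """

    def __init__(self, container_concurrency: int = 0) -> None:
        if container_concurrency < 0:
            raise ValueError("containerConcurrency must be >= 0")
        self.limit = container_concurrency
        self.in_flight = 0

    def try_admit(self) -> bool:
        """Admit one request if the replica has a free slot."""
        if self.limit == 0 or self.in_flight < self.limit:
            self.in_flight += 1
            return True
        # Over the hard limit: in this model, the caller must hold the
        # request until release() frees a slot.
        return False

    def release(self) -> None:
        """Mark one in-flight request as finished."""
        if self.in_flight == 0:
            raise RuntimeError("no request in flight")
        self.in_flight -= 1


limited = ReplicaAdmission(container_concurrency=50)
print(sum(limited.try_admit() for _ in range(60)))    # 50
unlimited = ReplicaAdmission(container_concurrency=0)
print(sum(unlimited.try_admit() for _ in range(60)))  # 60
```

With `containerConcurrency: 50`, only the first 50 of 60 simultaneous requests are admitted to the replica; with the default `0`, all of them are.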
diff --git a/modules/serverless-concurrency-limits-configure-soft.adoc b/modules/serverless-concurrency-limits-configure-soft.adoc
@@ -1,3 +1,7 @@
+// Module included in the following assemblies:
+//
+// * serverless/autoscaling/serverless-autoscaling-concurrency.adoc
+
 [id="serverless-concurrency-limits-configure-soft_{context}"]
 = Configuring a soft concurrency target
 
diff --git a/modules/serverless-target-utilization.adoc b/modules/serverless-target-utilization.adoc
@@ -0,0 +1,26 @@
+// Module included in the following assemblies:
+//
+// * serverless/autoscaling/serverless-autoscaling-concurrency.adoc
+
+[id="serverless-target-utilization_{context}"]
+= Concurrency target utilization
+
+The `autoscaling.knative.dev/targetUtilizationPercentage` annotation specifies the percentage of the concurrency limit that is actually targeted by the autoscaler. This is also known as specifying the _hotness_ at which a replica runs, which enables the autoscaler to scale up before the defined hard limit is reached.
+
+For example, if the `containerConcurrency` value is set to 10, and the `targetUtilizationPercentage` value is set to 70 percent, the autoscaler creates a new replica when the average number of concurrent requests across all existing replicas reaches 7. Requests 7 through 10 are still sent to the existing replicas, but additional replicas are started in anticipation of being required after the `containerConcurrency` limit is reached.
+
+.Example service configured by using the `targetUtilizationPercentage` annotation
+[source,yaml]
+----
+apiVersion: serving.knative.dev/v1
+kind: Service
+metadata:
+  name: example-service
+  namespace: default
+spec:
+  template:
+    metadata:
+      annotations:
+        autoscaling.knative.dev/targetUtilizationPercentage: "70"
+...
+----
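The arithmetic in the target utilization example (a `containerConcurrency` of 10 at 70 percent utilization gives an effective per-replica target of 7) can be sketched in Python. This is a simplified model under stated assumptions, not the Knative autoscaler's algorithm: the function names are hypothetical, and the desired replica count is assumed to be the total number of concurrent requests divided by the effective target, rounded up.

```python
import math


def effective_target(container_concurrency: float,
                     target_utilization_percentage: float) -> float:
    """Per-replica concurrency that the autoscaler actually aims for."""
    return container_concurrency * target_utilization_percentage / 100.0


def desired_replicas(total_concurrent_requests: float,
                     container_concurrency: float,
                     target_utilization_percentage: float) -> int:
    """Simplified model: enough replicas that the average per-replica load
    does not exceed the effective target.

    Ignores panic mode, averaging windows, scale rate limits, and the
    minScale/maxScale bounds.
    """
    target = effective_target(container_concurrency, target_utilization_percentage)
    return math.ceil(total_concurrent_requests / target)


print(effective_target(10, 70))      # 7.0
print(desired_replicas(14, 10, 70))  # 2
print(desired_replicas(15, 10, 70))  # 3
```

In this model, 14 concurrent requests fit on two replicas at 7 each, while a 15th request pushes the average above 7 and the model asks for a third replica, well before any replica reaches the hard limit of 10.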
diff --git a/serverless/autoscaling/modules b/serverless/autoscaling/modules
@@ -0,0 +1 @@
+../../modules
diff --git a/serverless/autoscaling/serverless-autoscaling-concurrency.adoc b/serverless/autoscaling/serverless-autoscaling-concurrency.adoc
@@ -6,8 +6,9 @@ include::modules/serverless-document-attributes.adoc[]
 
 toc::[]
 
-Concurrency determines the number of simultaneous requests that can be processed by each pod of an application at any given time.
+Concurrency determines the number of simultaneous requests that can be processed by each replica of an application at any given time.
 
 include::modules/serverless-concurrency-limits.adoc[leveloffset=+1]
-include::modules/serverless-concurrency-limits-configure-soft.adoc[leveloffset=+2]
-include::modules/serverless-concurrency-limits-configure-hard.adoc[leveloffset=+2]
+include::modules/serverless-concurrency-limits-configure-soft.adoc[leveloffset=+1]
+include::modules/serverless-concurrency-limits-configure-hard.adoc[leveloffset=+1]
+include::modules/serverless-target-utilization.adoc[leveloffset=+1]
diff --git a/serverless/autoscaling/serverless-autoscaling-scale-bounds.adoc b/serverless/autoscaling/serverless-autoscaling-scale-bounds.adoc
@@ -6,16 +6,16 @@ include::modules/serverless-document-attributes.adoc[]
 
 toc::[]
 
-Scale bounds determine the minimum and maximum numbers of pods that can serve an application at any given time.
+Scale bounds determine the minimum and maximum numbers of replicas that can serve an application at any given time.
 
 You can set scale bounds for an application to help prevent cold starts or control computing costs.
 
 [id="serverless-autoscaling-minscale"]
 == Minimum scale bounds
 
-The minimum number of pods that can serve an application is determined by the `minScale` annotation.
+The minimum number of replicas that can serve an application is determined by the `minScale` annotation.
 
-The `minScale` value defaults to `0` pods if the following conditions are met:
+The `minScale` value defaults to `0` replicas if the following conditions are met:
 
 * The `minScale` annotation is not set
 * Scaling to zero is enabled
@@ -50,7 +50,7 @@ include::modules/serverless-autoscaling-minscale-kn.adoc[leveloffset=+2]
 [id="serverless-autoscaling-maxscale"]
 == Maximum scale bounds
 
-The maximum number of pods that can serve an application is determined by the `maxScale` annotation. If the `maxScale` annotation is not set, there is no upper limit for the number of pods created.
+The maximum number of replicas that can serve an application is determined by the `maxScale` annotation. If the `maxScale` annotation is not set, there is no upper limit for the number of replicas created.
 
 .Example service spec with `maxScale` spec
 [source,yaml]
diff --git a/serverless/autoscaling/serverless-autoscaling.adoc b/serverless/autoscaling/serverless-autoscaling.adoc
@@ -6,7 +6,7 @@ include::modules/serverless-document-attributes.adoc[]
 
 toc::[]
 
-Knative Serving provides automatic scaling, or _autoscaling_, for applications to match incoming demand. For example, if an application is receiving no traffic, and scale to zero is enabled, Knative Serving scales the application down to zero pods. If scaling to zero is disabled, the application is scaled down to the minimum number of pods specified for applications on the cluster. Pods can also be scaled up to meet demand if traffic to the application increases.
+Knative Serving provides automatic scaling, or _autoscaling_, for applications to match incoming demand. For example, if an application is receiving no traffic, and scale to zero is enabled, Knative Serving scales the application down to zero replicas. If scaling to zero is disabled, the application is scaled down to the minimum number of replicas specified for applications on the cluster. Replicas can also be scaled up to meet demand if traffic to the application increases.
 
 To enable autoscaling for Knative Serving, you must configure xref:../autoscaling/serverless-autoscaling-concurrency.adoc#serverless-autoscaling-concurrency[concurrency] and xref:../autoscaling/serverless-autoscaling-scale-bounds.adoc#serverless-autoscaling-scale-bounds[scale bounds] for your application.
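How the `minScale` and `maxScale` bounds shape the autoscaler's output can be sketched as a clamp on its recommendation. This is a hypothetical Python model, not Knative Serving code: `apply_scale_bounds` is an illustrative name, `desired` is assumed to already reflect current traffic, scaling to zero is assumed to be enabled, and an unset `maxScale` annotation is represented as `None`.

```python
from typing import Optional


def apply_scale_bounds(desired: int, min_scale: int = 0,
                       max_scale: Optional[int] = None) -> int:
    """Clamp an autoscaler recommendation to a service's scale bounds.

    Hypothetical model: min_scale=0 lets an idle service scale to zero,
    and max_scale=None stands for an unset maxScale annotation, which
    imposes no upper limit.
    """
    if desired < 0 or min_scale < 0:
        raise ValueError("replica counts must be >= 0")
    if max_scale is not None and max_scale < min_scale:
        raise ValueError("max_scale must be >= min_scale")
    bounded = max(desired, min_scale)
    if max_scale is not None:
        bounded = min(bounded, max_scale)
    return bounded


print(apply_scale_bounds(0))                             # 0: no traffic, scaled to zero
print(apply_scale_bounds(0, min_scale=1))                # 1: one replica kept to avoid cold starts
print(apply_scale_bounds(12, min_scale=1, max_scale=5))  # 5: capped by maxScale
print(apply_scale_bounds(12, min_scale=1))               # 12: no upper limit
```

The clamp reflects the trade-off described in the scale bounds assembly: a nonzero minimum trades idle cost for fewer cold starts, and a maximum caps computing costs at the expense of capacity under load.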
 
