Commit 053ccd5

Merge pull request #38800 from abrennan89/targetutil
SRVKS-573: Add target utilization docs
2 parents c312ffd + cfb9b51 commit 053ccd5

9 files changed: 56 additions and 12 deletions

modules/serverless-autoscaling-maxscale-kn.adoc

Lines changed: 5 additions & 1 deletion
@@ -1,11 +1,15 @@
+// Module is included in the following assemblies:
+//
+// * serverless/autoscaling/serverless-autoscaling-scale-bounds.adoc
+
 [id="serverless-autoscaling-maxscale-kn_{context}"]
 = Setting the maxScale annotation by using the Knative CLI
 
 You can use the `kn service` command with the `--max-scale` flag to create or modify the `--max-scale` value for a service.
 
 .Procedure
 
-* Set the maximum number of pods for the service by using the `--max-scale` flag:
+* Set the maximum number of replicas for the service by using the `--max-scale` flag:
 +
 [source,terminal]
 ----

modules/serverless-autoscaling-minscale-kn.adoc

Lines changed: 5 additions & 1 deletion
@@ -1,11 +1,15 @@
+// Module is included in the following assemblies:
+//
+// * serverless/autoscaling/serverless-autoscaling-scale-bounds.adoc
+
 [id="serverless-autoscaling-minscale_{context}"]
 = Setting the minScale annotation by using the Knative CLI
 
 You can use the `kn service` command with the `--min-scale` flag to create or modify the `--min-scale` value for a service.
 
 .Procedure
 
-* Set the maximum number of pods for the service by using the `--min-scale` flag:
+* Set the minimum number of replicas for the service by using the `--min-scale` flag:
 +
 .Examples
 [source,terminal]

modules/serverless-concurrency-limits-configure-hard.adoc

Lines changed: 6 additions & 2 deletions
@@ -1,3 +1,7 @@
+// Module included in the following assemblies:
+//
+// * serverless/autoscaling/serverless-autoscaling-concurrency.adoc
+
 [id="serverless-concurrency-limits-configure-hard_{context}"]
 = Configuring a hard concurrency limit
 
@@ -24,9 +28,9 @@ spec:
 containerConcurrency: 50
 ----
 +
-The default value is `0`, which means that there is no limit on the number of requests that are permitted to flow into one pod of the service at a time.
+The default value is `0`, which means that there is no limit on the number of simultaneous requests that are permitted to flow into one replica of the service at a time.
 +
-A value greater than `0` specifies the exact number of requests that are permitted to flow into one pod of the service at a time. This example would enable a hard concurrency limit of 50 requests at a time.
+A value greater than `0` specifies the exact number of requests that are permitted to flow into one replica of the service at a time. This example would enable a hard concurrency limit of 50 requests.
 
 * Optional: Use the `kn service` command to specify the `--concurrency-limit` flag:
 +

modules/serverless-concurrency-limits-configure-soft.adoc

Lines changed: 4 additions & 0 deletions
@@ -1,3 +1,7 @@
+// Module included in the following assemblies:
+//
+// * serverless/autoscaling/serverless-autoscaling-concurrency.adoc
+
 [id="serverless-concurrency-limits-configure-soft_{context}"]
 = Configuring a soft concurrency target
 

modules/serverless-target-utilization.adoc

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+// Module included in the following assemblies:
+//
+// * /serverless/autoscaling/serverless-autoscaling-concurrency.adoc
+
+[id="serverless-target-utilization_{context}"]
+= Concurrency target utilization
+
+This value specifies the percentage of the concurrency limit that is actually targeted by the autoscaler. This is also known as specifying the _hotness_ at which a replica runs, which enables the autoscaler to scale up before the defined hard limit is reached.
+
+For example, if the `containerConcurrency` annotation value is set to 10, and the `targetUtilizationPercentage` value is set to 70 percent, the autoscaler creates a new replica when the average number of concurrent requests across all existing replicas reaches 7. Requests numbered 7 to 10 are still sent to the existing replicas, but additional replicas are started in anticipation of being required after the `containerConcurrency` annotation limit is reached.
+
+.Example service configured using the targetUtilizationPercentage annotation
+[source,yaml]
+----
+apiVersion: serving.knative.dev/v1
+kind: Service
+metadata:
+  name: example-service
+  namespace: default
+spec:
+  template:
+    metadata:
+      annotations:
+        autoscaling.knative.dev/targetUtilizationPercentage: "70"
+...
+----
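
Reviewer note: the arithmetic in the new target utilization module can be checked with a small sketch. This is a simplified model written for illustration, not the Knative autoscaler's implementation; the function names `target_per_replica` and `desired_replicas` are hypothetical, and the model ignores stabilization windows, panic mode, and scale bounds. It assumes a `containerConcurrency` value greater than 0.

```python
import math


def target_per_replica(container_concurrency: int, target_utilization_pct: float) -> float:
    """Concurrency the autoscaler aims for on each replica (simplified model)."""
    if container_concurrency <= 0:
        # Assumption: this sketch only covers an explicit hard limit.
        raise ValueError("container_concurrency must be greater than 0")
    return container_concurrency * target_utilization_pct / 100


def desired_replicas(total_concurrent_requests: int,
                     container_concurrency: int,
                     target_utilization_pct: float) -> int:
    """Replicas needed so that average concurrency stays at or below the target."""
    target = target_per_replica(container_concurrency, target_utilization_pct)
    return math.ceil(total_concurrent_requests / target)


# Values from the module text: containerConcurrency 10, utilization 70 percent.
print(target_per_replica(10, 70))    # effective target per replica
print(desired_replicas(7, 10, 70))   # load that one replica still absorbs
print(desired_replicas(8, 10, 70))   # load that calls for a second replica
```

With the values from the module, the effective target is 7 concurrent requests per replica, so in this model a second replica is requested once the total exceeds 7, while the existing replica can still accept up to 10 requests before the hard limit applies.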

serverless/autoscaling/modules

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
+../../modules

serverless/autoscaling/serverless-autoscaling-concurrency.adoc

Lines changed: 4 additions & 3 deletions
@@ -6,8 +6,9 @@ include::modules/serverless-document-attributes.adoc[]
 
 toc::[]
 
-Concurrency determines the number of simultaneous requests that can be processed by each pod of an application at any given time.
+Concurrency determines the number of simultaneous requests that can be processed by each replica of an application at any given time.
 
 include::modules/serverless-concurrency-limits.adoc[leveloffset=+1]
-include::modules/serverless-concurrency-limits-configure-soft.adoc[leveloffset=+2]
-include::modules/serverless-concurrency-limits-configure-hard.adoc[leveloffset=+2]
+include::modules/serverless-concurrency-limits-configure-soft.adoc[leveloffset=+1]
+include::modules/serverless-concurrency-limits-configure-hard.adoc[leveloffset=+1]
+include::modules/serverless-target-utilization.adoc[leveloffset=+1]

serverless/autoscaling/serverless-autoscaling-scale-bounds.adoc

Lines changed: 4 additions & 4 deletions
@@ -6,16 +6,16 @@ include::modules/serverless-document-attributes.adoc[]
 
 toc::[]
 
-Scale bounds determine the minimum and maximum numbers of pods that can serve an application at any given time.
+Scale bounds determine the minimum and maximum numbers of replicas that can serve an application at any given time.
 
 You can set scale bounds for an application to help prevent cold starts or control computing costs.
 
 [id="serverless-autoscaling-minscale"]
 == Minimum scale bounds
 
-The minimum number of pods that can serve an application is determined by the `minScale` annotation.
+The minimum number of replicas that can serve an application is determined by the `minScale` annotation.
 
-The `minScale` value defaults to `0` pods if the following conditions are met:
+The `minScale` value defaults to `0` replicas if the following conditions are met:
 
 * The `minScale` annotation is not set
 * Scaling to zero is enabled
@@ -50,7 +50,7 @@ include::modules/serverless-autoscaling-minscale-kn.adoc[leveloffset=+2]
 [id="serverless-autoscaling-maxscale"]
 == Maximum scale bounds
 
-The maximum number of pods that can serve an application is determined by the `maxScale` annotation. If the `maxScale` annotation is not set, there is no upper limit for the number of pods created.
+The maximum number of replicas that can serve an application is determined by the `maxScale` annotation. If the `maxScale` annotation is not set, there is no upper limit for the number of replicas created.
 
 .Example service spec with `maxScale` spec
 [source,yaml]
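
Reviewer note: the scale bounds described in this file amount to clamping a desired replica count between `minScale` and `maxScale`. The sketch below illustrates that idea only; `clamp_replicas` is a hypothetical helper, not part of Knative, and it assumes that an unset `maxScale` is represented as `None` (no upper limit) and that `minScale` has already been resolved to a number.

```python
from typing import Optional


def clamp_replicas(desired: int, min_scale: int = 0, max_scale: Optional[int] = None) -> int:
    """Keep a desired replica count within the configured scale bounds."""
    bounded = max(desired, min_scale)       # never drop below minScale
    if max_scale is not None:               # None models "maxScale not set"
        bounded = min(bounded, max_scale)   # never exceed maxScale
    return bounded


print(clamp_replicas(0, min_scale=2))    # idle service with a floor of 2 replicas
print(clamp_replicas(50, max_scale=10))  # traffic spike capped by maxScale
print(clamp_replicas(5))                 # no bounds set: the count is unchanged
```

A nonzero `min_scale` keeps replicas warm and so avoids cold starts, while `max_scale` caps computing cost, which matches the two motivations given in the assembly.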

serverless/autoscaling/serverless-autoscaling.adoc

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ include::modules/serverless-document-attributes.adoc[]
 
 toc::[]
 
-Knative Serving provides automatic scaling, or _autoscaling_, for applications to match incoming demand. For example, if an application is receiving no traffic, and scale to zero is enabled, Knative Serving scales the application down to zero pods. If scaling to zero is disabled, the application is scaled down to the minimum number of pods specified for applications on the cluster. Pods can also be scaled up to meet demand if traffic to the application increases.
+Knative Serving provides automatic scaling, or _autoscaling_, for applications to match incoming demand. For example, if an application is receiving no traffic, and scale to zero is enabled, Knative Serving scales the application down to zero replicas. If scaling to zero is disabled, the application is scaled down to the minimum number of replicas specified for applications on the cluster. Replicas can also be scaled up to meet demand if traffic to the application increases.
 
 To enable autoscaling for Knative Serving, you must configure xref:../autoscaling/serverless-autoscaling-concurrency.adoc#serverless-autoscaling-concurrency[concurrency] and xref:../autoscaling/serverless-autoscaling-scale-bounds.adoc#serverless-autoscaling-scale-bounds[scale bounds] for your application.
 