
Commit 45f7861

abrennan89 and knative-prow-robot authored and committed
Initial additions for minScale and maxScale annotations (#1468)
* Initial additions for minScale and maxScale annotations
* minor corrections from feedback and updated information about configuring HPA
* minor updates
* updated link to blog
1 parent ccfa98c commit 45f7861


1 file changed: +51 -18 lines changed


docs/serving/configuring-the-autoscaler.md

Lines changed: 51 additions & 18 deletions
@@ -8,16 +8,15 @@ Since Knative v0.2, per revision autoscalers have been replaced by a single
 shared autoscaler. This is, by default, the Knative Pod Autoscaler (KPA), which
 provides fast, request-based autoscaling capabilities out of the box.

-## Configuring Knative Pod Autoscaler
+# Configuring Knative Pod Autoscaler

-To modify the autoscaler configuration, you must modify a Kubernetes ConfigMap
-called `config-autoscaler` in the `knative-serving` namespace.
+To modify the Knative Pod Autoscaler (KPA) configuration, you must modify a Kubernetes ConfigMap called `config-autoscaler` in the `knative-serving` namespace.

 You can view the default contents of this ConfigMap using the following command.

 `kubectl -n knative-serving get cm config-autoscaler`

-### Example of default ConfigMap
+## Example of default ConfigMap

 ```
 apiVersion: v1
@@ -37,12 +36,12 @@ data:
   tick-interval: 2s
 ```

-## Configuring scale to zero
+# Configuring scale to zero for KPA

 To correctly configure autoscaling to zero for revisions, you must modify the
 following parameters in the ConfigMap.

-### scale-to-zero-grace-period
+## scale-to-zero-grace-period

 `scale-to-zero-grace-period` specifies the time an inactive revision is left
 running before it is scaled to zero (min: 30s).
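The `scale-to-zero-grace-period` value has a 30s minimum, so an edited ConfigMap value can be checked before it is applied. The sketch below is a hypothetical pre-flight helper, not part of Knative: `parse_duration` and `check_grace_period` are illustrative names, and it assumes values are Go-style duration strings limited to the `h`, `m` and `s` units.

```python
import re

# Hypothetical helper, not Knative code: matches one "<number><unit>" piece of a
# Go-style duration such as "30s" or "1m30s". Only h, m and s are supported.
_PART = re.compile(r"(\d+(?:\.\d+)?)([hms])")
_SECONDS_PER_UNIT = {"h": 3600.0, "m": 60.0, "s": 1.0}

# Minimum stated for scale-to-zero-grace-period ("min: 30s").
MIN_GRACE_PERIOD_SECONDS = 30.0


def parse_duration(text: str) -> float:
    """Convert a duration string like "1m30s" into seconds."""
    parts = _PART.findall(text)
    # Rebuild the string from the matched pieces; any leftover characters
    # (e.g. "30x", "1ms", or an empty string) mean the input is unsupported.
    if not parts or "".join(num + unit for num, unit in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    return sum(float(num) * _SECONDS_PER_UNIT[unit] for num, unit in parts)


def check_grace_period(value: str) -> float:
    """Return the grace period in seconds, rejecting values below 30s."""
    seconds = parse_duration(value)
    if seconds < MIN_GRACE_PERIOD_SECONDS:
        raise ValueError(f"scale-to-zero-grace-period {value!r} is below 30s")
    return seconds


if __name__ == "__main__":
    print(check_grace_period("30s"))    # 30.0
    print(check_grace_period("1m30s"))  # 90.0
```

Rejecting anything the parser cannot fully account for keeps typos such as `30x` from being silently read as a shorter duration.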
@@ -51,7 +50,7 @@ running before it is scaled to zero (min: 30s).
 scale-to-zero-grace-period: 30s
 ```

-### stable-window
+## stable-window

 When operating in a stable mode, the autoscaler operates on the average
 concurrency over the stable window.
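To make "average concurrency over the stable window" concrete, here is a simplified sliding-window average. It is an illustration only, not the KPA's actual metric aggregation; the `StableWindow` class, the timestamped samples and the 60s window (matching the `autoscaling.knative.dev/window: 60s` example in this file) are assumptions of the sketch.

```python
from collections import deque
from typing import Deque, Tuple


class StableWindow:
    """Hypothetical sliding-window average of observed request concurrency.

    Simplified model for illustration: every sample counts equally, and only
    samples newer than (now - window_seconds) contribute to the average.
    """

    def __init__(self, window_seconds: float = 60.0) -> None:
        self.window_seconds = window_seconds
        # (timestamp in seconds, observed concurrency), oldest first.
        self._samples: Deque[Tuple[float, float]] = deque()

    def record(self, timestamp: float, concurrency: float) -> None:
        """Add a sample; timestamps are assumed to be non-decreasing."""
        self._samples.append((timestamp, concurrency))
        self._evict(timestamp)

    def average(self, now: float) -> float:
        """Mean concurrency over the window ending at `now` (0.0 if empty)."""
        self._evict(now)
        if not self._samples:
            return 0.0
        return sum(c for _, c in self._samples) / len(self._samples)

    def _evict(self, now: float) -> None:
        # Drop samples at or before the start of the window.
        cutoff = now - self.window_seconds
        while self._samples and self._samples[0][0] <= cutoff:
            self._samples.popleft()


if __name__ == "__main__":
    window = StableWindow(window_seconds=60.0)
    for t, c in [(0.0, 10.0), (30.0, 20.0), (60.0, 30.0)]:
        window.record(t, c)
    # The sample at t=0 has aged out, so this averages 20 and 30.
    print(window.average(now=60.0))  # 25.0
```

A longer window smooths out short bursts of traffic, at the cost of reacting more slowly to sustained changes in load.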
@@ -67,11 +66,11 @@ annotation.
 autoscaling.knative.dev/window: 60s
 ```

-### enable-scale-to-zero
+## enable-scale-to-zero

 Ensure that enable-scale-to-zero is set to `true`.

-### Termination period
+## Termination period

 The termination period is the time that the pod takes to shut down after the
 last request is finished. The termination period of the pod is equal to the sum
@@ -82,6 +81,8 @@ parameters. In the case of this example, the termination period would be 90s.

 Concurrency for autoscaling can be configured using the following methods.

+## Configuring concurrent request limits
+
 ### target

 `target` defines how many concurrent requests are wanted at a given time (soft
@@ -94,7 +95,7 @@ The default value for concurrency target is specified in the ConfigMap as `100`.
 ```

 This value can be configured by adding or modifying the
-`autoscaling.knative.dev/target` annotation value in the Revision template.
+`autoscaling.knative.dev/target` annotation value in the revision template.

 ```
 autoscaling.knative.dev/target: 50
@@ -108,38 +109,70 @@ limit how many requests reach the app at a given time. Using
 enforced constraint of concurrency.

 `containerConcurrency` limits the amount of concurrent requests are allowed into
-the application at a given time (hard limit), and is configured in the Revision
+the application at a given time (hard limit), and is configured in the revision
 template.

 ```
 containerConcurrency: 0 | 1 | 2-N
 ```

 - A `containerConcurrency` value of `1` will guarantee that only one request is
-handled at a time by a given instance of the Revision container.
+handled at a time by a given instance of the revision container.
 - A value of `2` or more will limit request concurrency to that value.
 - A value of `0` means the system should decide.

 If there is no `/target` annotation, the autoscaler is configured as if
 `/target` == `containerConcurrency`.

+## Configuring scale bounds (minScale and maxScale)
+
+The `minScale` and `maxScale` annotations can be used to configure the minimum and maximum number of pods that can serve applications.
+These annotations can be used to prevent cold starts or to help control computing costs.
+
+`minScale` and `maxScale` can be configured as follows in the revision template;
+
+```
+spec:
+  template:
+    metadata:
+      autoscaling.knative.dev/minScale: "2"
+      autoscaling.knative.dev/maxScale: "10"
+```
+
+Using these annotations in the revision template will propagate this to `PodAutoscaler` objects. `PodAutoscaler` objects are mutable and can be further modified later without modifying anything else in the Knative Serving system.
+
+```
+edit podautoscaler <revision-name>
+```
+
+**NOTE:** These annotations apply for the full lifetime of a revision. Even when a revision is not referenced by any route, the minimal pod count specified by `minScale` will still be provided. Keep in mind that non-routeable revisions may be garbage collected, which enables Knative to reclaim the resources.
+
+### Default behavior
+
+If the `minScale` annotation is not set, pods will scale to zero (or to 1 if `enable-scale-to-zero` is `false` per the ConfigMap mentioned above).
+
+If the `maxScale` annotation is not set, there will be no upper limit for the number of pods created.
+
 ## Configuring CPU-based autoscaling

 **NOTE:** You can configure Knative autoscaling to work with either the default
-KPA or a CPU based metric, i.e. Horizontal Pod Autoscaler (HPA), however
-scale-to-zero capabilities are only supported for KPA.
+KPA or a CPU based metric, i.e. Horizontal Pod Autoscaler (HPA).

 You can configure Knative to use CPU based autoscaling instead of the default
 request based metric by adding or modifying the `autoscaling.knative.dev/class`
-and `autoscaling.knative.dev/metric` values as annotations in the Revision
+and `autoscaling.knative.dev/metric` values as annotations in the revision
 template.

 ```
-autoscaling.knative.dev/metric: cpu
-autoscaling.knative.dev/class: hpa.autoscaling.knative.dev
+spec:
+  template:
+    metadata:
+      autoscaling.knative.dev/metric: concurrency
+      autoscaling.knative.dev/class: hpa.autoscaling.knative.dev
 ```

 ## Additional resources

 - [Go autoscaling sample](https://knative.dev/docs/serving/samples/autoscale-go/index.html)
-- [Knative v0.3 Autoscaling  - A Love Story blog post](https://medium.com/knative/knative-v0-3-autoscaling-a-love-story-d6954279a67a)
+- ["Knative v0.3 Autoscaling  - A Love Story" blog post](https://knative.dev/blog/2019/03/27/knative-v0.3-autoscaling-a-love-story/)
+- [Kubernetes Horizontal Pod Autoscaler (HPA)](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/)
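The concurrency and scale-bound rules touched by this change fit in a small model: the `autoscaling.knative.dev/target` annotation wins if present, otherwise the target equals `containerConcurrency`, and the resulting pod count is clamped to `minScale`/`maxScale` with the documented defaults. This is a simplified sketch, not Knative's algorithm. Computing pods as `ceil(concurrency / target)` and falling back to the ConfigMap default of `100` when `containerConcurrency` is `0` are assumptions, and the function names are hypothetical.

```python
import math
from typing import Mapping, Optional, Tuple

TARGET = "autoscaling.knative.dev/target"
MIN_SCALE = "autoscaling.knative.dev/minScale"
MAX_SCALE = "autoscaling.knative.dev/maxScale"
DEFAULT_TARGET = 100  # default concurrency target from the config-autoscaler ConfigMap


def effective_target(annotations: Mapping[str, str], container_concurrency: int) -> float:
    """Soft concurrency target per pod (simplified model)."""
    if TARGET in annotations:
        return float(annotations[TARGET])
    if container_concurrency > 0:
        # No /target annotation: behave as if /target == containerConcurrency.
        return float(container_concurrency)
    # Assumption: containerConcurrency 0 ("system decides") uses the ConfigMap default.
    return float(DEFAULT_TARGET)


def scale_bounds(annotations: Mapping[str, str],
                 enable_scale_to_zero: bool = True) -> Tuple[int, Optional[int]]:
    """(lower, upper) pod bounds; upper is None when maxScale is not set."""
    # Unset minScale: scale to 0, or to 1 when enable-scale-to-zero is false.
    floor = 0 if enable_scale_to_zero else 1
    lower = max(floor, int(annotations.get(MIN_SCALE, floor)))
    upper = int(annotations[MAX_SCALE]) if MAX_SCALE in annotations else None
    return lower, upper


def desired_pods(observed_concurrency: float, target: float,
                 bounds: Tuple[int, Optional[int]]) -> int:
    """Pods needed to keep per-pod concurrency at or below target, clamped to bounds."""
    lower, upper = bounds
    pods = max(lower, math.ceil(observed_concurrency / target))
    return pods if upper is None else min(pods, upper)


if __name__ == "__main__":
    ann = {TARGET: "50", MIN_SCALE: "2", MAX_SCALE: "10"}
    bounds = scale_bounds(ann)
    target = effective_target(ann, container_concurrency=0)
    for load in (0, 120, 1000):
        # Prints "0 2", then "120 3", then "1000 10".
        print(load, desired_pods(load, target, bounds))
```

Keeping the bounds separate from the target calculation follows the structure of the text: `target` and `containerConcurrency` decide how many pods are wanted, while `minScale` and `maxScale` only limit the result.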

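The revision-template annotations can also be generated programmatically, for example when building manifests or patches. This sketch assumes the annotations belong under `spec.template.metadata.annotations`, because Kubernetes keeps annotations in a `metadata.annotations` map whose values must be strings (hence the quoted `"2"` and `"10"` in the YAML). For the CPU-based HPA case it pairs the `hpa.autoscaling.knative.dev` class with the `cpu` metric. `revision_template_annotations` is a hypothetical helper.

```python
import json
from typing import Dict, Optional

CLASS = "autoscaling.knative.dev/class"
METRIC = "autoscaling.knative.dev/metric"
MIN_SCALE = "autoscaling.knative.dev/minScale"
MAX_SCALE = "autoscaling.knative.dev/maxScale"


def revision_template_annotations(min_scale: Optional[int] = None,
                                  max_scale: Optional[int] = None,
                                  hpa_cpu: bool = False) -> Dict:
    """Build a spec.template.metadata.annotations fragment (hypothetical helper)."""
    # The sketch's own sanity check: the lower bound must not exceed the upper bound.
    if min_scale is not None and max_scale is not None and min_scale > max_scale:
        raise ValueError("minScale must not exceed maxScale")
    annotations: Dict[str, str] = {}
    if min_scale is not None:
        annotations[MIN_SCALE] = str(min_scale)  # annotation values are strings
    if max_scale is not None:
        annotations[MAX_SCALE] = str(max_scale)
    if hpa_cpu:
        # CPU-based autoscaling through the Horizontal Pod Autoscaler class.
        annotations[CLASS] = "hpa.autoscaling.knative.dev"
        annotations[METRIC] = "cpu"
    return {"spec": {"template": {"metadata": {"annotations": annotations}}}}


if __name__ == "__main__":
    fragment = revision_template_annotations(min_scale=2, max_scale=10)
    print(json.dumps(fragment, indent=2))
```

The JSON output has the shape of a merge patch, so it could be applied to an existing resource with `kubectl patch` and `--type merge`; which resource and name to patch depends on your setup.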