Initial additions for minScale and maxScale annotations (#1468)
* Initial additions for minScale and maxScale annotations
* minor corrections from feedback and updated information about configuring HPA
* minor updates
* updated link to blog
docs/serving/configuring-the-autoscaler.md
51 additions & 18 deletions
@@ -8,16 +8,15 @@ Since Knative v0.2, per revision autoscalers have been replaced by a single
8
8
shared autoscaler. This is, by default, the Knative Pod Autoscaler (KPA), which
9
9
provides fast, request-based autoscaling capabilities out of the box.
10
10
11
-
##Configuring Knative Pod Autoscaler
11
+
# Configuring Knative Pod Autoscaler
12
12
13
-
To modify the autoscaler configuration, you must modify a Kubernetes ConfigMap
14
-
called `config-autoscaler` in the `knative-serving` namespace.
13
+
To modify the Knative Pod Autoscaler (KPA) configuration, you must modify a Kubernetes ConfigMap called `config-autoscaler` in the `knative-serving` namespace.
15
14
16
15
You can view the default contents of this ConfigMap using the following command.
17
16
18
17
`kubectl -n knative-serving get cm config-autoscaler`
19
18
20
-
###Example of default ConfigMap
19
+
## Example of default ConfigMap
21
20
22
21
```
23
22
apiVersion: v1
@@ -37,12 +36,12 @@ data:
37
36
tick-interval: 2s
38
37
```
39
38
40
-
##Configuring scale to zero
39
+
# Configuring scale to zero for KPA
41
40
42
41
To correctly configure autoscaling to zero for revisions, you must modify the
43
42
following parameters in the ConfigMap.
44
43
45
-
###scale-to-zero-grace-period
44
+
## scale-to-zero-grace-period
46
45
47
46
`scale-to-zero-grace-period` specifies the time an inactive revision is left
48
47
running before it is scaled to zero (min: 30s).
@@ -51,7 +50,7 @@ running before it is scaled to zero (min: 30s).
51
50
scale-to-zero-grace-period: 30s
52
51
```
53
52
54
-
###stable-window
53
+
## stable-window
55
54
56
55
When operating in a stable mode, the autoscaler operates on the average
57
56
concurrency over the stable window.
@@ -67,11 +66,11 @@ annotation.
67
66
autoscaling.knative.dev/window: 60s
68
67
```
69
68
70
-
###enable-scale-to-zero
69
+
## enable-scale-to-zero
71
70
72
71
Ensure that `enable-scale-to-zero` is set to `true`.
73
72
74
-
###Termination period
73
+
## Termination period
75
74
76
75
The termination period is the time that the pod takes to shut down after the
77
76
last request is finished. The termination period of the pod is equal to the sum
@@ -82,6 +81,8 @@ parameters. In the case of this example, the termination period would be 90s.
82
81
83
82
Concurrency for autoscaling can be configured using the following methods.
84
83
84
+
## Configuring concurrent request limits
85
+
85
86
### target
86
87
87
88
`target` defines how many concurrent requests are wanted at a given time (soft
@@ -94,7 +95,7 @@ The default value for concurrency target is specified in the ConfigMap as `100`.
94
95
```
95
96
96
97
This value can be configured by adding or modifying the
97
-
`autoscaling.knative.dev/target` annotation value in the Revision template.
98
+
`autoscaling.knative.dev/target` annotation value in the revision template.
98
99
99
100
```
100
101
autoscaling.knative.dev/target: 50
@@ -108,38 +109,70 @@ limit how many requests reach the app at a given time. Using
108
109
enforced constraint of concurrency.
109
110
110
111
`containerConcurrency` limits the number of concurrent requests allowed into
111
-
the application at a given time (hard limit), and is configured in the Revision
112
+
the application at a given time (hard limit), and is configured in the revision
112
113
template.
113
114
114
115
```
115
116
containerConcurrency: 0 | 1 | 2-N
116
117
```
117
118
118
119
- A `containerConcurrency` value of `1` will guarantee that only one request is
119
-
handled at a time by a given instance of the Revision container.
120
+
handled at a time by a given instance of the revision container.
120
121
- A value of `2` or more will limit request concurrency to that value.
121
122
- A value of `0` means the system should decide.
122
123
123
124
If there is no `/target` annotation, the autoscaler is configured as if
124
125
`/target` == `containerConcurrency`.
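To illustrate how the soft `target` and the hard `containerConcurrency` limit can be set together, here is a minimal sketch. It assumes the `serving.knative.dev/v1` Service layout, where the annotation lives under `spec.template.metadata.annotations` and `containerConcurrency` under `spec.template.spec`; the service name and image are hypothetical placeholders, and older API versions may place these fields differently.

```yaml
# Sketch only: name and image are placeholders, not taken from the original docs.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: example-service                      # hypothetical
spec:
  template:
    metadata:
      annotations:
        # Soft limit: the autoscaler aims for about 50 concurrent requests per pod.
        autoscaling.knative.dev/target: "50"
    spec:
      # Hard limit: no more than 100 requests are admitted to a pod at once.
      containerConcurrency: 100
      containers:
        - image: example.com/app:latest      # hypothetical
```

Annotation values are strings, so the target is quoted, whereas `containerConcurrency` is an integer field.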
125
126
127
+
## Configuring scale bounds (minScale and maxScale)
128
+
129
+
The `minScale` and `maxScale` annotations can be used to configure the minimum and maximum number of pods that can serve applications.
130
+
These annotations can be used to prevent cold starts or to help control computing costs.
131
+
132
+
`minScale` and `maxScale` can be configured as follows in the revision template:
133
+
134
+
```
135
+
spec:
136
+
template:
137
+
metadata:
138
+
autoscaling.knative.dev/minScale: "2"
139
+
autoscaling.knative.dev/maxScale: "10"
140
+
```
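For context, here is a fuller sketch of where these annotations sit in a complete manifest. It assumes the `serving.knative.dev/v1` Service layout, in which they are nested under `spec.template.metadata.annotations`; the service name and container image are hypothetical placeholders, and older API versions may use a different template field.

```yaml
# Sketch only: name and image are placeholders, not taken from the original docs.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: example-service                      # hypothetical
spec:
  template:
    metadata:
      annotations:
        # Keep at least 2 pods running, which avoids cold starts.
        autoscaling.knative.dev/minScale: "2"
        # Never run more than 10 pods, which caps compute cost.
        autoscaling.knative.dev/maxScale: "10"
    spec:
      containers:
        - image: example.com/app:latest      # hypothetical
```

Annotation values must be strings, which is why the numbers are quoted.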
141
+
142
+
Annotations set in the revision template are propagated to the corresponding `PodAutoscaler` objects. `PodAutoscaler` objects are mutable and can be modified later without changing anything else in the Knative Serving system.
143
+
144
+
```
145
+
kubectl edit podautoscaler <revision-name>
146
+
```
147
+
148
+
**NOTE:** These annotations apply for the full lifetime of a revision. Even when a revision is not referenced by any route, the minimum pod count specified by `minScale` is still maintained. Keep in mind that non-routable revisions may be garbage collected, which enables Knative to reclaim the resources.
149
+
150
+
### Default behavior
151
+
152
+
If the `minScale` annotation is not set, pods will scale to zero (or to 1 if `enable-scale-to-zero` is `false` per the ConfigMap mentioned above).
153
+
154
+
If the `maxScale` annotation is not set, there will be no upper limit for the number of pods created.
155
+
126
156
## Configuring CPU-based autoscaling
127
157
128
158
**NOTE:** You can configure Knative autoscaling to work with either the default
129
-
KPA or a CPU based metric, i.e. Horizontal Pod Autoscaler (HPA), however
130
-
scale-to-zero capabilities are only supported for KPA.
159
+
KPA or a CPU based metric, i.e. Horizontal Pod Autoscaler (HPA).
131
160
132
161
You can configure Knative to use CPU based autoscaling instead of the default
133
162
request based metric by adding or modifying the `autoscaling.knative.dev/class`
134
-
and `autoscaling.knative.dev/metric` values as annotations in the Revision
163
+
and `autoscaling.knative.dev/metric` values as annotations in the revision