|
| 1 | +[id="serverless-autoscaling-developer"] |
| 2 | += Autoscaling |
| 3 | +include::modules/common-attributes.adoc[] |
| 4 | +include::modules/serverless-document-attributes.adoc[] |
| 5 | +:context: serverless-autoscaling-developer |
| 6 | + |
| 7 | +toc::[] |
| 8 | + |
| 9 | +Knative Serving provides automatic scaling, or _autoscaling_, for applications to match incoming demand. For example, if an application is receiving no traffic, and scale-to-zero is enabled, Knative Serving scales the application down to zero replicas. If scale-to-zero is disabled, the application is scaled down to the xref:../../serverless/develop/serverless-autoscaling-developer.adoc#serverless-autoscaling-developer-minscale[minimum number of replicas specified for applications on the cluster]. Replicas can also be scaled up to meet demand if traffic to the application increases. |
| 10 | + |
| 11 | +If Knative autoscaling is enabled for your cluster, you can configure concurrency and scale bounds for your application. |
| 12 | + |
| 13 | +[NOTE] |
| 14 | +==== |
| 15 | +Any limits or targets set in the revision template are measured against a single instance of your application. For example, setting the `target` annotation to `50` configures the autoscaler to scale the application so that each replica handles 50 requests at a time. |
| 16 | +==== |
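
As a rough illustration of the per-instance target described in the note, the following minimal sketch estimates how many replicas are needed for a given number of in-flight requests. The function name `desired_replicas` is hypothetical, and the calculation is a simplification: it ignores the averaging windows and target utilization settings that the Knative autoscaler applies.

```python
import math


def desired_replicas(concurrent_requests: int, target: int) -> int:
    """Estimate the replicas needed so each one handles at most `target` requests.

    Simplified assumption: the in-flight requests are spread evenly across
    replicas, with no averaging window or target utilization applied.
    """
    if target <= 0:
        raise ValueError("target must be positive")
    return math.ceil(concurrent_requests / target)


# With a target of 50, 120 in-flight requests need three replicas.
print(desired_replicas(120, 50))
```

Under these assumptions, an application with no in-flight requests needs zero replicas, which matches the scale-to-zero behavior described earlier.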
| 17 | + |
| 18 | +[id="serverless-autoscaling-developer-scale-bounds"] |
| 19 | +== Scale bounds |
| 20 | + |
| 21 | +Scale bounds determine the minimum and maximum numbers of replicas that can serve an application at any given time. |
| 22 | + |
| 23 | +You can set scale bounds for an application to help prevent cold starts or control computing costs. |
| 24 | + |
| 25 | +[id="serverless-autoscaling-developer-minscale"] |
| 26 | +=== Minimum scale bounds |
| 27 | + |
| 28 | +The minimum number of replicas that can serve an application is determined by the `minScale` annotation. |
| 29 | + |
| 30 | +The `minScale` value defaults to `0` replicas if the following conditions are met: |
| 31 | + |
| 32 | +* The `minScale` annotation is not set |
| 33 | +* Scale-to-zero is enabled |
| 34 | +* The `KPA` autoscaler class is used |
| 35 | + |
| 36 | +If scale-to-zero is not enabled, the `minScale` value defaults to `1`. |
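
The default rules above can be sketched as a small decision function. This is only an illustration: the function and parameter names are hypothetical, and the fallback to `1` when a class other than `KPA` is used is an assumption, because the text only states the default for the case where scale-to-zero is disabled.

```python
from typing import Optional


def effective_min_scale(
    min_scale: Optional[int],
    scale_to_zero_enabled: bool,
    uses_kpa: bool = True,
) -> int:
    """Return the minimum replica count implied by the rules in the text.

    `min_scale` is the value of the `minScale` annotation, or None if unset.
    """
    if min_scale is not None:
        # An explicit annotation takes precedence over any default.
        return min_scale
    if scale_to_zero_enabled and uses_kpa:
        # All three conditions from the list are met.
        return 0
    # Assumption: every other combination falls back to one replica.
    return 1


print(effective_min_scale(None, scale_to_zero_enabled=True))
print(effective_min_scale(None, scale_to_zero_enabled=False))
```

The sketch does not model how an explicit `minScale` of `0` interacts with a cluster where scale-to-zero is disabled.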
| 37 | + |
| 38 | +// TODO: Document KPA if supported, link to docs about setting class |
| 39 | + |
| 40 | +// TODO: |
| 41 | +// Add info / links about enabling and disabling autoscaling (admin docs) |
| 42 | +// if `enable-scale-to-zero` is set to `false` in the `config-autoscaler` config map. |
| 43 | + |
| 44 | +.Example service spec with the `minScale` annotation |
| 45 | +[source,yaml] |
| 46 | +---- |
| 47 | +apiVersion: serving.knative.dev/v1 |
| 48 | +kind: Service |
| 49 | +metadata: |
| 50 | +  name: example-service |
| 51 | +  namespace: default |
| 52 | +spec: |
| 53 | +  template: |
| 54 | +    metadata: |
| 55 | +      annotations: |
| 56 | +        autoscaling.knative.dev/minScale: "0" |
| 57 | +... |
| 58 | +---- |
| 59 | + |
| 60 | +include::modules/serverless-autoscaling-minscale-kn.adoc[leveloffset=+3] |
| 61 | + |
| 62 | +[id="serverless-autoscaling-developer-maxscale"] |
| 63 | +=== Maximum scale bounds |
| 64 | + |
| 65 | +The maximum number of replicas that can serve an application is determined by the `maxScale` annotation. If the `maxScale` annotation is not set, there is no upper limit for the number of replicas created. |
| 66 | + |
| 67 | +.Example service spec with the `maxScale` annotation |
| 68 | +[source,yaml] |
| 69 | +---- |
| 70 | +apiVersion: serving.knative.dev/v1 |
| 71 | +kind: Service |
| 72 | +metadata: |
| 73 | +  name: example-service |
| 74 | +  namespace: default |
| 75 | +spec: |
| 76 | +  template: |
| 77 | +    metadata: |
| 78 | +      annotations: |
| 79 | +        autoscaling.knative.dev/maxScale: "10" |
| 80 | +... |
| 81 | +---- |
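
To show how the two scale bounds interact, the following minimal sketch clamps a requested replica count between a minimum and an optional maximum. The function name `clamp_replicas` is hypothetical, and the sketch assumes that an unset `maxScale` is represented as `None`, meaning no upper limit.

```python
from typing import Optional


def clamp_replicas(
    desired: int,
    min_scale: int = 0,
    max_scale: Optional[int] = None,
) -> int:
    """Keep the desired replica count within the configured scale bounds."""
    # Never go below the minimum bound.
    replicas = max(desired, min_scale)
    if max_scale is not None:
        # Apply the upper bound only when maxScale is set.
        replicas = min(replicas, max_scale)
    return replicas


# With maxScale set to 10, a demand for 25 replicas is capped at 10.
print(clamp_replicas(25, min_scale=0, max_scale=10))
# Without maxScale, the same demand is not capped.
print(clamp_replicas(25))
```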
| 82 | + |
| 83 | +include::modules/serverless-autoscaling-maxscale-kn.adoc[leveloffset=+3] |
| 84 | + |
| 85 | +[id="serverless-autoscaling-developer-concurrency"] |
| 86 | +== Concurrency |
| 87 | + |
| 88 | +Concurrency determines the number of simultaneous requests that can be processed by each replica of an application at any given time. |
| 89 | + |
| 90 | +include::modules/serverless-concurrency-limits.adoc[leveloffset=+2] |
| 91 | +include::modules/serverless-concurrency-limits-configure-soft.adoc[leveloffset=+2] |
| 92 | +include::modules/serverless-concurrency-limits-configure-hard.adoc[leveloffset=+2] |
| 93 | +include::modules/serverless-target-utilization.adoc[leveloffset=+2] |
| 94 | + |
| 95 | +[id="additional-resources_serverless-autoscaling-developer"] |
| 96 | +== Additional resources |
| 97 | + |
| 98 | +* Scale-to-zero can be enabled or disabled for the cluster by cluster administrators. For more information, see xref:../../serverless/admin_guide/serverless-admin-autoscaling.adoc#serverless-enable-scale-to-zero_serverless-admin-autoscaling[Enabling scale-to-zero]. |