* Queue-depth `queue`
Based upon the number of async invocations that are queued for a function. This allows you to scale functions rapidly and proactively to the desired number of replicas to process the queue as quickly as possible. Ideal for functions that are only invoked asynchronously. To use this mode, your [queue-worker](/openfaas-pro/jetstream) must be configured to scale consumers dynamically through the `function` mode.
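As a sketch of how that might be configured (the chart value name `jetstreamQueueWorker.mode` is an assumption here, not confirmed by this page), the queue-worker's scaling mode could be set when installing or upgrading the openfaas chart:

```shell
# Assumed chart value: jetstreamQueueWorker.mode (static | function)
helm upgrade openfaas openfaas/openfaas \
  --namespace openfaas \
  --reuse-values \
  --set jetstreamQueueWorker.mode=function
```

Check the chart's values reference for the exact key before relying on this.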
* Custom metrics i.e. RAM, latency, application metrics, etc

Functions can be scaled upon any custom metrics that are available in Prometheus, and which expose a `function_name` label in the format `name.namespace`.

This could include RAM usage, latency, business/application metrics, etc. Learn more: [Custom autoscaling rules](#custom-autoscaling-rules)
* Scaling to zero
**4) Queue-depth based scaling**
Scaling based upon the queue depth for a function is a perfect match for asynchronous invocations.
Rather than measuring load upon the function as the other strategies do, the queue depth can be measured, and the number of target replicas can be set immediately.
This example limits concurrent requests to 1 for a long-running sleep function.
```bash
faas-cli store deploy sleep \
  --label com.openfaas.scale.max=10 \
  --label com.openfaas.scale.target=1 \
  --label com.openfaas.scale.type=queue \
  --label com.openfaas.scale.target-proportion=1 \
  --label com.openfaas.scale.zero=true \
  --env max_inflight=1

hey -m POST -n 30 -c 30 \
  http://127.0.0.1:8080/async-function/sleep
```
The sleep function we've deployed has a hard limit of 1 concurrent request at a time, enforced by the `max_inflight` environment variable.
When 30 invocations are queued, the scaling parameters call for 30 replicas to process the backlog; however, the upper limit is 10 replicas. So the function will scale to 10 replicas and process up to 10 queued requests in parallel.
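The arithmetic can be sketched in shell. This is a hypothetical illustration, assuming the desired replica count is roughly the ceiling of queue depth divided by `target × target-proportion`, capped at `scale.max` (the autoscaler's exact formula is not shown on this page):

```shell
queue_depth=30; target=1; proportion=1; max=10

# Ceiling division: replicas needed to drain the queue at 'target' requests each
want=$(( (queue_depth + target * proportion - 1) / (target * proportion) ))

# Cap the recommendation at the configured maximum
replicas=$(( want > max ? max : want ))
echo "$replicas"   # prints 10
```

With `target=1` and `target-proportion=1`, 30 queued messages ask for 30 replicas, which the `scale.max=10` label caps at 10.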
The `com.openfaas.scale.zero=true` label is set to ensure that the function scales to zero when the queue is empty.
## Smoothing out scaling down with a stable window
If traffic to a function oscillates, the autoscaler will attempt to match that load and the number of replicas will also oscillate and mirror the load. This can be smoothed out through a stable window.
The `com.openfaas.scale.down.window` label can be set with a Go duration up to a maximum of `5m` or `300s`. When set, the autoscaler will record recommendations on each cycle, and only scale down a function to the highest recorded recommendation of replicas.
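For instance, a hypothetical deployment (using the `env` function from the store as a stand-in) could hold scale-down decisions to the highest recommendation seen over a 2-minute window:

```shell
# com.openfaas.scale.down.window takes a Go duration, here 2 minutes
faas-cli store deploy env \
  --label com.openfaas.scale.down.window=2m
```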
## Custom autoscaling rules
In addition to the built-in scaling types, custom Prometheus expressions can be used to scale functions. For instance, you may want to scale based upon queue depth, Kafka consumer lag, latency, RAM used by a function, or a custom business metric exposed by your function's handler.
Blog post / walk-through: [How to scale OpenFaaS Functions with Custom Metrics](https://www.openfaas.com/blog/custom-metrics-scaling/).
For example, to add latency-based scaling using the gateway's `gateway_functions_seconds` histogram, you could add the following to the openfaas chart in `values-pro.yaml`: