---
title: How to scale OpenFaaS Functions with Custom Metrics
description: Learn how to use metrics exposed by your Function Pods, or any other Prometheus metric to scale functions.
date: 2024-08-22
categories:
- kubernetes
- faas
- functions
- autoscaling
- metrics
dark_background: true
image: /images/2024-08-scaling/custom/background.png
author_staff_member: alex
hide_header_image: true
---

In the [first part of our mini-series on autoscaling](/blog/what-goes-up-must-come-down/), we looked at how autoscaling has evolved in OpenFaaS going all the way back to 2017, and how today you can use Requests Per Second, Capacity (inflight requests), and CPU utilization to scale functions horizontally.

Today, we are going to show you what you can do with custom Prometheus metrics, for when you need to scale based upon work being done by your function's Pod, or some other metric that you emit to Prometheus such as the number of pending requests, latency, or the number of items in a queue or event stream.

## Why do we need custom metrics?

Customers often ask: what is the right scaling approach for our functions? This presupposes that there is only one ideal, optimal way to scale every kind of workload you'll ever have in your system.

Suppose you target a 99.5th percentile latency for requests. If you have a stateless function with no external dependencies, then a fair assumption would be that additional replicas will decrease latency during congestion. However, if that function is dependent on a database or a remote API, then adding new replicas may even increase latency rather than decrease it.

One approach such as RPS or Capacity may yield good results with many types of functions, but you cannot beat observing your functions in production and tuning them according to your needs - whether that's the type of scaling, the minimum/maximum replica count, or adding a readiness check.

Look out for how latency is affected during peak times and which HTTP status codes you receive, and don't rule out invoking functions asynchronously to defer and buffer the work.

## Overview - How scaling on custom metrics works

With the current design of the OpenFaaS Autoscaler, you need to do three things:

* Find an existing Prometheus metric, or emit a new one
* Configure a *Recording Rule* in the Prometheus configuration that OpenFaaS ships, so that it emits a new scaling type
* Set that scaling type on your function using the `com.openfaas.scale.type` label, just like with the built-in metrics

### 1. Pick a metric

You have five options for metrics:

1. Use one of the built-in metrics from the various OpenFaaS components, ranging from the Gateway, to the queue-worker, to the Kafka connector, etc. These are documented here: [Monitoring Functions](https://docs.openfaas.com/architecture/metrics/) (see the example query after this list)
2. Use the CPU or RAM metrics already scraped from each node in the cluster i.e. `pod_cpu_usage_seconds_total` or `pod_memory_working_set_bytes`
3. Use one of the [built-in metrics emitted by the OpenFaaS watchdog](https://docs.openfaas.com/architecture/metrics/#watchdog) - these include things like inflight requests, the number of requests, and latency
4. Emit a new metric from your function's handler. In this case, you'll import the Prometheus SDK for your language of choice such as Python or Go, register a metrics endpoint, and add a couple of annotations so Prometheus knows to scrape it
5. Have your own control plane emit a new metric centrally. This is where you may be able to expose a queue depth, the number of pending requests, or work with some kind of internal business metric like a Service Level Objective (SLO) or Key Performance Indicator (KPI)

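If you want to see what's already available before emitting anything new, you can run ad-hoc queries against the Prometheus instance bundled with OpenFaaS. As a sketch, assuming it is deployed as `prometheus` in the `openfaas` namespace (adjust for your own installation):

```bash
# Forward the bundled Prometheus UI to http://127.0.0.1:9090
# (the deployment name may differ in your installation)
kubectl port-forward -n openfaas deploy/prometheus 9090:9090
```

Then, in the expression browser, a query such as `sum by (function_name) (rate(gateway_function_invocation_total[1m]))` shows the per-function invocation rate, and gives you a feel for how a built-in metric behaves before you base scaling on it.
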
### 2. Set up a new recording rule in Prometheus

The autoscaler uses a recording rule named `job:function_current_load:sum` to understand the total load for a given function. It then makes a simple calculation where the total sum is divided by the target figure for a function to determine the ideal amount of Pods.

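As a rough worked example: if the summed load for a function comes to 2500m of CPU and the per-replica target is 500m, the autoscaler would aim for 2500 / 500 = 5 replicas. As we understand it, the target-proportion label (covered below) adjusts the effective target before this division is made.
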
Here is the rule we are using for CPU based scaling:

```yaml
- record: job:function_current_load:sum
  expr: |
    ceil(sum(irate(pod_cpu_usage_seconds_total{}[1m])*1000) by (function_name)
    *
    on (function_name) avg by (function_name) (gateway_service_target_load{scaling_type="cpu"} > bool 1))
  labels:
    scaling_type: cpu
```

The `expr` field shows the Prometheus query that will be evaluated, and the `labels` show the scaling type for which the data will be emitted.

The first half, `ceil(sum(irate(pod_cpu_usage_seconds_total{}[1m])*1000) by (function_name)`, is the basic query, which resembles what you'll see from `kubectl top pod -n openfaas-fn`.

The second half, `* on (function_name) avg by (function_name) (gateway_service_target_load{scaling_type="cpu"} > bool 1)`, is an optimisation which means this rule is only evaluated when a function has the label `com.openfaas.scale.type: cpu` set.

### 3. Set the scaling type on your function

Here is a redacted example of how a function can target the `cpu` recording rule:

```yaml
functions:
  bcrypt:
    labels:
      com.openfaas.scale.min: "1"
      com.openfaas.scale.max: "10"
      com.openfaas.scale.target: "500"
      com.openfaas.scale.target-proportion: "0.9"
      com.openfaas.scale.type: "cpu"
```

The minimum and maximum number of replicas are configured, followed by the target amount per Pod, in this case "500", meaning 500m or half a vCPU. The target-proportion is used to tune how close to the target the function should be before scaling. Finally, `com.openfaas.scale.type` is set to `cpu`.

We'll now look at some concrete examples of custom rules.

## Example 1: Scaling on the request latency

In this example we'll scale based upon the average request latency of the `bcrypt` function. The rough target we've seen is `60ms` or `0.06s`, so we'll set that as a target, and if the latency goes above that figure, additional Pods will be added.

Here's what we'll put in our function's configuration. We can omit `com.openfaas.scale.min` and `com.openfaas.scale.max` as they already have defaults.

```yaml
functions:
  bcrypt:
    labels:
      com.openfaas.scale.target: "0.06"
      com.openfaas.scale.target-proportion: "0.9"
      com.openfaas.scale.type: "latency"
```

Next let's add a recording rule, and use the latency from a metric emitted by the gateway:

```yaml
- record: job:function_current_load:sum
  expr: |
    sum by (function_name) (rate(gateway_functions_seconds_sum{}[30s])) / sum by (function_name) (rate(gateway_functions_seconds_count{}[30s]))
    and
    on (function_name) avg by (function_name) (gateway_service_target_load{scaling_type="latency"}) > bool 1
  labels:
    scaling_type: latency
```

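The first line of the expression divides the rate of `gateway_functions_seconds_sum` by the rate of `gateway_functions_seconds_count`, which is the standard Prometheus way of deriving the average request duration per function, here over the past 30 seconds. The second half serves the same purpose as the optimisation described for the CPU rule: the result is only recorded for functions that have opted into the `latency` scaling type.
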
**Contrasting latency with RPS**

We adapted the [e2e tests for the openfaas autoscaler](https://github.com/openfaas/openfaas-autoscaler-tests) to generate load on the bcrypt function using the ramp test, which goes up to 150 RPS over 2 minutes and then sustains it for a further two minutes.

The results are from a single-CPU, single-node Kubernetes cluster with each Pod limited to 1000m, or around 1x vCPU per Pod. With a maximum of 10 Pods, this is similar to 10 vCPUs generating bcrypt hashes at once.

![Results from latency based scaling with the bcrypt function](/images/2024-08-scaling/custom/latency-bcrypt.png)

The initial latency starts off around the best case for this processor, near 0.05-0.15ms per hash, then the thundering herd causes congestion. The system starts to add additional replicas, and the latency stabilises for the rest of the test.

We also set up a stable scale-down window of 2m30s in order to prevent any variable latency from causing the function to scale down too quickly.

We'd typically suggest that something CPU-bound like bcrypt is scaled with the CPU approach, however we tried the same test with RPS, with a target of 10 RPS per Pod.

![Results from RPS based scaling with the bcrypt function](/images/2024-08-scaling/custom/rps-bcrypt.png)

When you compare the two "Replicas per function" graphs, you can see that the custom latency-based approach is more responsive and gives a more stable result in the "Average duration by status" graph.

## Example 2: Scaling based upon a built-in watchdog metric

The [watchdog itself emits several metrics](https://docs.openfaas.com/architecture/metrics/#watchdog) which can be used for scaling. Most of them are already available in aggregate from the gateway, but we wanted to show you this option.

Here's how you can scale based upon the number of inflight requests:

```yaml
sleep:
  image: ghcr.io/openfaas/sleep:latest
  skip_build: true
  environment:
    write_timeout: 60s
  labels:
    com.openfaas.scale.min: "1"
    com.openfaas.scale.max: "10"
    com.openfaas.scale.target: "5"
    com.openfaas.scale.type: "inflight"
  annotations:
    prometheus.io.scrape: "true"
    prometheus.io.path: "/metrics"
    prometheus.io.port: "8081"
```

In this case we've used the `type` of `inflight`, and added three extra annotations:

1. `prometheus.io.scrape: "true"` - this tells Prometheus to scrape the metrics from the Pod directly
2. `prometheus.io.path: "/metrics"` - this is the path where the metrics are exposed
3. `prometheus.io.port: "8081"` - this is the port where the metrics are exposed. In this case it's not the default of 8080, which is the proxy used to access the function, but 8081, a separate HTTP server that only exposes Prometheus metrics

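Before adding the recording rule, it can be worth checking that the metric is being scraped and that it carries a `function_name` label (added via relabelling in the bundled Prometheus configuration). A quick sanity check is to run `sum by (function_name) (http_requests_in_flight)` in the Prometheus expression browser - if it returns a series for your function, the rule below can use it.
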
For the recording rule:

```yaml
- record: job:function_current_load:sum
  expr: |
    ceil(sum by (function_name) (max_over_time(http_requests_in_flight[45s:5s]))
    and
    on (function_name) avg by (function_name) (gateway_service_target_load{scaling_type="inflight"}) > bool 1)
  labels:
    scaling_type: inflight
```

This rule is very similar to the built-in capacity scaling mode, however the data comes directly from the function's Pods instead of being measured at the gateway.

## Example 3: Scaling on a metric emitted by the function

If you include the Prometheus SDK in your function, then you can emit metrics quite simply.

We've written an example for a Go function which scales based upon the number of items it receives in a JSON payload. You could imagine this being a function connected to AWS SNS, to which variable-sized batches are sent depending on congestion.

The below is a fictitious, but realistic example of the payload the function could receive from SNS:

```json
{
  "items": [
    {
      "arn": "arn:aws:sns:us-east-1:123456789012:MyTopic",
      "event-type": "order_placed"
    }
  ]
}
```

Then this is how to import the Prometheus SDK, how to register the metric, and how to record the number of items against it for each request:

```go
package function

import (
    "encoding/json"
    "fmt"
    "io"
    "net/http"
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
    itemsGauge = promauto.NewGauge(prometheus.GaugeOpts{
        Name: "function_batch_items_processing",
        Help: "Total batch items currently being processed",
    })
)

var metricsHandler http.Handler = promhttp.Handler()

func Handle(w http.ResponseWriter, r *http.Request) {
    // Serve the Prometheus metrics from the same port as the function itself.
    if r.URL.Path == "/metrics" {
        metricsHandler.ServeHTTP(w, r)
        return
    }

    if r.Body != nil {
        defer r.Body.Close()

        body, _ := io.ReadAll(r.Body)
        var payload map[string][]interface{}
        err := json.Unmarshal(body, &payload)
        if err != nil {
            http.Error(w, fmt.Sprintf("failed to unmarshal request body: %s", err), http.StatusBadRequest)
            return
        }

        items := payload["items"]
        numItems := len(items)

        // Record the items as in progress, then remove them when the handler returns.
        itemsGauge.Add(float64(numItems))
        defer func() {
            itemsGauge.Sub(float64(numItems))
        }()

        // Simulate processing the items batch.
        time.Sleep(time.Millisecond * 100 * time.Duration(numItems))
    }

    w.WriteHeader(http.StatusOK)
}
```

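Note the design choice here: the handler uses a gauge with `Add` and `Sub` so that the metric always reflects the number of items currently being processed, and falls back to zero once the work completes. A counter would only ever increase, and would need a `rate()` in the recording rule to be useful as a load signal.
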
Once again, specify your new scaling approach in the function's configuration:

```yaml
sns-handler:
  lang: golang-middleware
  handler: ./sns-handler
  labels:
    com.openfaas.scale.target: "3"
    com.openfaas.scale.type: "batch-items"
  annotations:
    prometheus.io.scrape: "true"
    prometheus.io.path: "/metrics"
```

Then you need a custom recording rule to sum up the total number of items in the metric across all the replicas:

```yaml
- record: job:function_current_load:sum
  expr: |
    ceil(sum by (function_name) (max_over_time(function_batch_items_processing[45s:5s]))
    and
    on (function_name) avg by (function_name) (gateway_service_target_load{scaling_type="batch-items"}) > bool 1)
  labels:
    scaling_type: batch-items
```

A basic way to invoke the function would be to use `hey` and a static batch size of, say, 5 items.

```bash
cat > payload.json <<EOF
{
  "items": [
    {
      "arn": "arn:aws:sns:us-east-1:123456789012:MyTopic",
      "event-type": "order_placed"
    },
    {
      "arn": "arn:aws:sns:us-east-1:123456789012:MyTopic",
      "event-type": "order_placed"
    },
    {
      "arn": "arn:aws:sns:us-east-1:123456789012:MyTopic",
      "event-type": "order_placed"
    },
    {
      "arn": "arn:aws:sns:us-east-1:123456789012:MyTopic",
      "event-type": "order_placed"
    },
    {
      "arn": "arn:aws:sns:us-east-1:123456789012:MyTopic",
      "event-type": "order_placed"
    }
  ]
}
EOF
```

Then run the following command to invoke the function with that static payload.

```bash
hey -d "$(cat payload.json)" -m POST -c 1 -z 60s -q 1 http://127.0.0.1:8080/function/sns-handler
```

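Alternatively, `hey` can read the request body directly from a file with its `-D` flag, i.e. `hey -D payload.json -m POST ...`, which avoids the shell quoting altogether.
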
In this example, we assume the batches of events come from an external system, AWS SNS. It's likely that we have no control over the batch size, or any way to emit a metric from the SNS service itself, so measuring in the function makes more sense.

If the data is coming from your own database, queue, or control-plane, then you could emit a centralised metric instead, from that single component.

Just bear in mind that if you emit the same data from multiple replicas of that component, you should apply an `avg` function instead of a `sum` in the recording rule, otherwise the total will be multiplied by the number of replicas. This is something we've already factored into the built-in metrics for RPS and Capacity, which are emitted from the gateway, a component that often has multiple replicas.

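For illustration, here is a minimal sketch of what such a rule could look like. It assumes a hypothetical `queue_pending_items` gauge exposed by your control plane with a `function_name` label, and a made-up `queue-depth` scaling type:

```yaml
# queue_pending_items and queue-depth are hypothetical names used for this sketch
- record: job:function_current_load:sum
  expr: |
    ceil(avg by (function_name) (queue_pending_items)
    and
    on (function_name) avg by (function_name) (gateway_service_target_load{scaling_type="queue-depth"}) > bool 1)
  labels:
    scaling_type: queue-depth
```

The `avg` de-duplicates the series reported by each replica of the control plane, rather than adding them together as `sum` would.
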
## Example 4: Scaling based upon an external metric

We'll show you how you could set up scaling based upon RAM for OpenFaaS functions, just like the built-in CPU scaling type.

We've packaged a function that consumes a set amount of RAM, which can be used to simulate memory usage.

```yaml
stress-memory:
  skip_build: true
  image: ghcr.io/welteki/stress:latest
  fprocess: "stress --vm 1 --vm-bytes 20M -t 10"
  requests:
    memory: 20Mi
  environment:
    read_timeout: "2m"
    write_timeout: "2m"
    exec_timeout: "2m"
  labels:
    com.openfaas.scale.min: "1"
    com.openfaas.scale.max: "10"
    com.openfaas.scale.target: "20000000"
    com.openfaas.scale.type: "memory"
    com.openfaas.scale.target-proportion: "0.8"
```

Let's look at the labels:

* `com.openfaas.scale.target` - this is the target amount of memory in bytes i.e. 20MB
* `com.openfaas.scale.target-proportion` - this is the proportion of the target that the function should be at before scaling, in this case 80%
* `com.openfaas.scale.type` - this is set to `memory`

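Note that the target is expressed in bytes: `20000000` is 20 MB in decimal units, which is close to, but not exactly the same as, the `20Mi` (roughly 20.97 MB) requested for the Pod.
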
Now let's write the recording rule:

```yaml
- record: job:function_current_load:sum
  expr: |
    ceil(sum by (function_name) (max_over_time(pod_memory_working_set_bytes[45s:5s]))
    *
    on (function_name) avg by (function_name) (gateway_service_target_load{scaling_type="memory"} > bool 1))
  labels:
    scaling_type: memory
```

This rule is very similar to the CPU based rule, but it uses the `pod_memory_working_set_bytes` metric instead, which is already scraped by Prometheus from each Node in the cluster.

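To see the rule in action, you could generate sustained memory usage with `hey`, assuming the gateway is port-forwarded to `127.0.0.1:8080` as in the earlier example:

```bash
# Each invocation runs stress for around 10 seconds, so allow a generous per-request timeout
hey -c 5 -z 2m -t 30 http://127.0.0.1:8080/function/stress-memory
```
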
## In conclusion

Over this two-part series, we started off with a recap of the autoscaling journey and how it has evolved to meet new customer needs, and then went on to show practical examples of how to extend it for your own bespoke requirements.

We expect that most customers will get the best results from the built-in scaling modes such as RPS, Capacity and CPU. However, if your workload or its invocation patterns are unique, then custom metrics are a powerful option for fine-tuning function scaling.

In part one we also introduced the stable window for scaling down, which can be combined with the built-in or custom scaling types to slow down the responsiveness of the autoscaler.

OpenFaaS customers can reach out to us at any time via email for help and advice on autoscaling and general configuration of the platform. We also have a [weekly call](https://docs.openfaas.com/community/) that is free to attend, where you'll get to speak to other customers, share experiences, suggest improvements, and ask questions.

See also: [On Autoscaling - What Goes Up Must Come Down](/blog/what-goes-up-must-come-down/)