Commit 28f2865

Add inital auto-scaling guide & troubleshooting
Auto-scaling needs padding out for the HPA section by Stefan. Signed-off-by: Alex Ellis <[email protected]>
1 parent a4306d4 commit 28f2865

5 files changed

+315
-3
lines changed

README.md

Lines changed: 1 addition & 1 deletion
@@ -16,4 +16,4 @@ This page is published through the use of `mkdocs` and is hosted on https://netl
1616

1717
* https://docs.openfaas.com/
1818

19-
All commits into master (or merged PRs) will emerge on the front-page after being rebuilt.
19+
All commits into master (or merged PRs) will appear on the front-page after being rebuilt.

docs/architecture/autoscaling.md

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
1+
# Auto-scaling
2+
3+
Auto-scaling in OpenFaaS allows a function to scale up or down depending on demand represented by different metrics.
4+
5+
## Scaling by requests per second
6+
7+
OpenFaaS ships with a single auto-scaling rule, defined in the alert rules loaded by Prometheus. Prometheus evaluates the rule against usage (requests per second) metrics and fires an alert through AlertManager, which notifies the API Gateway.
8+
9+
The API Gateway handles AlertManager alerts through its `/system/alert` route.
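
As a rough illustration of what such a route has to do, the sketch below decodes a webhook body in the general shape AlertManager sends (a top-level `status` plus a list of `alerts`, each with its own `status` and `labels`) and turns each alert into a scaling decision. The `function_name` label, the type names and `decisionsFromAlert` are assumptions made for this example; this is not the gateway's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// alertPayload mirrors only the webhook fields this sketch needs;
// a real AlertManager payload carries more.
type alertPayload struct {
	Status string `json:"status"`
	Alerts []struct {
		Status string            `json:"status"`
		Labels map[string]string `json:"labels"`
	} `json:"alerts"`
}

// scaleDecision is a hypothetical type: which function to scale, and whether up or down.
type scaleDecision struct {
	Function string
	ScaleUp  bool
}

// decisionsFromAlert maps each alert to a decision. It assumes the function
// is named in a "function_name" label and skips alerts without one.
func decisionsFromAlert(body []byte) ([]scaleDecision, error) {
	var p alertPayload
	if err := json.Unmarshal(body, &p); err != nil {
		return nil, fmt.Errorf("decode alert: %w", err)
	}
	var out []scaleDecision
	for _, a := range p.Alerts {
		name := a.Labels["function_name"]
		if name == "" {
			continue
		}
		out = append(out, scaleDecision{Function: name, ScaleUp: a.Status == "firing"})
	}
	return out, nil
}

func main() {
	body := []byte(`{"status":"firing","alerts":[{"status":"firing","labels":{"function_name":"sleepygo"}}]}`)
	decisions, err := decisionsFromAlert(body)
	if err != nil {
		panic(err)
	}
	for _, d := range decisions {
		fmt.Printf("%s scaleUp=%t\n", d.Function, d.ScaleUp)
	}
}
```

A firing alert maps to a scale-up decision and a resolved alert to a scale-down decision. How many replicas to add or remove is a separate concern.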
10+
11+
The auto-scaling provided by this method can be disabled by either deleting the AlertManager deployment or by scaling the deployment to zero replicas.
12+
13+
The AlertManager rules for Swarm can be viewed in [alert.rules](https://github.com/openfaas/faas/blob/master/prometheus/alert.rules.yml) and altered as a configuration map.
14+
15+
### Min/max replicas
16+
17+
The minimum (initial) and maximum replica count can be set at deployment time by adding a label to the function.
18+
19+
* `com.openfaas.scale.min`
20+
21+
By default this is set to `1`
22+
23+
* `com.openfaas.scale.max`
24+
25+
The current default value is `20` replicas.
26+
27+
For each alert fired the auto-scaler adds 5 replicas. Work is in progress to make the step configurable, either as a pre-defined step or as a proportional percentage. Once an alert is resolved because the load has dropped below the scaling threshold, the function is scaled back to its minimum replica count.
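
The stepping behaviour can be sketched as a small function. This is a minimal sketch, assuming the replica count is capped at the `com.openfaas.scale.max` value. The name `nextReplicas` and its parameters are hypothetical and are not taken from the gateway source.

```go
package main

import "fmt"

// scaleStep is the number of replicas added per fired alert, as described above.
const scaleStep = 5

// nextReplicas returns the replica count after an alert event.
// A firing alert adds scaleStep replicas, capped at maxReplicas (assumption);
// a resolved alert returns the function to minReplicas.
func nextReplicas(current, minReplicas, maxReplicas int, firing bool) int {
	if !firing {
		return minReplicas
	}
	next := current + scaleStep
	if next > maxReplicas {
		next = maxReplicas
	}
	if next < minReplicas {
		next = minReplicas
	}
	return next
}

func main() {
	replicas := 1 // default com.openfaas.scale.min
	for i := 0; i < 5; i++ {
		replicas = nextReplicas(replicas, 1, 20, true)
		fmt.Println("after alert:", replicas)
	}
	fmt.Println("after resolve:", nextReplicas(replicas, 1, 20, false))
}
```

With the default labels, repeated alerts take a function from 1 replica to 6, 11, 16 and then 20, and a resolved alert returns it to 1.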
28+
29+
> Note: Active alerts can be viewed in the "Alerts" tab of Prometheus which is deployed with OpenFaaS.
30+
31+
## Scaling by CPU and/or memory utilization
32+
33+
When using Kubernetes the built-in [Horizontal Pod Autoscaler (HPA)](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/) can be used instead of AlertManager.
34+
35+
Find out more in Stefan Prodan's blog post below:
36+
37+
https://stefanprodan.com/2018/kubernetes-scaleway-baremetal-arm-terraform-installer/#horizontal-pod-autoscaling
38+

docs/deployment/kubernetes.md

Lines changed: 1 addition & 1 deletion
@@ -196,4 +196,4 @@ Learn how to build serverless functions with OpenFaaS and Python in our half-day
196196
197197
If you are running into any issues please check out the troubleshooting guide and search the documentation / past issues before raising an issue.
198198
199-
* [Troubleshooting guide](https://github.com/openfaas/faas/blob/master/guide/troubleshooting.md)
199+
* [Troubleshooting guide](https://github.com/openfaas/faas/blob/master/guide/troubleshooting.md)

docs/deployment/troubleshooting.md

Lines changed: 272 additions & 0 deletions
@@ -0,0 +1,272 @@
1+
# Troubleshooting guide
2+
3+
## Timeouts
4+
5+
Default timeouts are configured at the HTTP level and must be set both on the gateway and the function.
6+
7+
> Note: all distributed systems need a maximum timeout value to be configured for work. This means that work cannot be unbounded.
8+
9+
### Timeouts - Your function
10+
11+
You can also enforce a hard-timeout for your function with the `hard_timeout` environment variable.
12+
13+
For watchdog configuration see the [README](https://github.com/openfaas/faas/tree/master/watchdog).
14+
15+
The best way to set the timeout is in the YAML file generated by the `faas-cli`.
16+
17+
Example Go app that sleeps for 10 seconds:
18+
19+
```
20+
provider:
21+
  name: faas
22+
  gateway: http://127.0.0.1:8080
23+
24+
functions:
25+
  sleepygo:
26+
    lang: go
27+
    handler: ./sleepygo
28+
    image: alexellis2/sleeps-for-10-seconds
29+
    environment:
30+
      read_timeout: 20s
31+
      write_timeout: 20s
32+
```
33+
34+
handler.go
35+
36+
```
37+
package function
38+
39+
...
40+
41+
func Handle(req []byte) string {
42+
	time.Sleep(time.Second * 10)
43+
	return fmt.Sprintf("Hello, Go. You said: %s", string(req))
44+
}
45+
```
46+
47+
### Timeouts - Gateway
48+
49+
For the gateway set the following environment variables:
50+
51+
```
52+
read_timeout: "25s" # Maximum time to read HTTP request
53+
write_timeout: "25s" # Maximum time to write HTTP response
54+
upstream_timeout: "20s" # Maximum duration of upstream function call
55+
```
56+
57+
> Note: The value for `upstream_timeout` should be slightly less than `read_timeout` and `write_timeout`
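
The relationship in the note can be checked before deploying. The sketch below assumes the values are Go-style duration strings such as `25s`. The helper `checkGatewayTimeouts` is hypothetical and is not part of the gateway.

```go
package main

import (
	"fmt"
	"time"
)

// checkGatewayTimeouts parses the three settings and reports an error when
// upstream_timeout is not strictly less than read_timeout and write_timeout.
func checkGatewayTimeouts(read, write, upstream string) error {
	raw := map[string]string{
		"read_timeout":     read,
		"write_timeout":    write,
		"upstream_timeout": upstream,
	}
	parsed := make(map[string]time.Duration, len(raw))
	for name, value := range raw {
		d, err := time.ParseDuration(value)
		if err != nil {
			return fmt.Errorf("%s: %w", name, err)
		}
		parsed[name] = d
	}
	up := parsed["upstream_timeout"]
	if up >= parsed["read_timeout"] || up >= parsed["write_timeout"] {
		return fmt.Errorf("upstream_timeout (%s) should be less than read_timeout (%s) and write_timeout (%s)",
			up, parsed["read_timeout"], parsed["write_timeout"])
	}
	return nil
}

func main() {
	// The values from the example above pass the check.
	fmt.Println(checkGatewayTimeouts("25s", "25s", "20s"))
	// An upstream timeout longer than the read/write timeouts is rejected.
	fmt.Println(checkGatewayTimeouts("25s", "25s", "30s"))
}
```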
58+
59+
### Timeouts - Function provider
60+
61+
When using a gateway version older than `0.7.8` a timeout matching the gateway should be set for the `faas-swarm` or `faas-netes` controller.
62+
63+
```
64+
read_timeout: 25s
65+
write_timeout: 25s
66+
```
67+
68+
### Timeouts - Asynchronous invocations
69+
70+
For asynchronous invocations of functions a separate timeout can be configured at the `queue-worker` level in the `ack_timeout` environment variable.
71+
72+
If the `ack_timeout` is exceeded the task will not be acknowledged and the queue system will retry the invocation.
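
The acknowledge-or-retry decision can be illustrated with a generic timeout pattern. This is only a sketch of the idea and not the `queue-worker` implementation. `completedWithin` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"time"
)

// completedWithin runs work in a goroutine and reports whether it finished
// before ackTimeout elapsed. A false result corresponds to a task that would
// not be acknowledged and would therefore be retried.
// Note: work is not cancelled on timeout; it keeps running in the background.
func completedWithin(ackTimeout time.Duration, work func()) bool {
	done := make(chan struct{})
	go func() {
		defer close(done)
		work()
	}()
	select {
	case <-done:
		return true
	case <-time.After(ackTimeout):
		return false
	}
}

func main() {
	quick := func() { time.Sleep(10 * time.Millisecond) }
	slow := func() { time.Sleep(200 * time.Millisecond) }
	fmt.Println("quick acknowledged:", completedWithin(100*time.Millisecond, quick))
	fmt.Println("slow acknowledged:", completedWithin(100*time.Millisecond, slow))
}
```

In practice this means `ack_timeout` should be at least as long as the slowest function you expect to invoke asynchronously, otherwise completed work may be retried.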
73+
74+
## Function execution logs
75+
76+
By default a function does not log its result. The logs only show how long the process took to run and the length of the result in bytes.
77+
78+
```
79+
$ echo test this | faas invoke json-hook -g 127.0.0.1:31112
80+
Received JSON webook. Elements: 10
81+
82+
$ kubectl logs deploy/json-hook -n openfaas-fn
83+
2018/01/28 20:47:21 Writing lock-file to: /tmp/.lock
84+
2018/01/28 20:47:27 Forking fprocess.
85+
2018/01/28 20:47:27 Wrote 35 Bytes - Duration: 0.001844 seconds
86+
```
87+
88+
If you want to see the result of a function in the function's logs then deploy it with the `write_debug` environment variable set to `true`.
89+
90+
For example:
91+
92+
```
93+
provider:
94+
  name: faas
95+
  gateway: http://127.0.0.1:8080
96+
97+
functions:
98+
  json-hook:
99+
    lang: go
100+
    handler: ./json-hook
101+
    image: json-hook
102+
    environment:
103+
      write_debug: true
104+
```
105+
106+
Now you'll see logs like this:
107+
108+
```
109+
$ echo test this | faas invoke json-hook -g 127.0.0.1:31112
110+
Received JSON webook. Elements: 10
111+
112+
$ kubectl logs deploy/json-hook -n openfaas-fn
113+
2018/01/28 20:50:27 Writing lock-file to: /tmp/.lock
114+
2018/01/28 20:50:35 Forking fprocess.
115+
2018/01/28 20:50:35 Query
116+
2018/01/28 20:50:35 Path /function/json-hook
117+
Received JSON webook. Elements: 10
118+
2018/01/28 20:50:35 Duration: 0.001857 seconds
119+
```
120+
121+
You can then find the logs of the function using Docker Swarm or Kubernetes as listed in the section below.
122+
123+
## Healthcheck
124+
125+
Most problems reported via GitHub or Slack stem from a configuration problem or issue with a function. Here is a checklist of things you can try before digging deeper:
126+
127+
Checklist:
128+
* [ ] All core services are deployed, e.g. the gateway
129+
* [ ] Check functions are deployed and started
130+
* [ ] Check request isn't timing out at the gateway or the function level
131+
132+
## CLI unresponsive - 127.0.0.1 vs. localhost
133+
134+
On certain Linux distributions the name `localhost` maps to an IPv6 alias, meaning that the CLI may hang. In these circumstances you have three options:
135+
136+
1. Use the `-g` or `--gateway` argument with `127.0.0.1:8080` or similar
137+
138+
2. Set the `OPENFAAS_URL` environment variable to `127.0.0.1:8080` or similar
139+
140+
3. Edit the `/etc/hosts` file on your machine and remove the IPv6 alias for localhost (this forces the use of IPv4)
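
If you script against the gateway yourself, the first two options amount to replacing `localhost` with the IPv4 loopback address before connecting. A minimal sketch, with `preferIPv4Loopback` as a hypothetical helper name:

```go
package main

import (
	"fmt"
	"strings"
)

// preferIPv4Loopback rewrites a gateway address whose host is exactly
// "localhost" to use 127.0.0.1, leaving any scheme, port and path untouched.
// Other hosts are returned unchanged.
func preferIPv4Loopback(gateway string) string {
	scheme, rest := "", gateway
	if i := strings.Index(gateway, "://"); i >= 0 {
		scheme, rest = gateway[:i+3], gateway[i+3:]
	}
	if rest == "localhost" ||
		strings.HasPrefix(rest, "localhost:") ||
		strings.HasPrefix(rest, "localhost/") {
		rest = "127.0.0.1" + strings.TrimPrefix(rest, "localhost")
	}
	return scheme + rest
}

func main() {
	for _, gw := range []string{"http://localhost:8080", "localhost:8080", "http://gateway.example.com:8080"} {
		fmt.Println(gw, "->", preferIPv4Loopback(gw))
	}
}
```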
141+
142+
# Troubleshooting Swarm or Kubernetes
143+
144+
## Docker Swarm
145+
146+
### List all functions
147+
148+
```
149+
$ docker service ls
150+
```
151+
152+
You are looking for 1/1 for the replica count of each service listed.
153+
154+
### Find a function's logs
155+
156+
```
157+
$ docker service logs --tail 100 FUNCTION
158+
```
159+
160+
### Find out if a function failed to start
161+
162+
```
163+
$ docker service ps --no-trunc=true FUNCTION
164+
```
165+
166+
### Stop and remove OpenFaaS
167+
168+
```
169+
$ docker stack rm func
170+
```
171+
172+
If you have additional services / functions remove the remaining ones like this:
173+
174+
```
175+
$ docker service ls -q | xargs docker service rm
176+
```
177+
178+
*Use with caution*
179+
180+
## Kubernetes
181+
182+
If you have deployed OpenFaaS to the recommended namespaces then functions are in the `openfaas-fn` namespace and the core services are in the `openfaas` namespace. The `-n` flag to `kubectl` sets the namespace to look at.
183+
184+
### List OpenFaaS services
185+
186+
```
187+
$ kubectl get deploy -n openfaas
188+
```
189+
190+
### List all functions
191+
192+
```
193+
$ kubectl get deploy -n openfaas-fn
194+
```
195+
196+
### Find a function's logs
197+
198+
```
199+
$ kubectl logs -n openfaas-fn deploy/FUNCTION_NAME
200+
```
201+
202+
### Find out if a function failed to start
203+
204+
```
205+
$ kubectl describe -n openfaas-fn deploy/FUNCTION_NAME
206+
```
207+
208+
### Remove the OpenFaaS deployment
209+
210+
From within the `faas-netes` folder:
211+
212+
```
213+
$ kubectl delete -f namespaces.yml,./yaml/
214+
```
215+
216+
# Watchdog
217+
218+
## Debug your function without deploying it
219+
220+
Here's an example of how you can deploy a function without using an orchestrator and the API gateway. It is especially useful for testing:
221+
222+
```
223+
$ docker run --name debug-alpine \
224+
-p 8081:8080 -ti functions/alpine:latest sh
225+
# fprocess=date fwatchdog &
226+
```
227+
228+
Now you can access the function with one of the supported HTTP methods such as GET/POST etc:
229+
230+
```
231+
$ curl -4 127.0.0.1:8081
232+
```
233+
234+
## Edit your function without rebuilding it
235+
236+
You can bind-mount code straight into your function and work with it locally, until you are ready to re-build. This is a common flow with containers, but should be used sparingly.
237+
238+
Within the CLI directory for instance:
239+
240+
Build the samples:
241+
242+
```
243+
$ git clone https://github.com/openfaas/faas-cli && \
244+
cd faas-cli
245+
$ faas-cli -action build -f ./samples.yml
246+
```
247+
248+
Now work with the url-ping sample, with the code mounted live:
249+
250+
```
251+
$ docker run -v `pwd`/sample/url-ping/:/root/function/ \
252+
--name debug-alpine -p 8081:8080 -ti alexellis/faas-url-ping sh
253+
$ touch ./function/__init__.py
254+
# fwatchdog
255+
```
256+
257+
Now you can start editing the code in the sample/url-ping folder and it will reload live for every request.
258+
259+
```
260+
$ curl 127.0.0.1:8081 -d "https://www.google.com"
261+
Handle this -> https://www.google.com
262+
https://www.google.com => 200
263+
```
264+
265+
Now you can edit handler.py and you'll see the change immediately:
266+
267+
```
268+
$ echo "def handle(req):" > sample/url-ping/handler.py
269+
$ echo ' print("Nothing to see here")' >> sample/url-ping/handler.py
270+
$ curl 127.0.0.1:8081 -d "https://www.google.com"
271+
Nothing to see here
272+
```

mkdocs.yml

Lines changed: 3 additions & 1 deletion
@@ -110,9 +110,11 @@ pages:
110110
- Deployment: deployment.md
111111
- Kubernetes: ./deployment/kubernetes.md
112112
- Docker Swarm: ./deployment/docker-swarm.md
113+
- Troubleshooting: ./deployment/troubleshooting.md
113114
- CLI: ./cli/install.md
114115
- Tutorials:
115116
- Workshop: ./tutorials/workshop.md
116117
- Design & Architecture:
117118
- Gateway: ./architecture/gateway.md
118-
- Watchdog: ./architecture/watchdog.md
119+
- Watchdog: ./architecture/watchdog.md
120+
- Autoscaling: ./architecture/autoscaling.md
