Add initial auto-scaling guide & troubleshooting

alexellis · alexellis · commit 28f2865c9bb2 · 2018-03-31T15:17:11.000+01:00
Auto-scaling needs padding out for the HPA section by Stefan.

Signed-off-by: Alex Ellis <alexellis2@gmail.com>
diff --git a/README.md b/README.md
@@ -16,4 +16,4 @@ This page is published through the use of `mkdocs` and is hosted on https://netl
 
 * https://docs.openfaas.com/
 
-All commits into master (or merged PRs) will emerge on the front-page after being rebuilt.
+All commits into master (or merged PRs) will appear on the front-page after being rebuilt.
diff --git a/docs/architecture/autoscaling.md b/docs/architecture/autoscaling.md
@@ -0,0 +1,38 @@
+# Auto-scaling
+
+Auto-scaling in OpenFaaS allows a function to scale up or down depending on demand represented by different metrics.
+
+## Scaling by requests per second
+
+OpenFaaS ships with a single auto-scaling rule defined in the configuration for AlertManager. AlertManager reads usage (requests per second) metrics from Prometheus in order to know when to fire an alert to the API Gateway.
+
+The API Gateway handles AlertManager alerts through its `/system/alert` route.
+
+The auto-scaling provided by this method can be disabled by either deleting the AlertManager deployment or by scaling the deployment to zero replicas.
+
+The AlertManager rules for Swarm are defined in [alert.rules](https://github.com/openfaas/faas/blob/master/prometheus/alert.rules.yml) and can be altered as a configuration map.
+
+### Min/max replicas
+
+The minimum (initial) and maximum replica count can be set at deployment time by adding a label to the function.
+
+* `com.openfaas.scale.min`
+
+By default this is set to `1`
+
+* `com.openfaas.scale.max`
+
+By default this is currently set to `20`
+
+For each alert fired the auto-scaler will add 5 replicas. We are currently working on making the step configurable, either as a pre-defined step or as a proportional percentage. Once an alert is resolved because the load has dropped below the level needed for scaling, the replica count will be scaled back to the minimum replica count.
+
+> Note: Active alerts can be viewed in the "Alerts" tab of Prometheus which is deployed with OpenFaaS.
+
+## Scaling by CPU and/or memory utilization
+
+When using Kubernetes the built-in [Horizontal Pod Autoscaler (HPA)](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/) can be used instead of AlertManager.
+
+Find out more in Stefan Prodan's blog post below:
+
+https://stefanprodan.com/2018/kubernetes-scaleway-baremetal-arm-terraform-installer/#horizontal-pod-autoscaling
+
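The requests-per-second rule described in the auto-scaling guide above can be sketched in a few lines. This is only an illustration of the documented behaviour (add 5 replicas per fired alert, return to the minimum once the alert resolves), not the gateway's actual code. The function name `next_replicas` is hypothetical, and capping at the maximum is an assumption drawn from the `com.openfaas.scale.max` label.

```python
# Defaults taken from the guide: com.openfaas.scale.min = 1, com.openfaas.scale.max = 20.
DEFAULT_MIN_REPLICAS = 1
DEFAULT_MAX_REPLICAS = 20
# The guide states that each fired alert adds 5 replicas.
SCALE_STEP = 5


def next_replicas(current, alert_firing,
                  min_replicas=DEFAULT_MIN_REPLICAS,
                  max_replicas=DEFAULT_MAX_REPLICAS,
                  step=SCALE_STEP):
    """Return the replica count after one alert event (hypothetical helper)."""
    if alert_firing:
        # Assumption: the count never exceeds the configured maximum.
        return min(current + step, max_replicas)
    # A resolved alert scales the function back to its minimum.
    return min_replicas


if __name__ == "__main__":
    replicas = DEFAULT_MIN_REPLICAS
    for firing in (True, True, True, True, False):
        replicas = next_replicas(replicas, firing)
        print("firing" if firing else "resolved", replicas)
```

Under these assumptions, repeated alerts raise the count in steps of 5 until the maximum is reached, and a single resolved alert drops it straight back to the minimum.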
diff --git a/docs/deployment/kubernetes.md b/docs/deployment/kubernetes.md
@@ -196,4 +196,4 @@ Learn how to build serverless functions with OpenFaaS and Python in our half-day
 
 If you are running into any issues please check out the troubleshooting guide and search the documentation / past issues before raising an issue.
 
-* [Troubleshooting guide](https://github.com/openfaas/faas/blob/master/guide/troubleshooting.md)
+* [Troubleshooting guide](https://github.com/openfaas/faas/blob/master/guide/troubleshooting.md)
diff --git a/docs/deployment/troubleshooting.md b/docs/deployment/troubleshooting.md
@@ -0,0 +1,272 @@
+# Troubleshooting guide
+
+## Timeouts
+
+Default timeouts are configured at the HTTP level and must be set both on the gateway and the function.
+
+> Note: all distributed systems need a maximum timeout value to be configured for work. This means that work cannot be unbounded.
+
+### Timeouts - Your function
+
+You can also enforce a hard-timeout for your function with the `hard_timeout` environmental variable.
+
+For watchdog configuration see the [README](https://github.com/openfaas/faas/tree/master/watchdog).
+
+The best way to set the timeout is in the YAML file generated by the `faas-cli`.
+
+Example Go app that sleeps for 10 seconds:
+
+```
+provider:
+  name: faas
+  gateway: http://127.0.0.1:8080
+
+functions:
+  sleepygo:
+    lang: go
+    handler: ./sleepygo
+    image: alexellis2/sleeps-for-10-seconds
+    environment:
+        read_timeout: 20s
+        write_timeout: 20s
+```
+
+handler.go
+
+```
+package function
+
+...
+
+func Handle(req []byte) string {
+	time.Sleep(time.Second * 10)
+	return fmt.Sprintf("Hello, Go. You said: %s", string(req))
+}
+```
+
+### Timeouts - Gateway
+
+For the gateway set the following environmental variables:
+
+```
+            read_timeout:  "25s"        # Maximum time to read HTTP request
+            write_timeout: "25s"        # Maximum time to write HTTP response
+            upstream_timeout: "20s"     # Maximum duration of upstream function call
+```
+
+> Note: The value for `upstream_timeout` should be slightly less than `read_timeout` and `write_timeout`
+
+### Timeouts - Function provider
+
+When using a gateway version older than `0.7.8` a timeout matching the gateway should be set for the `faas-swarm` or `faas-netes` controller.
+
+```
+read_timeout: 25s
+write_timeout: 25s
+```
+
+### Timeouts - Asynchronous invocations
+
+For asynchronous invocations of functions a separate timeout can be configured at the `queue-worker` level in the `ack_timeout` environmental variable.
+
+If the `ack_timeout` is exceeded the task will not be acknowledged and the queue system will retry the invocation.
+
+## Function execution logs
+
+By default the functions will not log out the result, but just show how long the process took to run and the length of the result in bytes.
+
+```
+$ echo test this | faas invoke json-hook -g 127.0.0.1:31112
+Received JSON webook. Elements: 10
+
+$ kubectl logs deploy/json-hook -n openfaas-fn
+2018/01/28 20:47:21 Writing lock-file to: /tmp/.lock
+2018/01/28 20:47:27 Forking fprocess.
+2018/01/28 20:47:27 Wrote 35 Bytes - Duration: 0.001844 seconds
+```
+
+If you want to see the result of a function in the function's logs then deploy it with the `write_debug` environmental variable set to `true`.
+
+For example:
+
+```
+provider:
+  name: faas
+  gateway: http://127.0.0.1:8080
+
+functions:
+  json-hook:
+    lang: go
+    handler: ./json-hook
+    image: json-hook
+    environment:
+      write_debug: true
+```
+
+Now you'll see logs like this:
+
+```
+$ echo test this | faas invoke json-hook -g 127.0.0.1:31112
+Received JSON webook. Elements: 10
+
+$ kubectl logs deploy/json-hook -n openfaas-fn
+2018/01/28 20:50:27 Writing lock-file to: /tmp/.lock
+2018/01/28 20:50:35 Forking fprocess.
+2018/01/28 20:50:35 Query  
+2018/01/28 20:50:35 Path  /function/json-hook
+Received JSON webook. Elements: 10
+2018/01/28 20:50:35 Duration: 0.001857 seconds
+```
+
+You can then find the logs of the function using Docker Swarm or Kubernetes as listed in the section below.
+
+## Healthcheck
+
+Most problems reported via GitHub or Slack stem from a configuration problem or issue with a function. Here is a checklist of things you can try before digging deeper:
+
+Checklist:
+* [ ] All core services are deployed, e.g. the gateway
+* [ ] Check functions are deployed and started
+* [ ] Check request isn't timing out at the gateway or the function level
+
+## CLI unresponsive - 127.0.0.1 vs. localhost
+
+On certain Linux distributions the name `localhost` maps to an IPv6 alias, meaning that the CLI may hang. In these circumstances you have three options:
+
+1. Use the `-g` or `--gateway` argument with `127.0.0.1:8080` or similar
+
+2. Set the `OPENFAAS_URL` environmental variable to `127.0.0.1:8080` or similar
+
+3. Edit the `/etc/hosts` file on your machine and remove the IPv6 alias for localhost (this forces the use of IPv4)
+
+# Troubleshooting Swarm or Kubernetes
+
+## Docker Swarm
+
+### List all functions
+
+```
+$ docker service ls
+```
+
+You are looking for 1/1 for the replica count of each service listed.
+
+### Find a function's logs
+
+```
+$ docker service logs --tail 100 FUNCTION
+```
+
+### Find out if a function failed to start
+
+```
+$ docker service ps --no-trunc=true FUNCTION
+```
+
+### Stop and remove OpenFaaS
+
+```
+$ docker stack rm func
+```
+
+If you have additional services / functions remove the remaining ones like this:
+
+```
+$ docker service ls -q | xargs docker service rm
+```
+
+*Use with caution*
+
+## Kubernetes
+
+If you have deployed OpenFaaS to the recommended namespaces then functions are in the `openfaas-fn` namespace and the core services are in the `openfaas` namespace. The `-n` flag to `kubectl` sets the namespace to look at.
+
+### List OpenFaaS services
+
+```
+$ kubectl get deploy -n openfaas
+```
+
+### List all functions
+
+```
+$ kubectl get deploy -n openfaas-fn
+```
+
+### Find a function's logs
+
+```
+$ kubectl logs -n openfaas-fn deploy/FUNCTION_NAME
+```
+
+### Find out if a function failed to start
+
+```
+$ kubectl describe -n openfaas-fn deploy/FUNCTION_NAME
+```
+
+### Remove the OpenFaaS deployment
+
+From within the `faas-netes` folder:
+
+```
+$ kubectl delete -f namespaces.yml,./yaml/
+```
+
+# Watchdog
+
+## Debug your function without deploying it
+
+Here's an example of how you can run a function without using an orchestrator or the API gateway. It is especially useful for testing:
+
+```
+$ docker run --name debug-alpine \
+  -p 8081:8080 -ti functions/alpine:latest sh
+# fprocess=date fwatchdog &
+```
+
+Now you can access the function with one of the supported HTTP methods such as GET/POST etc:
+
+```
+$ curl -4 127.0.0.1:8081
+```
+
+## Edit your function without rebuilding it
+
+You can bind-mount code straight into your function and work with it locally, until you are ready to re-build. This is a common flow with containers, but should be used sparingly.
+
+Within the CLI directory for instance:
+
+Build the samples:
+
+```
+$ git clone https://github.com/openfaas/faas-cli && \
+  cd faas-cli
+$ faas-cli -action build -f ./samples.yml
+```
+
+Now work with the url-ping Python sample, with the code mounted live:
+
+```
+$ docker run -v `pwd`/sample/url-ping/:/root/function/ \
+  --name debug-alpine -p 8081:8080 -ti alexellis/faas-url-ping sh
+# touch ./function/__init__.py
+# fwatchdog
+```
+
+Now you can start editing the code in the sample/url-ping folder and it will reload live for every request.
+
+```
+$ curl 127.0.0.1:8081 -d "https://www.google.com"
+Handle this -> https://www.google.com
+https://www.google.com => 200
+```
+
+Now you can edit handler.py and you'll see the change immediately:
+
+```
+$ echo "def handle(req):" > sample/url-ping/handler.py
+$ echo '    print("Nothing to see here")' >> sample/url-ping/handler.py
+$ curl 127.0.0.1:8081 -d "https://www.google.com"
+Nothing to see here
+```
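The gateway timeout note in the troubleshooting guide above (`upstream_timeout` should be slightly less than `read_timeout` and `write_timeout`) can be checked mechanically before deploying. The sketch below is illustrative only: `parse_duration` and `check_gateway_timeouts` are hypothetical helpers, and the parser accepts only single-unit values such as `25s` or `2m`, which is a narrower format than Go's duration syntax.

```python
import re

# Seconds per unit for the simple single-unit durations handled here.
_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}


def parse_duration(value):
    """Convert a value such as '25s' or '2m' into seconds (hypothetical helper)."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(ms|s|m|h)", value.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    number, unit = match.groups()
    return float(number) * _UNIT_SECONDS[unit]


def check_gateway_timeouts(env):
    """Return a list of problems found in the gateway timeout settings."""
    upstream = parse_duration(env["upstream_timeout"])
    problems = []
    for key in ("read_timeout", "write_timeout"):
        # The guide recommends upstream_timeout be slightly less than both values.
        if upstream >= parse_duration(env[key]):
            problems.append(f"upstream_timeout should be less than {key}")
    return problems


if __name__ == "__main__":
    gateway_env = {
        "read_timeout": "25s",
        "write_timeout": "25s",
        "upstream_timeout": "20s",
    }
    print(check_gateway_timeouts(gateway_env) or "timeouts look consistent")
```

With the values from the guide (25s, 25s, 20s) the check reports no problems. Raising `upstream_timeout` to match or exceed either HTTP timeout produces one message per violated setting.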
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -110,9 +110,11 @@ pages:
     - Deployment: deployment.md
     - Kubernetes: ./deployment/kubernetes.md
     - Docker Swarm: ./deployment/docker-swarm.md
+    - Troubleshooting: ./deployment/troubleshooting.md
   - CLI: ./cli/install.md
   - Tutorials:
     - Workshop: ./tutorials/workshop.md
   - Design & Architecture:
     - Gateway: ./architecture/gateway.md
-    - Watchdog: ./architecture/watchdog.md
+    - Watchdog: ./architecture/watchdog.md
+    - Autoscaling: ./architecture/autoscaling.md
