
Commit 7bf89a1

Tutorial: Doorman + GKE.

1 parent fd4bf08

29 files changed: +950 -0 lines changed

doc/loadtest/README.md (374 additions, 0 deletions)
# GKE Load Test Tutorial

In this tutorial, we are going to set up a Doorman deployment similar to what you may expect to run in a production environment. The resource that Doorman will be protecting isn't all that useful (it's one of the Go examples for gRPC, the [Greeter](https://github.com/grpc/grpc-go/blob/master/examples/helloworld/helloworld/helloworld.proto) service), but that doesn't change the fact that it's a real RPC service that may have limited capacity. Finally, we'll add some monitoring for good measure.

To make this slightly more manageable, we'll do all of this in a Kubernetes cluster running on GKE (Google Container Engine). It should be simple to replicate the experiment with a Kubernetes cluster running on your own machines, and relatively easy to replicate it using some other cloud setup.

## Dramatis Personae

Our deployment will consist of the following elements:

- Doorman Server - the standard Doorman server.
- `target` - an RPC server.
- `client` - a client for `target` which uses Doorman to avoid overloading it.
- [Prometheus](http://prometheus.io/) - a monitoring system. We'll use it to get insight into the running system.

`target` and `client` are custom-written for this tutorial. Let's take a closer look at them.

![Overview](overview.png)

### `target`

[Target](docker/target/target.go) is an extremely simple gRPC server. Here is its `main` function:
```go
func main() {
	flag.Parse()
	lis, err := net.Listen("tcp", fmt.Sprintf(":%v", *port))
	if err != nil {
		log.Exitf("failed to listen: %v", err)
	}

	// Expose Prometheus metrics over HTTP on the debug port.
	http.Handle("/metrics", prometheus.Handler())
	go http.ListenAndServe(fmt.Sprintf(":%v", *debugPort), nil)

	// Serve the Greeter gRPC service on the main port.
	s := grpc.NewServer()
	pb.RegisterGreeterServer(s, &server{})
	s.Serve(lis)
}
```
We listen on two ports: one for gRPC, the other for HTTP, which we will use for monitoring.

`server` is similarly unexciting:
```go
// server is used to implement helloworld.GreeterServer.
type server struct{}

// SayHello implements helloworld.GreeterServer.
func (s *server) SayHello(ctx context.Context, in *pb.HelloRequest) (*pb.HelloReply, error) {
	requests.WithLabelValues(in.Name).Inc()
	return &pb.HelloReply{Message: "Hello " + in.Name}, nil
}
```
One last thing worth noting is `requests`, which is a Prometheus counter. We will use it to monitor the number of requests that `target` is actually getting.
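The counter is defined with the Prometheus Go client library. Its exact declaration lives in `target.go`; a minimal sketch of what it presumably looks like is below. The label name and the help string are assumptions; the metric name `requests` matches the query we will run in Prometheus later.

```go
// A sketch of how the requests counter might be declared; the actual
// definition is in target.go. The label name ("name") is an assumption.
var requests = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "requests",
		Help: "Requests handled by target, labeled by the requester's name.",
	},
	[]string{"name"},
)

func init() {
	prometheus.MustRegister(requests)
}
```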
### `client`

`client` has one task: send RPCs to `target`. Each process simulates a number of Doorman clients. This makes scheduling on a small Kubernetes cluster easier; in a real-world setting, you would usually have one Doorman client per process.
```go
func main() {
	flag.Parse()
	log.Infof("Simulating %v clients.", *count)
	for i := 0; i < *count; i++ {
		id := uuid.New()
		log.Infof("client %v with id %v", i, id)

		// Each simulated client gets its own Doorman client and resource lease...
		client, err := doorman.NewWithID(*addr, id, doorman.DialOpts(grpc.WithInsecure()))
		if err != nil {
			log.Exit(err)
		}
		defer client.Close()

		res, err := client.Resource(*resource, *initialCapacity)
		if err != nil {
			log.Exit(err)
		}

		go manipulateCapacity(res, *initialCapacity, id)

		conn, err := grpc.Dial(*target, grpc.WithInsecure())
		if err != nil {
			log.Exitf("did not connect: %v", err)
		}
		defer conn.Close()

		c := pb.NewGreeterClient(conn)
		rl := ratelimiter.NewQPS(res)

		// ...and a pool of workers that send RPCs as fast as the rate limiter allows.
		for i := 0; i < *workers; i++ {
			go func() {
				ctx := context.Background()
				for {
					if err := rl.Wait(ctx); err != nil {
						log.Exitf("rl.Wait: %v", err)
					}

					ctx, cancel := context.WithTimeout(ctx, 30*time.Second)
					if _, err := c.SayHello(ctx, &pb.HelloRequest{Name: *resource}); err != nil {
						log.Error(err)
					}
					cancel()
				}
			}()
		}
	}
	http.Handle("/metrics", prometheus.Handler())
	http.ListenAndServe(fmt.Sprintf(":%v", *port), nil)
}
```
The client uses a Doorman rate limiter: when its `Wait` method returns, it performs the RPC.

The function `manipulateCapacity` randomly changes the capacity requested by a client:
```go
func manipulateCapacity(res doorman.Resource, current float64, id string) {
	clientRequested := new(expvar.Float)
	for range time.Tick(*interval) {
		r := rand.Float64()
		log.V(2).Infof("r=%v decreaseChance=%v increaseChance=%v", r, *decreaseChance, *increaseChance)
		switch {
		case r < *decreaseChance:
			current -= *step
			log.Infof("client %v will request less: %v.", id, current)
		case r < *decreaseChance+*increaseChance:
			current += *step
			log.Infof("client %v will request more: %v.", id, current)
		default:
			log.V(2).Infof("client %v not changing requested capacity", id)
			continue
		}
		if current > *maxCapacity {
			current = *maxCapacity
		}
		if current < *minCapacity {
			current = *minCapacity
		}
		log.Infof("client %v will request %v", id, current)
		if err := res.Ask(current); err != nil {
			log.Errorf("res.Ask(%v): %v", current, err)
			continue
		}
		clientRequested.Set(current)
		requested.Set(id, clientRequested)
	}
}
```
Again, we are exposing an HTTP port and exporting metrics (we will use them to look at the client latencies).
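`requested` itself is an `expvar` map keyed by client id, published over the same HTTP port (at `/debug/vars`). Its exact declaration is in the client's source; a minimal sketch would be the following, where the exported variable name is a made-up placeholder:

```go
// Sketch only: the real exported name in the client may differ.
var requested = expvar.NewMap("doorman_client_requested_capacity")
```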
### Doorman Server

This is the regular [doorman server](https://github.com/youtube/doorman/tree/master/go/cmd/doorman), whose address we will give to the client. The way we'll run the server differs significantly from how we would run it in a real-world setting. We are running just one process; if it dies, the clients won't be able to get new resource leases. In production, we would run 3 processes, and they would use [etcd](https://github.com/coreos/etcd/) to elect a leader among themselves. We are skipping this step for the sake of simplicity.
## Kubernetes

[Kubernetes](http://kubernetes.io/) allows you to run Docker containers in a cluster. The great part about it is that it lets you view all your containers as a single system. If you want to learn more about Kubernetes, please take a look at its [documentation](http://kubernetes.io/v1.1/examples/guestbook/README.html).

### Kubernetes in less than a minute

Assuming that you have some idea what Kubernetes is about, here's a quick refresher.

- All processes run in Linux containers (Docker being the most popular container solution).
- A *pod* is a group of containers that get scheduled together.
- A *replication controller* makes sure that a specified number of replicas of some pod are running at any given point. It is important to remember that pods running under a replication controller are [cattle, not pets](http://www.theregister.co.uk/2013/03/18/servers_pets_or_cattle_cern/). You are not supposed to concern yourself with a single pod. It may get killed, rescheduled, etc. Pods have names, but they are randomly generated, and you refer to them mostly when debugging an issue.
- A *service* abstracts away the problem of referring to pods. You specify some constraints that the pods have to meet, and the service gives you a port which you can use to connect to a pod of the specified type. It also acts as a load balancer. (The short `kubectl` sketch below shows how to list all of these objects from the command line.)
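For orientation, here is how you could inspect these objects once the cluster is up. These are generic `kubectl` commands, not anything specific to Doorman, and `NAME` is a placeholder:

```console
$ kubectl get replicationcontrollers   # controllers and how many replicas they want
$ kubectl get pods                     # the pods currently running for them
$ kubectl get services                 # the stable endpoints fronting those pods
$ kubectl describe service NAME        # details of one service: cluster IP, ports, endpoints
```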
## Creating a cluster

This part of the tutorial is specific to GKE. You should be able to adapt it reasonably well to [AWS](http://kubernetes.io/v1.1/docs/getting-started-guides/aws.html) or [Azure](http://kubernetes.io/v1.1/docs/getting-started-guides/coreos/azure/README.html).

I am assuming that you've installed the [gcloud CLI](https://cloud.google.com/container-engine/docs/before-you-begin#install_the_gcloud_command_line_interface), and that you have a Cloud project set up. You should also do

```console
$ gcloud config set project PROJECT_ID
$ gcloud config set compute/zone us-central1-b
```

to save yourself from some typing.
Let's create a cluster. Run something like

```console
$ gcloud container clusters create doorman-loadtest --machine-type=n1-standard-1 --num-nodes=6
```

depending on how big you want your toy cluster to be.
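If `kubectl` doesn't already point at the new cluster, you can fetch its credentials with a standard gcloud command (the cluster name is the one used above):

```console
$ gcloud container clusters get-credentials doorman-loadtest
```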
## Docker images

Now, let's create the Docker images that we will use to run our services. I am assuming that you are in Doorman's main directory.

```console
$ docker build -t gcr.io/google.com/doorman/doorman-server:v0.1.7 doc/loadtest/docker/server/
$ docker build -t gcr.io/google.com/doorman/doorman-client:v0.1.7 doc/loadtest/docker/client/
$ docker build -t gcr.io/google.com/doorman/target:v0.1 doc/loadtest/docker/target/
$ docker build -t gcr.io/google.com/doorman/prometheus:v0.2 doc/loadtest/docker/prometheus
```

Now, we can push them to the Docker registry:

```console
$ gcloud docker push gcr.io/google.com/doorman/doorman-server:v0.1.7
$ gcloud docker push gcr.io/google.com/doorman/doorman-client:v0.1.7
$ gcloud docker push gcr.io/google.com/doorman/target:v0.1
$ gcloud docker push gcr.io/google.com/doorman/prometheus:v0.2
```

You will of course have to replace `google.com/doorman` with your own project name, and adjust the image tags if you wish to use a different container registry.
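For example, if your project were called `my-project` (a placeholder), you could retag and push the server image like this:

```console
$ docker tag gcr.io/google.com/doorman/doorman-server:v0.1.7 gcr.io/my-project/doorman-server:v0.1.7
$ gcloud docker push gcr.io/my-project/doorman-server:v0.1.7
```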
## Populating the cluster

### Doorman

Now, time for the fun part: putting our containers into the cloud! First we'll create a replication controller for the Doorman server. We want only one replica, but we need it to be restarted in case something happens to it. Please take a look at its definition in [doorman-server.yaml](doorman-server.yaml).

```console
$ kubectl create -f doc/loadtest/doorman-server.yaml
replicationcontroller "doorman-server" created
```

After a moment, you will see that it's been created and is running:

```console
$ kubectl get pods
NAME                   READY     STATUS    RESTARTS   AGE
doorman-server-le54r   1/1       Running   0          15s
```
This is running the Doorman server with a command line like this:

```console
doorman -logtostderr -port=3667 -debug_port=3668 -config=./config.prototext
```

Let's take a look at its logs to verify everything is fine:

```console
$ kubectl logs doorman-server-le54r
I0226 15:48:33.352541 1 doorman_server.go:234] Waiting for the server to be configured...
I0226 15:48:33.352618 1 doorman_server.go:238] Server is configured, ready to go!
I0226 15:48:33.352801 1 server.go:437] this Doorman server is now the master
I0226 15:48:33.352818 1 server.go:457] setting current master to 'doorman-server-le54r:3667'
```

(Your pod identifier will of course be different.)
We can also take a look at the server's status page. First, we need to run

```console
$ kubectl port-forward doorman-server-le54r 3668 &
```

to forward our local port 3668 to the same port on the container. Now we can go to http://localhost:3668/debug/status, and see something like

![doorman's /debug/status](empty-debug-status.png)

Before we forget about it, let's also create a service that will make our server discoverable to the clients:

```console
$ kubectl create -f doc/loadtest/doorman-server-service.yaml
```
### Prometheus

Let's not forget about Prometheus:

```console
$ kubectl create -f doc/loadtest/prometheus.yaml
```

To quickly verify that it's running, forward its port:

```console
$ kubectl port-forward prometheus-mtka5 9090 &
```

and go to http://localhost:9090/graph.
### Target

Now, it's time for the target.

```console
$ kubectl create -f doc/loadtest/target.yaml
replicationcontroller "target" created
$ kubectl create -f doc/loadtest/target-service.yaml
service "target" created
```

Let's verify it's running:

```console
$ kubectl get pods -l app=target
NAME           READY     STATUS    RESTARTS   AGE
target-4ivl7   1/1       Running   0          1m
```
### Clients

Now, the final key element of our puzzle: the client. Let's bring it up:

```console
$ kubectl create -f doc/loadtest/doorman-client.yaml
$ kubectl create -f doc/loadtest/doorman-client-service.yaml
```

This creates 10 replicas of `doorman-client`. Each replica is running the following command line:

```console
client -port=80 --logtostderr \
  -count=100 \
  -resource=proportional -initial_capacity=15 -min_capacity=5 \
  -max_capacity=2000 -increase_chance=0.1 -decrease_chance=0.05 -step=5 \
  -addr=$(DOORMAN_SERVICE_HOST):$(DOORMAN_SERVICE_PORT_GRPC) \
  -target=$(TARGET_SERVICE_HOST):$(TARGET_SERVICE_PORT_GRPC)
```

This means that every process creates 100 Doorman clients, which access the resource `proportional`. The initial capacity will be `15`, and it will fluctuate between `5` and `2000`, with a 10% chance of increasing and a 5% chance of decreasing on each step. Note that we get both Doorman's and `target`'s addresses (`-addr` and `-target`) from the environment. This is one of the ways in which Kubernetes enables [discovery](https://github.com/kubernetes/kubernetes/blob/master/docs/user-guide/services.md#discovering-services).
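If you are curious where those environment variables come from, you can peek into one of the client pods. The pod name below is a placeholder; use one listed by `kubectl get pods`:

```console
$ kubectl exec doorman-client-abc12 -- env | grep -E 'DOORMAN|TARGET'
```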
## Looking around

So, now that everything is running, let's take a small tour of the neighborhood. First, let's look at the Doorman server.

Make sure that port `3668` is still being forwarded, and go to http://localhost:3668/debug/status.

![/debug/status with some clients](status-with-clients.png)

Now it's a lot more interesting. You can see that there are 1000 clients, and that all the capacity has been assigned.

Another place where you may want to take a look is http://localhost:3668/debug/requests. This allows you to get a sample of the requests handled by this gRPC server, with information about the source, timing, and received arguments. This is an invaluable tool for debugging.

![/debug/requests](debug-requests.png)

After that, let's take a look at `target`. To do that, we'll use Prometheus' expression browser. In production, you'd probably want something slightly fancier, like consoles and dashboards, but for our purposes the expression browser is enough.

Input the following expression:

```
rate(requests[5m])
```

If you remember from `target.go`, `requests` is a metric counting the requests handled by `target`. The expression above calculates its rate over 5 minutes. We can see in the graph that we are doing about 5000 requests per second:

![rate(requests[5m])](requests-rate.png)
Other interesting queries to run:

How many requests is the Doorman server receiving?

```
rate(doorman_server_requests[5m])
```

What's the average latency of a Doorman client request?

```
sum(rate(doorman_client_request_durations_sum[5m])) by (job) / sum(rate(doorman_client_request_durations_count[5m])) by (job)
```

![client latency](client-latency.png)
## What to do next

### Scale!

Stir things up a bit. Add more clients! A lot more clients: say, instruct the client replication controller to maintain 100 replicas.

```console
$ kubectl scale --replicas=100 replicationcontrollers doorman-client-proportional
```

What happens to the number of requests the server is handling? How about the QPS that `target` is receiving? Is it behaving the way you expected? (Hint: if your cluster is small and there are many clients, they will eventually become starved for resources and won't be able to use all the capacity they were given. Take a look at the [adaptive rate limiter](https://godoc.org/github.com/youtube/doorman/go/ratelimiter#AdaptiveQPS) for a workaround.) How about the client latencies? Can you make them better by giving Doorman more CPU?
### Different Algorithms

Experiment with different capacity distribution algorithms. Edit [`config.prototext`](docker/server/config.prototext) to use the [FAIR_SHARE](../algorithms.md#fair_share) algorithm. Does it have any effect on the metrics?
### High Availability

Make the Doorman server highly available. Add an etcd instance (or cluster) to the Kubernetes cluster, increase the number of replicas in [doorman-server.yaml](doorman-server.yaml), and configure them to do a leader election. (Hint: use the `-etcd_endpoints` and `-master_election_lock` flags.)
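Concretely, each replica's command line would gain the election flags, roughly along the lines of the sketch below. The etcd endpoint and the lock path are placeholders; point them at whatever etcd service you create:

```console
doorman -logtostderr -port=3667 -debug_port=3668 -config=./config.prototext \
  -etcd_endpoints=http://etcd:2379 -master_election_lock=/doorman-loadtest/master
```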

doc/loadtest/client-latency.png (77.3 KB)

doc/loadtest/debug-requests.png (465 KB)
Lines changed: 9 additions & 0 deletions

FROM golang:1.5
RUN go get "github.com/golang/glog" "github.com/pborman/uuid" "github.com/youtube/doorman/go/client/doorman" \
    golang.org/x/net/context google.golang.org/grpc google.golang.org/grpc/examples/helloworld/helloworld \
    github.com/youtube/doorman/go/ratelimiter

RUN mkdir -p $GOPATH/src/github.com/youtube/doorman/doc/loadtest/docker/client/doorman_client
ADD doorman_client.go $GOPATH/src/github.com/youtube/doorman/doc/loadtest/docker/client
RUN cd $GOPATH/src/github.com/youtube/doorman/doc/loadtest/docker/client && go install

doc/loadtest/docker/client/client (10.2 MB, binary file not shown)
