
Commit 925f4ee

charlyF committed "adding monitoring section" (1 parent: c6213b3)

File tree

11 files changed

+177
-45
lines changed

11 files changed

+177
-45
lines changed

02-path-working-with-clusters/201-cluster-monitoring/readme.adoc

Lines changed: 177 additions & 45 deletions
@@ -14,8 +14,6 @@ This chapter will demonstrate how to monitor a Kubernetes cluster using the foll
Datadog is a monitoring service for cloud-scale applications, providing monitoring of servers, databases, tools, and services through a SaaS-based data analytics platform.
It gives a unified view of an entire stack, allowing you to seamlessly monitor metrics, application traces, and logs.

== Prerequisites

In order to perform exercises in this chapter, you'll need to deploy configurations to an EKS cluster. To create an EKS cluster, use the link:../../01-path-basics/102-your-first-cluster#create-a-kubernetes-cluster-with-eks[AWS CLI] (recommended), or alternatively, link:../../01-path-basics/102-your-first-cluster#alternative-create-a-kubernetes-cluster-with-kops[kops].
@@ -31,12 +29,13 @@ From cloud providers like AWS, Google Cloud or Azure to tools like chef, puppet
Databases like Postgres and MySQL, web servers like NGINX and HAProxy, and so on.

Today, we will be using:

* Kubernetes
* Docker
* NGINX
* MongoDB
* Redis
* Python

There are multiple ways to collect data. The first is via our https://github.com/DataDog/datadog-agent[agent].
We will deploy the agent on all the nodes of our EKS cluster. It will run as a pod, alongside our application.
@@ -90,9 +89,11 @@ Insert a Datadog API Key that can be found in your https://app.datadoghq.com/acc

Then, from the current directory, run:

```
$ kubectl apply -f templates/datadog/agent.yaml
daemonset.extensions "dd-agent" created
service "dd-agent" created
```

As this manifest is a DaemonSet, it deploys an agent on all your nodes. Each agent lives inside a pod.
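
To confirm that the DaemonSet placed exactly one agent on every node, you can compare the agent pods with the node list. The sketch below is a minimal example, assuming you saved the output of `kubectl get pods -l app=dd-agent -o json` and `kubectl get nodes -o json`; the function name `check_daemonset_coverage` is hypothetical, and it only reads the standard `spec.nodeName` and `metadata.name` fields.

```python
import json
from collections import Counter


def check_daemonset_coverage(pods_json: str, nodes_json: str) -> list[str]:
    """Report nodes that do not run exactly one agent pod (hypothetical helper).

    pods_json:  output of `kubectl get pods -l app=dd-agent -o json`
    nodes_json: output of `kubectl get nodes -o json`
    Returns an empty list when every node has exactly one agent pod.
    """
    pods = json.loads(pods_json)["items"]
    nodes = json.loads(nodes_json)["items"]

    # Count agent pods per node; pods not yet scheduled have no nodeName.
    per_node = Counter(p["spec"].get("nodeName") for p in pods)

    problems = []
    for node in nodes:
        name = node["metadata"]["name"]
        count = per_node.get(name, 0)
        if count != 1:
            problems.append(f"{name}: {count} agent pod(s)")
    return problems
```

A node that carries a taint the DaemonSet does not tolerate will legitimately report 0 pods, so treat the result as a starting point for investigation rather than a hard failure.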

@@ -103,15 +104,18 @@ To set up the MongoDB replica set, you need three things: A StorageClass, a Head
We will start by creating a StorageClass to tell Kubernetes what kind of storage to use for the database nodes.
In this case, we will rely on EBS gp2 volumes to store our data.

```
$ kubectl apply -f templates/mongodb/storageclass.yaml
storageclass.storage.k8s.io "fast" created
```
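
The `templates/mongodb/storageclass.yaml` file is not reproduced here. As a sketch of what a gp2-backed StorageClass named `fast` (the name in the output above) can look like, the snippet below builds the manifest as a Python dict and prints it as JSON, which `kubectl apply -f` also accepts. It assumes the in-tree `kubernetes.io/aws-ebs` provisioner; the workshop's actual file may differ.

```python
import json


def ebs_storage_class(name: str = "fast", volume_type: str = "gp2") -> dict:
    """Build a StorageClass manifest backed by AWS EBS (sketch, not the workshop file)."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        # Assumption: the in-tree EBS provisioner. Newer clusters may use the
        # EBS CSI driver (ebs.csi.aws.com) instead.
        "provisioner": "kubernetes.io/aws-ebs",
        "parameters": {"type": volume_type},
    }


if __name__ == "__main__":
    # Save the output to a file, then: kubectl apply -f storageclass.json
    print(json.dumps(ebs_storage_class(), indent=2))
```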

Once the storage is ready, we can spin up MongoDB with 3 replicas.

```
$ kubectl apply -f templates/mongodb/mongo.yaml
service "mongo" created
statefulset.apps "mongo" created
```

Note that this creates a headless service (a Service with no cluster IP) in front of the databases: instead of load-balancing traffic through a virtual IP, it publishes a DNS record for each MongoDB pod.
This also generates PersistentVolumeClaims; the volumes provisioned for them should appear as EBS volumes in your AWS account.
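
Because the service is headless, each StatefulSet pod gets its own DNS name of the form `<pod>.<service>.<namespace>.svc.cluster.local`. The sketch below lists the member addresses a MongoDB client could use, assuming the `mongo` service and StatefulSet shown above, the `default` namespace, the default `cluster.local` domain, and MongoDB's default port 27017. The replica set name `rs0` is a hypothetical placeholder; use the one configured in `mongo.yaml`.

```python
def replica_set_members(name="mongo", service="mongo", namespace="default",
                        replicas=3, port=27017, domain="cluster.local"):
    """Stable DNS names of StatefulSet pods behind a headless service."""
    # StatefulSet pods are named <statefulset>-0, <statefulset>-1, ...
    return [f"{name}-{i}.{service}.{namespace}.svc.{domain}:{port}"
            for i in range(replicas)]


def mongo_uri(members, replica_set="rs0"):
    """Replica-set connection string; 'rs0' is a hypothetical replica set name."""
    return f"mongodb://{','.join(members)}/?replicaSet={replica_set}"


if __name__ == "__main__":
    print(mongo_uri(replica_set_members()))
```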
@@ -122,22 +126,37 @@ You can run the following command:

$ kubectl exec -it mongo-1 -- sh -c 'mongo admin --host localhost --eval "db.createUser({ user: \"datadog\", pwd: \"tndPhL3wrMEDuj4wLEHmbxbV\", roles: [ {role: \"read\", db: \"admin\"}, {role: \"clusterMonitor\", db:\"admin\"},{role: \"read\", db: \"local\" } ] });"'
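
The nested quoting in the command above is easy to break when you change the password or the roles. A minimal sketch, assuming Python 3 and `kubectl` on your `PATH`, builds the same `db.createUser` call with `json.dumps` and passes it to `kubectl exec` as an argument list, so no shell escaping is involved. The helper names are hypothetical, and `-it` is dropped because `--eval` needs no interactive terminal.

```python
import json
import subprocess

# Same roles as the command above: read on admin, clusterMonitor on admin, read on local.
DATADOG_ROLES = [
    {"role": "read", "db": "admin"},
    {"role": "clusterMonitor", "db": "admin"},
    {"role": "read", "db": "local"},
]


def create_user_argv(pod: str, user: str, password: str) -> list[str]:
    """Build the kubectl argv that runs db.createUser() inside the given pod (hypothetical helper)."""
    spec = {"user": user, "pwd": password, "roles": DATADOG_ROLES}
    # A JSON object is also a valid JavaScript object literal for the mongo shell.
    js = f"db.createUser({json.dumps(spec)});"
    return ["kubectl", "exec", pod, "--",
            "mongo", "admin", "--host", "localhost", "--eval", js]


def create_user(pod: str, user: str, password: str) -> None:
    # check=True raises CalledProcessError if kubectl or mongo exits non-zero.
    subprocess.run(create_user_argv(pod, user, password), check=True)
```

For example, `create_user("mongo-1", "datadog", "<password>")` runs the equivalent of the command above.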

Double-check that the persistent volumes were bound to the claims:

```
$ kubectl get pvc
NAME                               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
mongo-persistent-storage-mongo-0   Bound    pvc-ec5ccee5-8307-11e8-b84c-06bfcd83c358   1Gi        RWO            fast           3m
mongo-persistent-storage-mongo-1   Bound    pvc-f3dd1eae-8307-11e8-b84c-06bfcd83c358   1Gi        RWO            fast           3m
mongo-persistent-storage-mongo-2   Bound    pvc-fffcea2a-8307-11e8-b84c-06bfcd83c358   1Gi        RWO            fast           3m
```
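
In a script, reading the `STATUS` column by eye is fragile. A minimal sketch, assuming you pass it the output of `kubectl get pvc -o json`, reads each claim's `status.phase` and reports any claim that is not `Bound`; the function name `unbound_claims` is hypothetical.

```python
import json


def unbound_claims(pvc_json: str) -> dict[str, str]:
    """Map the name of each PVC that is not Bound to its current phase (hypothetical helper)."""
    claims = json.loads(pvc_json)["items"]
    return {
        c["metadata"]["name"]: c.get("status", {}).get("phase", "Unknown")
        for c in claims
        if c.get("status", {}).get("phase") != "Bound"
    }
```

An empty result means every claim, here the three `mongo-persistent-storage-mongo-*` claims, is bound to a volume.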

=== The cache

We will be leveraging Redis to cache data.

Create your Redis cache:

```
$ kubectl apply -f templates/redis/redis.yaml
deployment.apps "redis" created
service "redis" created
```

This creates a Redis pod and a headless service in front of it.

=== Deploy the application

Now it is time to deploy your application.

```
$ kubectl apply -f templates/webapp/webapp.yaml
deployment.apps "fan" created
service "fan" created
```

This will create a pod running the application, as well as a service in front of it.

@@ -150,15 +169,18 @@ Now is time to see the result of your labor.
Apply the NGINX manifest. It creates a web server that fronts the application, as well as a service.
Unlike the services above, this one is of type LoadBalancer. Therefore, it spins up an ELB with a public DNS name exposed to the world.

```
$ kubectl apply -f templates/nginx/nginx.yaml
daemonset.extensions "nginx" created
service "nginx-deployment" created
configmap "nginxconfig" created
```
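
The ELB can take a few minutes to provision. Once it is ready, Kubernetes records its address in the service's `status.loadBalancer.ingress` field (AWS load balancers report a `hostname`; other providers may report an `ip`). A minimal sketch, assuming the output of `kubectl get svc nginx-deployment -o json`, extracts that address or returns `None` while it is still pending; the function name is hypothetical.

```python
import json
from typing import Optional


def load_balancer_address(svc_json: str) -> Optional[str]:
    """Return the external hostname (or IP) of a LoadBalancer service, if assigned."""
    svc = json.loads(svc_json)
    ingress = svc.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    for entry in ingress:
        # AWS ELBs expose a DNS hostname; fall back to an IP for other providers.
        address = entry.get("hostname") or entry.get("ip")
        if address:
            return address
    return None  # Load balancer still provisioning.
```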

This also creates a ConfigMap that stores the NGINX configuration as an object in etcd instead of a physical file. The benefit is that the file does not have to be present on each node.
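
Conceptually, the ConfigMap holds the configuration text under a key, and the NGINX pod mounts it as a volume; each key then appears as a file under the mount path inside the container, without existing on the node. A minimal sketch of that pairing, assuming the `nginxconfig` name from the output above; the `nginx.conf` key, the `nginx-config` volume name, and the `/etc/nginx/conf.d` mount path are hypothetical and may not match `nginx.yaml`.

```python
def nginx_configmap(conf_text: str, name: str = "nginxconfig") -> dict:
    """ConfigMap holding the NGINX configuration (the 'nginx.conf' key is hypothetical)."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        "data": {"nginx.conf": conf_text},
    }


def config_volume(configmap_name: str = "nginxconfig",
                  mount_path: str = "/etc/nginx/conf.d") -> tuple[dict, dict]:
    """Pod-spec volume plus container volumeMount exposing the ConfigMap as files.

    The volume name and mount path are hypothetical placeholders.
    """
    volume = {"name": "nginx-config", "configMap": {"name": configmap_name}}
    mount = {"name": "nginx-config", "mountPath": mount_path}
    return volume, mount
```

The `volume` dict goes under the pod's `spec.volumes`, and `mount` under the container's `volumeMounts`; the shared `name` is what links them.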

Now, take a look at your LoadBalancer being configured:

```
$ kubectl describe svc nginx-deployment
Name: nginx-deployment
Namespace: default
Labels: <none>
```
@@ -187,51 +209,161 @@ image::webapp.png[]

== Monitoring

=== Diving into the data

To start monitoring, begin by visualizing the data at a high level. The Datadog host map gives a bird's-eye view of your infrastructure.
Go to the https://app.datadoghq.com/infrastructure/map[host map] to see your EKS cluster.

image::hostmap.png[]

As we are using Kubernetes, our infrastructure is container-driven. Therefore, the container map gives more details on the containers running on each host.

You can switch back and forth with the toggle in the top left-hand corner.

image::container-map.png[]

While a cluster-wide overview at the container level is useful, it is even better to visualize activity on a per-container or per-pod basis.
You can do this in the https://app.datadoghq.com/containers[Container Live view].

image::container-view.png[]

Go to the https://app.datadoghq.com/process[Processes page] to visualize the processes running on the monitored hosts.

=== Metrics

The agent collects metrics from the containers via the https://docs.datadoghq.com/videos/autodiscovery/[Autodiscovery process].
In this case, it relies on pod annotations. You can see this template in the mongo, redis, and nginx manifests:

```
metadata:
  annotations:
    ad.datadoghq.com/redis.check_names: '["redisdb"]'
    ad.datadoghq.com/redis.init_configs: '[{}]'
    ad.datadoghq.com/redis.instances: '[{"host": "%%host%%","port":"6379"}]'
```
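
Because the annotation values are JSON strings embedded in YAML, generating them avoids quoting mistakes. A minimal sketch, assuming one check per container: `check_names`, `init_configs`, and `instances` are parallel lists with one entry per check, and the container name in the key (`redis` above) must match the container's name in the pod spec. The function name `autodiscovery_annotations` is hypothetical.

```python
import json
from typing import Optional


def autodiscovery_annotations(container: str, check_name: str, instance: dict,
                              init_config: Optional[dict] = None) -> dict:
    """Datadog Autodiscovery annotations for a single check on one container (hypothetical helper)."""
    prefix = f"ad.datadoghq.com/{container}"
    # Each value is a JSON list with one entry per check; here there is one check.
    return {
        f"{prefix}.check_names": json.dumps([check_name]),
        f"{prefix}.init_configs": json.dumps([init_config or {}]),
        f"{prefix}.instances": json.dumps([instance]),
    }


# Reproduces the redis annotations shown above (%%host%% is resolved by the agent).
redis_annotations = autodiscovery_annotations(
    "redis", "redisdb", {"host": "%%host%%", "port": "6379"})
```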

To see which checks are running, exec into one of the agent pods and run the status command. First, list the agent pods:

$ kubectl get pods -l app=dd-agent

Pick one of the pods and run:

$ kubectl exec -ti <pod_name> -- agent status

You should see the mongo check running, as well as other checks (depending on the pods running on that node).

=== From Metrics to Logs

Let's stress our app's cache and look at the logs.

Open your web app, click on the Caching demo, run it, and then go to your Datadog app.

This demo stresses Redis by querying elements in the cache. It also submits logs and traces.

Go to the https://app.datadoghq.com/screen/integration/15/redis---overview[Redis Dashboard]. It was created for you out of the box when an agent autodiscovered the Redis pod.
You will see a surge in commands per second. Click on the metric and select View Related Logs.

image::redis-dashboard.png[]

This takes you to the https://app.datadoghq.com/logs[Log Explorer] page, carrying over the context of the source (here, redis) and the time window.

image::redis-logs.png[]

Click on one of the logs to see its details.

=== From Logs to Traces

Now that we have identified the logs submitted at the moment of the surge in commands per second, we can look at the relevant traces that our application submitted.

Click on one of the redis logs and, on `Service: Redis`, click on See in APM:

image::go-to-redis-traces.png[]

From there, you can navigate to the traces that correspond to this service. Clicking on the GET resource shows the total number of requests, the errors, and the latency.
Now, we can also click on a single trace and see the actual flame graph:

image::redis-traces.png[]

=== Setting up some monitors

Before doing further testing, let's create a few monitors. Go to the https://app.datadoghq.com/monitors#/create[Monitor section] of your Datadog application.

* Monitoring the Infrastructure

Create a https://app.datadoghq.com/monitors#create/metric[metric monitor] for the memory used by each pod. Pick the metric and set the scope; we recommend the following query:

`avg:kubernetes.memory.usage{cluster:eks} by {pod_name}`

Set the threshold to 160M.

In the `Say what's happening` section, you can describe the issue and use template variables to give more context:

```
Memory over {{threshold}} for {{pod_name.name}}.
```
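
If you prefer to manage monitors as code, the same memory monitor can be described as a payload for the Datadog Monitors API (`POST /api/v1/monitor`). The sketch below is one possible approach, not the workshop's method, and rests on assumptions: 160M is read as 160,000,000 bytes, the 5-minute evaluation window is a hypothetical choice, the monitor name is a placeholder, and the API and application keys are read from `DD_API_KEY` and `DD_APP_KEY` environment variables (names chosen for this example). Check the field names against the Monitors API reference before relying on it.

```python
import json
import os
import urllib.request

# Assumption: the US site, matching the app.datadoghq.com links in this guide.
API_URL = "https://api.datadoghq.com/api/v1/monitor"


def memory_monitor_payload(threshold_bytes: int = 160_000_000,
                           window: str = "last_5m") -> dict:
    """Multi-alert metric monitor: one alert per pod_name above the threshold."""
    metric = "avg:kubernetes.memory.usage{cluster:eks} by {pod_name}"
    return {
        "name": "Pod memory usage is high",  # placeholder name
        "type": "metric alert",
        "query": f"avg({window}):{metric} > {threshold_bytes}",
        "message": "Memory over {{threshold}} for {{pod_name.name}}.",
        "options": {"thresholds": {"critical": threshold_bytes}},
    }


def create_monitor(payload: dict) -> dict:
    """POST the payload to the Monitors API and return the decoded response."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: keys are provided through these environment variables.
            "DD-API-KEY": os.environ["DD_API_KEY"],
            "DD-APPLICATION-KEY": os.environ["DD_APP_KEY"],
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Building the payload separately from sending it makes it easy to review or store the monitor definition before calling `create_monitor(memory_monitor_payload())`.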

* Monitoring the DB

Create a https://app.datadoghq.com/monitors#create/forecast[Forecast Monitor] for the number of objects in your database.
It triggers when the number of stored objects is predicted to cross the threshold you set.

We recommend the following query:
`avg:mongodb.stats.objects{cluster:eks} by {db}`

Set the condition to 24 hours. Under Advanced Options, you can select the Seasonal algorithm if you expect seasonal behavior in the creation of objects.

Specify the message of your choice and create the monitor.

* Monitoring the cache

Create an https://app.datadoghq.com/monitors#create/apm[APM monitor]. Select the demo environment and the redis-cache service.
You can select the Anomaly alert and specify the threshold. The message should be pre-filled.

image::redis-apm-monitor.png[]

* Monitoring the Webserver

Create an https://app.datadoghq.com/monitors#create/integration[Integration Monitor] for NGINX.
Specify the following query:
`sum:nginx.net.request_per_s{eks} by {host}`

Set the thresholds to your liking and write the message you want to receive when this monitor triggers.
A good example is:

```
Number of requests received on the NGINX webserver on host {{host.name}} is over {{threshold}}.
Please SSH into {{host.ip}} @<your_email>
```

* Monitoring the app (with traces or logs)

Finally, you can set up a Log Monitor to monitor your application.
Create a https://app.datadoghq.com/monitors#create/log[Log Monitor] and specify the following query:

`service:(fetchapp) @http.url_details.path:("/api/flushcache")`

We recommend setting a threshold at 450 requests.

Then specify your message and save the monitor.
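
For monitors as code, a log monitor uses the `log alert` type, and its query wraps the search in a `logs(...)` expression with an index, a rollup, and an evaluation window. The sketch below builds that string and a matching payload; the `*` index, the 5-minute window, the monitor name, and the message are assumptions, and the query syntax should be checked against the Datadog Monitors API reference before use.

```python
import json


def log_alert_query(search: str, threshold: int,
                    window: str = "5m", index: str = "*") -> str:
    """Count logs matching `search` over `window` and alert above `threshold`."""
    # json.dumps wraps each argument in double quotes and escapes inner quotes.
    return (f"logs({json.dumps(search)}).index({json.dumps(index)})"
            f'.rollup("count").last({json.dumps(window)}) > {threshold}')


def log_monitor_payload(threshold: int = 450) -> dict:
    """Payload for POST /api/v1/monitor; name and message are placeholders."""
    search = 'service:(fetchapp) @http.url_details.path:("/api/flushcache")'
    return {
        "name": "High traffic on /api/flushcache",
        "type": "log alert",
        "query": log_alert_query(search, threshold),
        "message": "Too many cache flush requests on fetchapp.",
    }
```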

=== A/B testing

Now, let's run the infinite demo.

image::infinite-demo.png[]

Go to your web app and click on the infinite demo. This generates traffic, logs, and traces.

image::full-trace.png[]

While it runs, feel free to create dashboards and explore the Datadog application.
Soon enough, a few of your monitors should trigger!
Keep an eye on their health in the https://app.datadoghq.com/monitors/manage[Manage Monitors] page.

If you specified an email, you will receive a notification as well.

To go further with notifications, Datadog integrates with many third-party tools, such as PagerDuty, Slack, and Zendesk.
Check the whole list here: https://docs.datadoghq.com/integrations/#cat-notification

We recommend leaving the agents running, as the next steps of the workshop also have a monitoring section.

=== Cleanup

Remove all the installed components:

resources/images/container-map.png

resources/images/container-view.png

resources/images/full-trace.png

resources/images/hostmap.png

resources/images/infinite-demo.png

resources/images/redis-dashboard.png

resources/images/redis-logs.png
