CharlyF
diff --git a/‎02-path-working-with-clusters/201-cluster-monitoring/readme.adoc
Lines changed: 177 additions & 45 deletions b/‎02-path-working-with-clusters/201-cluster-monitoring/readme.adoc
diff --git a/‎resources/images/container-map.png
285 KB b/‎resources/images/container-map.png
diff --git a/‎resources/images/container-view.png
241 KB b/‎resources/images/container-view.png
diff --git a/‎resources/images/full-trace.png
173 KB b/‎resources/images/full-trace.png
diff --git a/‎resources/images/go-to-redis-traces.png
147 KB b/‎resources/images/go-to-redis-traces.png
diff --git a/‎resources/images/hostmap.png
194 KB b/‎resources/images/hostmap.png
diff --git a/‎resources/images/infinite-demo.png
38.9 KB b/‎resources/images/infinite-demo.png
diff --git a/‎resources/images/redis-apm-monitor.png
193 KB b/‎resources/images/redis-apm-monitor.png
diff --git a/‎resources/images/redis-dashboard.png
554 KB b/‎resources/images/redis-dashboard.png
diff --git a/‎resources/images/redis-logs.png
1000 KB b/‎resources/images/redis-logs.png
@@ -14,8 +14,6 @@ This chapter will demonstrate how to monitor a Kubernetes cluster using the foll
 Datadog is a monitoring service for cloud-scale applications, providing monitoring of servers, databases, tools, and services, through a SaaS-based data analytics platform.
 It gives a unified view of an entire stack, allowing to seamlessly monitor metrics, application traces as well as logs.
 
-
-
 == Prerequisites
 
 In order to perform exercises in this chapter, you'll need to deploy configurations to an EKS cluster.  To create an EKS cluster, use the link:../../01-path-basics/102-your-first-cluster#create-a-kubernetes-cluster-with-eks[AWS CLI] (recommended), or alternatively, link:../../01-path-basics/102-your-first-cluster#alternative-create-a-kubernetes-cluster-with-kops[kops].
@@ -31,12 +29,13 @@ From cloud providers like AWS, Google Cloud or Azure to tools like chef, puppet
 Databases like Postegres, Mysql. Webservers like NGINX, HAProxy and so on and so forth.
 
 Today, we will be using:
-- Kubernetes
-- Docker
-- NGINX
-- MongoDB
-- Redis
-- Python
+
+* Kubernetes
+* Docker
+* NGINX
+* MongoDB
+* Redis
+* Python
 
 There are multiple ways to collect data - The first one is via our https://github.com/DataDog/datadog-agent[agent].
 We will deploy the agent on all the nodes of our EKS cluster. It will run as a pod, along side our application.
@@ -90,9 +89,11 @@ Insert a Datadog API Key that can be found in your https://app.datadoghq.com/acc
 
 Then from the current directory, just run:
 
-  $ kubectl apply -f templates/datadog/agent.yaml
-
-TODO add output
+```
+$ kubectl apply -f templates/datadog/agent.yaml
+daemonset.extensions "dd-agent" created
+service "dd-agent" created
+```
 
 As this manifest is a DaemonSet, this will deploy an agent on all your nodes. The agent will live inside a pod.
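+
+For reference, the core of an agent DaemonSet can be sketched as follows. This is a hypothetical excerpt, not the actual contents of `templates/datadog/agent.yaml`: the name `dd-agent` and the `app: dd-agent` label come from this chapter, while the image tag and the environment variables shown are assumptions about a typical agent setup with logs and APM enabled.
+
+```yaml
+# Hypothetical excerpt of an agent DaemonSet (assumed, not the workshop file)
+apiVersion: extensions/v1beta1
+kind: DaemonSet
+metadata:
+  name: dd-agent
+spec:
+  template:
+    metadata:
+      labels:
+        app: dd-agent              # label used later to list the agent pods
+    spec:
+      containers:
+      - name: dd-agent
+        image: datadog/agent:latest  # assumed image tag
+        env:
+        - name: DD_API_KEY
+          value: "<YOUR_API_KEY>"    # placeholder, use your own key
+        - name: DD_LOGS_ENABLED      # assumed: enables log collection
+          value: "true"
+        - name: DD_APM_ENABLED       # assumed: enables trace collection
+          value: "true"
+```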
 
@@ -103,15 +104,18 @@ To set up the MongoDB replica set, you need three things: A StorageClass, a Head
 We will start by creating a StorageClass to tell Kubernetes what kind of storage to use for the database nodes.
 In this case, we will rely on EBS GP2s to store our data.
 
-  $ kubectl apply -f templates/mongodb/storageclass.yaml
-
-TODO add output
+```
+$ kubectl apply -f templates/mongodb/storageclass.yaml
+storageclass.storage.k8s.io "fast" created
+```
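+
+For reference, a StorageClass backed by EBS gp2 volumes typically looks like the sketch below. The name `fast` matches the output above; the provisioner and parameters are assumptions, since the contents of `storageclass.yaml` are not reproduced in this chapter.
+
+```yaml
+# Hypothetical sketch of templates/mongodb/storageclass.yaml
+kind: StorageClass
+apiVersion: storage.k8s.io/v1
+metadata:
+  name: fast
+provisioner: kubernetes.io/aws-ebs   # in-tree EBS provisioner (assumed)
+parameters:
+  type: gp2                          # general purpose SSD volumes
+```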
 
 Once the storage is ready, we can spin up our MongoDB with 3 replicas.
 
-  $ kubectl apply -f templates/mongodb/mongo.yaml
-
-TODO add output
+```
+$ kubectl apply -f templates/mongodb/mongo.yaml
+service "mongo" created
+statefulset.apps "mongo" created
+```
 
 Note that this will create a service which will operate as a headless loadbalancer in front of the DBs.
 This will also generate Persistent Volume Claims, these should appear as EBS volumes in your AWS account.
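+
+What makes the service headless is `clusterIP: None`: instead of a single virtual IP, DNS returns the addresses of the individual pods. A minimal sketch, assuming the selector label and the default MongoDB port (neither is shown in this chapter):
+
+```yaml
+# Hypothetical sketch of the headless service defined in mongo.yaml
+apiVersion: v1
+kind: Service
+metadata:
+  name: mongo
+spec:
+  clusterIP: None      # headless: no virtual IP is allocated
+  selector:
+    role: mongo        # assumed pod label
+  ports:
+  - port: 27017        # default MongoDB port (assumed)
+```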
@@ -122,22 +126,37 @@ You can run the following command:
 
   $ kubectl exec -it mongo-1 -- sh -c 'mongo admin --host localhost --eval "db.createUser({ user: \"datadog\", pwd: \"tndPhL3wrMEDuj4wLEHmbxbV\", roles: [ {role: \"read\", db: \"admin\"}, {role: \"clusterMonitor\", db:\"admin\"},{role: \"read\", db: \"local\" } ] });"'
 
-=== The cache
+Double-check that the persistent volume claims are bound:
 
-We will be leveraging Redis to cache data.
-TODO more details about Redis
+```
+$ kubectl get pvc
+NAME                               STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
+mongo-persistent-storage-mongo-0   Bound     pvc-ec5ccee5-8307-11e8-b84c-06bfcd83c358   1Gi        RWO            fast           3m
+mongo-persistent-storage-mongo-1   Bound     pvc-f3dd1eae-8307-11e8-b84c-06bfcd83c358   1Gi        RWO            fast           3m
+mongo-persistent-storage-mongo-2   Bound     pvc-fffcea2a-8307-11e8-b84c-06bfcd83c358   1Gi        RWO            fast           3m
+```
 
-You can run
+=== The cache
 
- $ kubectl apply -f templates/redis/redis.yaml
+We will be leveraging Redis to cache data.
 
-Which will create a redis pod and a headless service in front of it
+Create your Redis cache:
+```
+$ kubectl apply -f templates/redis/redis.yaml
+deployment.apps "redis" created
+service "redis" created
+```
+This creates a Redis pod and a headless service in front of it.
 
 === Deploy the application
 
 Now is time to deploy your application.
 
- $ kubectl apply -f templates/webapp/webapp.yaml
+```
+$ kubectl apply -f templates/webapp/webapp.yaml
+deployment.apps "fan" created
+service "fan" created
+```
 
 This will create a pod running the application as well as a service in front of it.
 
@@ -150,15 +169,18 @@ Now is time to see the result of your labor.
 Spin up the nginx manifest, this will create a webserver that will front the application as well as a service.
 The service, as opposed to the above services is configured to be a LoadBalancer. Therefore, it will spin up an ELB and will make a public DNS that will be exposed to the world.
 
- $ kubectl apply -f templates/nginx/nginx.yaml
-
+```
+$ kubectl apply -f templates/nginx/nginx.yaml
+daemonset.extensions "nginx" created
+service "nginx-deployment" created
+configmap "nginxconfig" created
+```
 This will also create a ConfigMap used to store the nginx config as an ETCD object instead of a physical file. The benefit is that the file does not have to be present on each node.
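+
+The part of the manifest that triggers the ELB is the service type. A minimal sketch, assuming the selector label and port (the real values live in `templates/nginx/nginx.yaml`):
+
+```yaml
+# Hypothetical sketch of the nginx service
+apiVersion: v1
+kind: Service
+metadata:
+  name: nginx-deployment
+spec:
+  type: LoadBalancer   # asks the cloud provider for an external load balancer
+  selector:
+    app: nginx         # assumed pod label
+  ports:
+  - port: 80           # assumed port
+```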
 
 Now, take a look at your LoadBalancer being configured:
 
- $ kubectl describe svc nginx-deployment
-
 ```
+$ kubectl describe svc nginx-deployment
 Name:                     nginx-deployment
 Namespace:                default
 Labels:                   <none>
@@ -187,51 +209,161 @@ image::webapp.png[]
 
 == Monitoring
 
+=== Diving into the data
+
+To start monitoring, begin by visualizing the data at a high level. The Datadog hostmap gives a bird's-eye view of your infrastructure.
+Go to the https://app.datadoghq.com/infrastructure/map[hostmap] to see your EKS cluster.
+
+image::hostmap.png[]
+
+As we are using Kubernetes, our infrastructure is container-driven. The container map therefore gives us more detail on the containers running on each host.
+
+You can easily switch back and forth with the toggle in the top left-hand corner.
+
+image::container-map.png[]
+
+While a cluster-wide overview at the container level is useful, it is even better to visualize the activity on a per-container or per-pod basis.
+You can achieve this by going to the https://app.datadoghq.com/containers[Container Live view].
+
+image::container-view.png[]
+
+Go to the https://app.datadoghq.com/process[Processes page] to visualize the processes running on the monitored host.
+
 === Metrics
 
-Open the host map, go to the container map
-You can open the container live view
+The agent is collecting the metrics from containers via the https://docs.datadoghq.com/videos/autodiscovery/[Autodiscovery process].
+In this case, it relies on annotations. The mongo, redis and nginx manifests each contain a template like this one:
+```
+    metadata:
+      annotations:
+        ad.datadoghq.com/redis.check_names: '["redisdb"]'
+        ad.datadoghq.com/redis.init_configs: '[{}]'
+        ad.datadoghq.com/redis.instances: '[{"host": "%%host%%","port":"6379"}]'
+```
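+
+The same pattern applies to the other integrations. As a sketch, the equivalent annotations for the MongoDB pods could look like the following. The container name `mongo`, the check name and the `server` key are assumptions based on the agent's MongoDB check; the `datadog` user is the one created earlier, and the password is left as a placeholder.
+
+```yaml
+# Hypothetical annotations for the mongo pods (assumed, not the workshop file)
+metadata:
+  annotations:
+    ad.datadoghq.com/mongo.check_names: '["mongo"]'
+    ad.datadoghq.com/mongo.init_configs: '[{}]'
+    ad.datadoghq.com/mongo.instances: '[{"server": "mongodb://datadog:<password>@%%host%%:%%port%%/admin"}]'
+```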
+
+Exec into one of the agent pods and run the status command to see which checks are running:
+
+ $ kubectl get pods -l app=dd-agent
 
-Then, go the the redis dashboard and mongo db ?
+Pick one of the pods and run:
 
-The agent is collecting the metrics from these via the Autodiscovery process.
-It works with Annotations in this case.
+ $ kubectl exec -ti <pod_name> agent status
 
+You should see the mongo check being run, as well as other checks (depending on the pods running on the node).
 
-=== Logs
+=== From Metrics to Logs
 
 Let's stress the cache of our app and see the logs.
 
-Go on to the redis metric that surges, click to see the related logs.
-We can also see logs about mongo, redis, the app.
+Open your web app, click on the Caching demo, run it, and then go to your Datadog app.
 
-=== Traces
+This demo will stress redis by querying elements in the cache. It will subsequently submit logs and traces.
 
-Now, let's run the infinite demo.
-from the logs, let's look at the traces.
+Go to the https://app.datadoghq.com/screen/integration/15/redis---overview[Redis Dashboard]. It was created out of the box for you when an agent autodiscovered the Redis pod.
+You will see a surge in the commands per second. Click on the metric and select View Related Logs.
 
-At this point, you can stop the infinite demo.
+image::redis-dashboard.png[]
 
-We recommend letting the agents up, as the next steps of the workshop will also have a monitoring section.
+This will take you to the https://app.datadoghq.com/logs[Log Explorer] page, carrying the context of the source (here redis) and the time window.
+
+image::redis-logs.png[]
+
+If you click on one of the logs, you will be able to see the details of this log.
+
+=== From Logs to Traces
+
+Now that we have identified the logs that were submitted at the moment of the surge in the number of commands per second, we can look at the relevant traces that our application submitted.
+
+Click on one of the redis logs, and on `Service: Redis` click on See in APM:
+
+image::go-to-redis-traces.png[]
+
+From there you can navigate to the traces that correspond to this service. Clicking on the GET resource shows the total number of requests, the errors, and the latency.
+Now, we can also click on a single trace and see the actual flame graph:
+
+image::redis-traces.png[]
 
 === Setting up some monitors
 
+Before doing some further testing, let's create a few monitors. Go to the https://app.datadoghq.com/monitors#/create[Monitor section] of your Datadog Application.
+
 * Monitoring the Infrastructure
 
+Create a https://app.datadoghq.com/monitors#create/metric[metric monitor] for the memory used by each pod. You can pick the metric and set the scope; we recommend using the following query:
+
+`avg:kubernetes.memory.usage{cluster:eks} by {pod_name}`
+
+Set a threshold at 160M.
+
+In the `Say what's happening` section, you can describe the issue and use template variables to give more context:
+```
+Memory over {{threshold}} for {{pod_name.name}}.
+```
+
 * Monitoring the DB
 
+Create a https://app.datadoghq.com/monitors#create/forecast[Forecast Monitor] for the number of objects in your Database.
+This will trigger if the number of objects stored differs from what was predicted.
+
+We recommend the following query:
+`avg:mongodb.stats.objects{cluster:eks} by {db}`
+
+Set the condition to 24 hours and click on Advanced Options. You can select the Seasonal algorithm if you expect seasonal behavior in the creation of objects.
+
+Specify the message of your choice and create the monitor.
+
 * Monitoring the cache
 
+Create an https://app.datadoghq.com/monitors#create/apm[APM monitor]. Select the demo environment and the service redis-cache.
+You can select the Anomaly alert, and specify the threshold. The message should be pre-filled.
+
+image::redis-apm-monitor.png[]
+
 * Monitoring the Webserver
 
-* Monitoring the app (with traces and logs)
+Create an https://app.datadoghq.com/monitors#create/integration[Integration Monitor] for NGINX.
+Specify the following query:
+`sum:nginx.net.request_per_s{eks} by {host}`
+
+Set the thresholds to your liking and write the message you want to receive should this monitor trigger.
+A good example here is:
+```
+Number of requests received on the NGINX webserver on host {{host.name}} is over {{threshold}}.
+Please ssh in  {{host.ip}} @[email protected]
+```
+
+* Monitoring the app (with traces or logs)
+
+Finally, you can set up a Log Monitor to monitor your Application.
+Create a https://app.datadoghq.com/monitors#create/log[Log Monitor], and specify the following query:
+
+`service:(fetchapp) @http.url_details.path:("/api/flushcache" )`
+
+We recommend setting a threshold at 450 requests.
 
-== Going Further
+Then specify your message and save it!
 
-Create DCA ?
+=== AB testing
 
+Now, let's run the infinite demo.
+
+image::infinite-demo.png[]
+
+Go to your web app and click on the infinite demo. This will generate traffic, as well as logs and traces.
+
+image::full-trace.png[]
+
+As you let this run, feel free to go create dashboards and navigate throughout the Datadog application.
+Soon enough, a few of your monitors should trigger!
+Keep an eye on their health in the https://app.datadoghq.com/monitors/manage[Manage Monitors] page.
+
+If you specified an email you will receive a notification as well.
+
+Should you want to go further with notifications, Datadog integrates with a lot of third-party tools, such as PagerDuty, Slack, and Zendesk.
+Check the whole list here: https://docs.datadoghq.com/integrations/#cat-notification
+
+We recommend leaving the agents running, as the next steps of the workshop will also have a monitoring section.
 
-At this point,
 === Cleanup
 
 Remove all the installed components: