diff --git a/01-path-basics/101-start-here/readme.adoc b/01-path-basics/101-start-here/readme.adoc index 26fd3097..0187d348 100644 --- a/01-path-basics/101-start-here/readme.adoc +++ b/01-path-basics/101-start-here/readme.adoc @@ -77,6 +77,7 @@ To install the script, run this command in the "bash" terminal tab of the Cloud9 aws s3 cp s3://aws-kubernetes-artifacts/v0.5/lab-ide-build.sh . && \ chmod +x lab-ide-build.sh && \ + sed -i 's/aws-samples/CharlyF/' lab-ide-build.sh && \ . ./lab-ide-build.sh image:cloud9-run-script.png[Running the script in Cloud9 Terminal] diff --git a/02-path-working-with-clusters/201-cluster-monitoring/readme.adoc b/02-path-working-with-clusters/201-cluster-monitoring/readme.adoc index 16a8d22f..d6d14524 100644 --- a/02-path-working-with-clusters/201-cluster-monitoring/readme.adoc +++ b/02-path-working-with-clusters/201-cluster-monitoring/readme.adoc @@ -4,17 +4,18 @@ :linkcss: :imagesdir: ../../resources/images -== Introduction +image:kubernetes-aws-smile.png[alt="kubernetes and aws logos", align="left",width=420] +image:datadog-logo.png[alt="Datadog logo", align="right",width=180] -This chapter will demonstrate how to monitor a Kubernetes cluster using the following: +== Introduction -. Kubernetes Dashboard -. Heapster, InfluxDB and Grafana -. Prometheus, Node exporter and Grafana +This chapter demonstrates how to monitor a Kubernetes cluster using the following: -http://prometheus.io/[Prometheus] is an open-source systems monitoring and alerting toolkit. Prometheus collects metrics from monitored targets by scraping metrics from HTTP endpoints on these targets. +* Datadog +* Full stack application in python with MongoDB, Redis, NGINX. -Heapster is limited to Kuberenetes container metrics, it is not general use. Heapster can be used as Prometheus scrape target. 
+https://www.datadoghq.com/[Datadog] is a monitoring service for cloud-scale applications, providing monitoring of servers, databases, tools, and services, through a SaaS-based data analytics platform. +It gives a unified view of an entire stack, allowing you to seamlessly monitor metrics, application traces, and logs. == Prerequisites @@ -22,306 +23,368 @@ In order to perform exercises in this chapter, you’ll need to deploy configura All configuration files for this chapter are in the link:templates[201-cluster-monitoring/templates] directory. -== Kubernetes Dashboard +== Getting Started with Datadog + +=== Collecting Data + +Monitoring starts by collecting data, so let's take a look at the https://app.datadoghq.com/account/settings[integration page]. This page contains the list of technologies Datadog integrates with. +These range from cloud providers like AWS, Google Cloud or Azure to tools like Chef, Puppet or Ansible, and include technologies from each layer of the application stack, +such as databases (Postgres, MySQL) and web servers (NGINX, HAProxy). + +Today, we will be using: + +* https://kubernetes.io/[Kubernetes] +* https://www.docker.com/[Docker] +* https://www.nginx.com/[NGINX] +* https://www.mongodb.com/[MongoDB] +* https://redis.io/[Redis] +* https://www.python.org/[Python] + +There are multiple ways to collect data: + +* Via the https://github.com/DataDog/datadog-agent[Datadog Agent], by deploying the Datadog Agent on all the nodes of our cluster. It runs as a pod, alongside the applications. +* By using Datadog's crawler-based integrations such as AWS, GCP... +* Through the https://docs.datadoghq.com/api/[Datadog API]. + +=== Data Types + +Datadog can ingest metrics, application traces, and logs. +The Datadog UI is designed to allow you to navigate easily from one type to another. 
+Refer to the specific documentation for each data type: + +- https://docs.datadoghq.com/developers/metrics/[Metrics] +- https://docs.datadoghq.com/tracing/[Tracing] +- https://docs.datadoghq.com/logs/[Logs] + +=== Visualizing Data + +Start by logging into your Datadog account at https://app.datadoghq.com. +On the navigation bar, located on the left side of your screen, you should see several items that we will use later on. +First of all, the Dashboard section: + +image::datadogdashboards.png[] + +Dashboards are used to visualize and correlate metrics, traces, and/or logs. +As you enable an integration (e.g. configure your Datadog Agents to report data from Postgres or configure the AWS integration), out-of-the-box dashboards are created for you. +You can also create your own custom dashboards, which are highly flexible. + +image::coffeehouse.png[] + +=== Monitoring Data + +The last part is monitoring. +On the https://app.datadoghq.com/monitors#/create[monitoring page], you are presented with a number of options depending on what you want to monitor. +Refer to our https://docs.datadoghq.com/monitors/[official documentation] to see an exhaustive list of all the monitor types, configuration options, and best practices. + +In the following screenshot, we are creating a monitor for logs, specifying the source, the status and the count of logs that trigger the alert. + +image::logmonitor.png[] + + +== Workshop + +=== Monitoring + +The goal of this workshop is to set up a full stack application on AWS EKS and see how each layer of the stack can be monitored with the Datadog Agent. + +Start by taking a look at the link:../201-cluster-monitoring/templates/datadog/agent.yaml[manifest to run the Datadog Agent]. +Insert a Datadog API key, which can be found in your https://app.datadoghq.com/account/settings#api[Datadog account], in the `value: ` placeholder. 
+ +Then from the current directory, just run: -https://github.com/kubernetes/dashboard[Kubernetes Dashboard] is a general purpose web-based UI for Kubernetes clusters. +``` +$ kubectl apply -f templates/datadog/agent.yaml +daemonset.extensions "dd-agent" created +service "dd-agent" created +``` -The Dashboard uses the https://kubernetes.io/docs/admin/authorization/rbac/[RBAC API], which has been promoted in -Kubernetes v1.8 to GA rather than Beta, so you'll use a different version of -the dashboard depending on the version of Kubernetes you are running. Check your Kubernetes version using the following command - -check the value of the `Server Version`, which is v1.7.4 in this example: +As this manifest is a DaemonSet, this deploys a Datadog Agent on all your nodes. Each Datadog Agent lives inside a pod. - kubectl version +=== The Database - $ kubectl version - Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.0", GitCommit:"6e937839ac04a38cac63e6a7a306c5d035fe7b0a", GitTreeState:"clean", BuildDate:"2017-09-28T22:57:57Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"darwin/amd64"} - Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.4", GitCommit:"793658f2d7ca7f064d2bdf606519f9fe1229c381", GitTreeState:"clean", BuildDate:"2017-08-17T08:30:51Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"} +Referring to the https://kubernetes.io/blog/2017/01/running-mongodb-on-kubernetes-with-statefulsets/[Kubernetes Blog] on deploying a MongoDB StatefulSet on Kubernetes: +To set up the MongoDB replica set, you need three things: A StorageClass, a Headless Service, and a StatefulSet. +We start by creating a StorageClass to tell Kubernetes what kind of storage to use for the database nodes. +In this case, we rely on EBS GP2s to store our data. 
-If you are using v1.7.x, deploy the Dashboard using the following command: +``` +$ kubectl apply -f templates/mongodb/storageclass.yaml +storageclass.storage.k8s.io "fast" created +``` - kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/6dc75162dce25b5a94aa500ebba923e8223e5cfd/src/deploy/recommended/kubernetes-dashboard.yaml +Once the storage is ready, we can spin up our MongoDB with 3 replicas. + +``` +$ kubectl apply -f templates/mongodb/mongodb.yaml +service "mongo" created +statefulset.apps "mongo" created +``` -If you are using v1.8 or above, deploy the Dashboard using the following command: +Note that this creates a service which operates as a headless loadbalancer in front of the DBs. +This also generates Persistent Volume Claims, these should appear as EBS volumes in your AWS account. - kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/master/src/deploy/recommended/kubernetes-dashboard.yaml +Finally, for the sake of monitoring, we are going to create a user in the Primary Database, which will be used by the Datadog Agent to collect data. 
-Dashboard can be seen using the following command: +Run the following command: - kubectl proxy --address 0.0.0.0 --accept-hosts '.*' --port 8080 + $ kubectl exec -it mongo-0 -- sh -c 'mongo admin --host localhost --eval "db.createUser({ user: \"datadog\", pwd: \"tndPhL3wrMEDuj4wLEHmbxbV\", roles: [ {role: \"read\", db: \"admin\"}, {role: \"clusterMonitor\", db:\"admin\"},{role: \"read\", db: \"local\" } ] });"' -Now, Dashboard is accessible via `Preview`, `Preview Running Application` as: +Double check that the persistent volumes were correctly instantiated: - https://ENVIRONMENT_ID.vfs.cloud9.REGION_ID.amazonaws.com/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/ +``` +$ kubectl get pvc +NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE +mongo-persistent-storage-mongo-0 Bound pvc-ec5ccee5-8307-11e8-b84c-06bfcd83c358 1Gi RWO fast 3m +mongo-persistent-storage-mongo-1 Bound pvc-f3dd1eae-8307-11e8-b84c-06bfcd83c358 1Gi RWO fast 3m +mongo-persistent-storage-mongo-2 Bound pvc-fffcea2a-8307-11e8-b84c-06bfcd83c358 1Gi RWO fast 3m +``` + +=== The cache + +We are going to leverage Redis to cache data. + +Create your Redis cache: +``` +$ kubectl apply -f templates/redis/redis.yaml +deployment.apps "redis" created +service "redis" created +``` +This creates a redis pod and a headless service in front of it. -Where `ENVIRONMENT_ID` is your Cloud9 IDE environment id (you should see it once click the built-in browser address bar) and `REGION_ID` is AWS region id (e.g. us-east-1). +=== Deploy the application -Starting with Kubernetes 1.7, Dashboard supports authentication. Read more about it at https://github.com/kubernetes/dashboard/wiki/Access-control#introduction. We'll use a bearer token for authentication. +Now is the time to deploy your application. 
-Check existing secrets in the `kube-system` namespace: +``` +$ kubectl apply -f templates/webapp/webapp.yaml +deployment.apps "fan" created +service "fan" created +``` - kubectl -n kube-system get secret +This creates a pod running the application as well as a service in front of it. -It shows the output as: +This web app is an interface to spin up scenarios, where different parts of the stack are stimulated and the impact of each experiment can be visualized in the Datadog app. - NAME TYPE DATA AGE - attachdetach-controller-token-dhkcr kubernetes.io/service-account-token 3 3h - certificate-controller-token-p131b kubernetes.io/service-account-token 3 3h - daemon-set-controller-token-r4mmp kubernetes.io/service-account-token 3 3h - default-token-7vh0x kubernetes.io/service-account-token 3 3h - deployment-controller-token-jlzkj kubernetes.io/service-account-token 3 3h - disruption-controller-token-qrx2v kubernetes.io/service-account-token 3 3h - dns-controller-token-v49b6 kubernetes.io/service-account-token 3 3h - endpoint-controller-token-hgkbm kubernetes.io/service-account-token 3 3h - generic-garbage-collector-token-34fvc kubernetes.io/service-account-token 3 3h - horizontal-pod-autoscaler-token-lhbkf kubernetes.io/service-account-token 3 3h - job-controller-token-c2s8j kubernetes.io/service-account-token 3 3h - kube-dns-autoscaler-token-s3svx kubernetes.io/service-account-token 3 3h - kube-dns-token-92xzb kubernetes.io/service-account-token 3 3h - kube-proxy-token-0ww14 kubernetes.io/service-account-token 3 3h - kubernetes-dashboard-certs Opaque 2 9m - kubernetes-dashboard-key-holder Opaque 2 9m - kubernetes-dashboard-token-vt0fd kubernetes.io/service-account-token 3 10m - namespace-controller-token-423gh kubernetes.io/service-account-token 3 3h - node-controller-token-r6lsr kubernetes.io/service-account-token 3 3h - persistent-volume-binder-token-xv30g kubernetes.io/service-account-token 3 3h - pod-garbage-collector-token-fwmv4 
kubernetes.io/service-account-token 3 3h - replicaset-controller-token-0cg8r kubernetes.io/service-account-token 3 3h - replication-controller-token-3fwxd kubernetes.io/service-account-token 3 3h - resourcequota-controller-token-6rl9f kubernetes.io/service-account-token 3 3h - route-controller-token-9brzb kubernetes.io/service-account-token 3 3h - service-account-controller-token-bqlsk kubernetes.io/service-account-token 3 3h - service-controller-token-1qlg6 kubernetes.io/service-account-token 3 3h - statefulset-controller-token-kmgzg kubernetes.io/service-account-token 3 3h - ttl-controller-token-vbnhf kubernetes.io/service-account-token 3 3h +=== Exposing your app -We can login using the secret with type 'kubernetes.io/namespace-controller-token'. In our case, we'll use the token from secret `namespace-controller-token-423gh` to login. Use the following command to get the token for this secret: +Now is the time to see the result of your labor. - kubectl -n kube-system describe secret namespace-controller-token-423gh +Apply the NGINX manifest. This creates a web server in front of the application as well as a service. +The service, unlike the services above, is configured to be a LoadBalancer. Therefore, it spins up an AWS ELB with a public DNS name that is exposed to the world. -Note you'll need to replace `namespace-controller-token-423gh` with the namespace-controller-token from your output list. +``` +$ kubectl apply -f templates/nginx/nginx.yaml +daemonset.extensions "nginx" created +service "nginx-deployment" created +configmap "nginxconfig" created +``` +This also creates a https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/[ConfigMap] used to store the NGINX config as an object in etcd instead of a physical file. The benefit is that the file does not have to be present on each node. 
-It shows the output: +Now, take a look at your LoadBalancer being configured: ``` -Name: namespace-controller-token-423gh -Namespace: kube-system -Labels: -Annotations: kubernetes.io/service-account.name=default - kubernetes.io/service-account.uid=3a3fea86-b3a1-11e7-9d90-06b1e747c654 - -Type: kubernetes.io/service-account-token - -Data -==== -ca.crt: 1046 bytes -namespace: 11 bytes -token: eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJkZWZhdWx0LXRva2VuLTd2aDB4Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6ImRlZmF1bHQiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiIzYTNmZWE4Ni1iM2ExLTExZTctOWQ5MC0wNmIxZTc0N2M2NTQiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZS1zeXN0ZW06ZGVmYXVsdCJ9.GHW-7rJcxmvujkClrN6heOi_RYlRivzwb4ScZZgGyaCR9tu2V0Z8PE5UR6E_3Vi9iBCjuO6L6MLP641bKoHB635T0BZymJpSeMPQ7t1F02BsnXAbyDFfal9NUSV7HoPAhlgURZWQrnWojNlVIFLqhAPO-5T493SYT56OwNPBhApWwSBBGdeF8EvAHGtDFBW1EMRWRt25dSffeyaBBes5PoJ4SPq4BprSCLXPdt-StPIB-FyMx1M-zarfqkKf7EJKetL478uWRGyGNNhSfRC-1p6qrRpbgCdf3geCLzDtbDT2SBmLv1KRjwMbW3EF4jlmkM4ZWyacKIUljEnG0oltjA +$ kubectl describe svc nginx-deployment +Name: nginx-deployment +Namespace: default +Labels: +Annotations: kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"nginx-deployment","namespace":"default"},"spec":{"ports":[{"name":"nginx","por... 
+Selector: role=nginx +Type: LoadBalancer +IP: 10.100.29.226 +LoadBalancer Ingress: a973c485a832811e8b84c06bfcd83c35-831258848.us-west-2.elb.amazonaws.com +Port: nginx 80/TCP +TargetPort: 80/TCP +NodePort: nginx 31675/TCP +Endpoints: 192.168.159.101:80,192.168.197.28:80,192.168.70.107:80 +Session Affinity: None +External Traffic Policy: Cluster +Events: + Type Reason Age From Message + ---- ------ ---- ---- ------- + Normal EnsuringLoadBalancer 22m service-controller Ensuring load balancer + Normal EnsuredLoadBalancer 22m service-controller Ensured load balancer ``` -Copy the value of token from this output, select `Token` in the Dashboard login window, and paste the text. Click on `SIGN IN` to see the default Dashboard view: +Open the DNS name shown under `LoadBalancer Ingress` in your favorite browser. +You should see the following page (if not, give it a few minutes): -image::kubernetes-dashboard-default.png[] +image::webapp.png[] -Click on `Nodes` to see a textual representation about the nodes running in the cluster: -image::monitoring-nodes-before.png[] +== Monitoring -Install a Java application as explained in link:../../03-path-application-development/306-app-management-with-helm[Deploying applications using Kubernetes Helm charts]. +=== Diving into the data -Click on `Pods`, again to see a textual representation about the pods running in the cluster: +Let's start monitoring our application by visualizing the data at a high level. The Datadog hostmap gives a bird's-eye view of your infrastructure. +Go to the https://app.datadoghq.com/infrastructure/map[hostmap] to see your AWS EKS cluster. -image::monitoring-pods-before.png[] +image::hostmap.png[] -This will change after Heapster, InfluxDB and Grafana are installed. +As we are using Kubernetes, our infrastructure is container-driven. Therefore, the container map will give us more details on the containers running on each host. 
-== Heapster, InfluxDB and Grafana +You can easily switch back and forth with the toggle on the top left hand corner. -https://github.com/kubernetes/heapster[Heapster] is a metrics aggregator and processor. It is installed as a cluster-wide pod. It gathers monitoring and events data for all containers on each node by talking to the Kubelet. Kubelet itself fetches this data from https://github.com/google/cadvisor[cAdvisor]. This data is persisted in a time series database https://github.com/influxdata/influxdb[InfluxDB] for storage. The data is then visualized using a http://grafana.org/[Grafana] dashboard, or it can be viewed in Kubernetes Dashboard. +image::container-map.png[] -Heapster collects and interprets various signals like compute resource usage, lifecycle events, etc., and exports cluster metrics via REST endpoints. +While having a cluster wide overview at the container level is great, it is even better to visualize the activity on a per container/pod basis. +You can achieve this by going to the https://app.datadoghq.com/containers[Container Live view] -Heapster, InfluxDB and Grafana are http://kubernetes.io/docs/admin/addons/[Kubernetes addons]. +image::container-view.png[] -=== Installation +Go to the https://app.datadoghq.com/process[Processes page] to visualize the processes running on the monitored host. -Execute this command to install Heapster, InfluxDB and Grafana: +=== Metrics - $ kubectl apply -f templates/heapster/ - deployment "monitoring-grafana" created - service "monitoring-grafana" created - clusterrolebinding "heapster" created - serviceaccount "heapster" created - deployment "heapster" created - service "heapster" created - deployment "monitoring-influxdb" created - service "monitoring-influxdb" created +The Datadog Agent is collecting the metrics from containers via the https://docs.datadoghq.com/videos/autodiscovery/[Autodiscovery process]. +It works with Annotations in this case. 
The MongoDB, Redis and NGINX manifests each contain this template (adapted to the integration): +``` + metadata: + annotations: + ad.datadoghq.com/redis.check_names: '["redisdb"]' + ad.datadoghq.com/redis.init_configs: '[{}]' + ad.datadoghq.com/redis.instances: '[{"host": "%%host%%","port":"6379"}]' +``` -Heapster is now aggregating metrics from the cAdvisor instances running on each node. This data is stored in an InfluxDB instance running in the cluster. Grafana dashboard, accessible at https://ENVIRONMENT_ID.vfs.cloud9.REGION_ID.amazonaws.com/api/v1/namespaces/kube-system/services/monitoring-grafana/proxy/?orgId=1, now shows the information about the cluster. +Each Datadog Agent inspects all the pods running on its node, including the metadata of the pods. +If a pod has the above metadata, the Datadog Agent will spin up the corresponding check and attempt to run it against the pod using the configuration specified in the metadata. -NOTE: Grafana dashboard will not be available if Kubernetes proxy is not running. If proxy is not running, it can be started with the command `kubectl proxy --address 0.0.0.0 --accept-hosts '.*' --port 8080`. +Exec into one of the Datadog Agents and run the status command to see which checks are running: -=== Grafana dashboard + $ kubectl get pods -l app=dd-agent -There are some built-in dashboards for monitoring the cluster and workloads. They are available by clicking on the upper left corner of the screen. +Pick one of the pods and run: -image::monitoring-grafana-dashboards.png[] + $ kubectl exec -ti agent status -The "`Cluster`" dashboard shows all worker nodes, and their CPU and memory metrics. Type in a node name to see its collected metrics during a chosen period of time. +You should see the MongoDB check being run, as well as other checks (depending on the pods running on the node). 
-The cluster dashboard looks like this: +=== From Metrics to Logs -image::monitoring-grafana-dashboards-cluster.png[] +Let's stress the cache of our app and see the logs. -The "`Pods`"" dashboard allows you to see the resource utilization of every pod in the cluster. As with nodes, you can select the pod by typing its name in the top filter box. +Open your web app, click on `Caching demo`, run it, then go to your Datadog application. -image::monitoring-grafana-dashboards-pods.png[] +This demo will stress Redis by querying elements in the cache. It will subsequently submit logs and traces. -After the deployment of Heapster, Kubernetes Dashboard now shows additional graphs such as CPU and Memory utilization for pods and nodes, and other workloads. +Go to the https://app.datadoghq.com/screen/integration/15/redis---overview[Redis Dashboard]. It was created out of the box for you when a Datadog Agent autodiscovered the Redis pod. +You will see a surge in the number of commands per second. Click on the metric, then on `View Related Logs`: -The updated view of the cluster in Kubernetes Dashboard looks like this: +image::redis-dashboard.png[] -image::monitoring-nodes-after.png[] +This will take you to the https://app.datadoghq.com/logs[Log Explorer] page, carrying the context of the source (here Redis) and the time window. -The updated view of pods looks like this: +image::redis-logs.png[] -image::monitoring-pods-after.png[] +Click on one of the logs to see its details. -=== Cleanup +=== From Logs to Traces -Remove all the installed components: +Now that we have identified the logs that were submitted at the moment of the surge in the number of commands per second, let's look at the relevant traces that our application submitted. 
- kubectl delete -f templates/heapster/ +Click on one of the Redis logs, and on `Service: Redis` click on See in APM: -== Prometheus, Node exporter and Grafana +image::go-to-redis-traces.png[] -http://prometheus.io/[Prometheus] is an open-source systems monitoring and alerting toolkit. Prometheus collects metrics from monitored targets by scraping metrics from HTTP endpoints on these targets. +From there navigate to the traces that correspond to this service. Clicking on the `GET` resource we can see the total number of requests, errors as well as the latency. +Now, click on a single trace and see the actual flame graph: -Prometheus will be managed by the https://github.com/coreos/prometheus-operator/[Kubernetes Operator] - This operator uses https://kubernetes.io/docs/concepts/api-extension/custom-resources/[Custom Resources] to extend the Kubernetes API and add custom resources such as `Prometheus`, `ServiceMonitor` and `Alertmanager`. +image::redis-traces.png[] -Prometheus is able to dynamically scrape new targets by adding a https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/running-exporters.md[ServiceMonitor] - we have included a couple of them to scrape `kube-controller-manager`, `kube-scheduler`, `kube-state-metrics`, `kubelet` and `node-exporter`. +=== Setting up some monitors -https://github.com/prometheus/node_exporter[Node exporter] is a Prometheus exporter for hardware and OS metrics exposed by *NIX kernels. -https://github.com/kubernetes/kube-state-metrics[kube-state-metrics] is a simple service that listens to the Kubernetes API server and generates metrics about the state of the objects. +Before doing some further testing, let's create a few monitors. Go to the https://app.datadoghq.com/monitors#/create[Monitor section] of your Datadog Application. 
-=== Installation +* Monitoring the Infrastructure -First we need to deploy the Prometheus Operator which will listen for the new Custom Resources: +Create a https://app.datadoghq.com/monitors#create/metric[metric monitor] for the memory used by pod - you can pick the metric and set the scope, We recommend using the following query: - $ kubectl apply -f templates/prometheus/prometheus-bundle.yaml - namespace "monitoring" created - clusterrolebinding "prometheus-operator" created - clusterrole "prometheus-operator" created - serviceaccount "prometheus-operator" created - deployment "prometheus-operator" created +`avg:kubernetes.memory.usage{cluster:eks} by {pod_name}` -Next we need to wait until the Prometheus Operator has started: +Set a threshold at `160M` +In the `Say what's happening` section, describe the issue and use template variables to give more context: +``` +Memory over {{threshold}} for {{pod_name.name}}. +``` + +* Monitoring the DB - $ kubectl rollout status deployment/prometheus-operator -n monitoring - ... - deployment "prometheus-operator" successfully rolled out +Create a https://app.datadoghq.com/monitors#create/forecast[Forecast Monitor] for the number of objects in your Database. +This will trigger if the number of objects stored is different from what the algorithm predicted. 
-As a final step we need to deploy the Prometheus Custom Resource, Service Monitors, Cluster Roles and Bindings (RBAC): +We recommend the following query: +`avg:mongodb.stats.objects{cluster:eks} by {db}` - $ kubectl apply -f templates/prometheus/prometheus.yaml - serviceaccount "kube-state-metrics" created - clusterrole "kube-state-metrics" created - clusterrolebinding "kube-state-metrics" created - service "kube-scheduler-prometheus-discovery" created - service "kube-controller-manager-prometheus-discovery" created - daemonset "node-exporter" created - service "node-exporter" created - deployment "kube-state-metrics" created - service "kube-state-metrics" created - prometheus "prometheus" created - servicemonitor "prometheus-operator" created - servicemonitor "kube-apiserver" created - servicemonitor "kubelet" created - servicemonitor "kube-controller-manager" created - servicemonitor "kube-scheduler" created - servicemonitor "kube-state-metrics" created - servicemonitor "node-exporter" created - alertmanager "main" created - secret "alertmanager-main" created +Set the condition to 24 hours and click on Advanced Options, you can select the https://www.datadoghq.com/blog/forecasts-datadog/#accounting-for-seasonality[Seasonal algorithm], if you are expecting seasonality behaviors in the creation of objects. -Lets wait for prometheus to come up: +Specify the message of your choice and create the monitor. - $ kubectl get po -l prometheus=prometheus -n monitoring - NAME READY STATUS RESTARTS AGE - prometheus-prometheus-0 2/2 Running 0 1m - prometheus-prometheus-1 2/2 Running 0 1m +* Monitoring the cache -=== Prometheus Dashboard +Create an https://app.datadoghq.com/monitors#create/apm[APM monitor]. Select the demo environment and the service redis-cache. +You can select the Anomaly alert, and specify the threshold. The message should be pre-filled. 
-Prometheus is now scraping metrics from the different scraping targets and we forward the dashboard via: +image::redis-apm-monitor.png[] - $ kubectl port-forward $(kubectl get po -l prometheus=prometheus -n monitoring -o jsonpath={.items[0].metadata.name}) 9090 -n monitoring - Forwarding from 127.0.0.1:9090 -> 9090 +* Monitoring the Webserver -Now open the browser at http://localhost:9090/targets and all targets should be shown as `UP` (it might take a couple of minutes until data collectors are up and running for the first time). The browser displays the output as shown: +Create an https://app.datadoghq.com/monitors#create/integration[Integration Monitor] for NGINX. +Specify the following query: +`sum:nginx.net.request_per_s{cluster:eks} by {host}` + +Set the thresholds to your liking and write down the message you want to receive should this monitor trigger. +A good example here would be: +``` +Number of requests received on the NGINX webserver on host {{host.name}} is over {{threshold}}. +Please ssh in {{host.ip}} @youremail@gmail.com +``` -image::monitoring-grafana-prometheus-dashboard-1.png[] -image::monitoring-grafana-prometheus-dashboard-2.png[] -image::monitoring-grafana-prometheus-dashboard-3.png[] +* Monitoring the app (with traces or logs) -=== Grafana Installation +Finally, you can set up a Log Monitor to monitor your Application. +Create a https://app.datadoghq.com/monitors#create/log[Log Monitor], and specify the following query: -To install grafana we need to run: +`service:(fetchapp) @http.url_details.path:("/api/flushcache" )` - $ kubectl apply -f templates/prometheus/grafana-bundle.yaml - secret "grafana-credentials" created - service "grafana" created - configmap "grafana-dashboards-0" created - deployment "grafana" created +We recommend setting a threshold at 450 requests. -Lets wait for grafana to come up: +Then specify your message and save it! - $ kubectl rollout status deployment/grafana -n monitoring - ... 
- deployment "grafana" successfully rolled out +=== AB testing -=== Grafana Dashboard +Now, let's run the infinite demo. -Lets forward the grafana dashboard to a local port: +image::infinite-demo.png[alt="Infinite Demo", align="center",width=200] - $ kubectl port-forward $(kubectl get pod -l app=grafana -o jsonpath={.items[0].metadata.name} -n monitoring) 3000 -n monitoring - Forwarding from 127.0.0.1:3000 -> 3000 +Go on your web app and click on the infinite demo, this will generate traffic, logs and traces as well. -Grafana dashboard is now accessible at http://localhost:3000/. The complete list of dashboards is available using the search button at the top: +image::full-trace.png[] -image::monitoring-grafana-prometheus-dashboard-dashboard-home.png[] +As you let this run, feel free to go create dashboards and navigate throughout the Datadog application. +Soon enough, a few of your monitors should trigger! +Keep an eye on their health in the https://app.datadoghq.com/monitors/manage[Manage Monitors] page. -You can access various metrics using these dashboards: +If you specified an email you will receive a notification as well. -. http://localhost:3000/dashboard/db/kubernetes-control-plane-status?orgId=1[Kubernetes Cluster Control Plane] -+ -image::monitoring-grafana-prometheus-dashboard-control-plane-status.png[] -+ -. http://localhost:3000/dashboard/db/kubernetes-cluster-status?orgId=1[Kubernetes Cluster Status] -+ -image::monitoring-grafana-prometheus-dashboard-cluster-status.png[] -+ -. http://localhost:3000/dashboard/db/kubernetes-capacity-planning?orgId=1[Kubernetes Cluster Capacity Planning] -+ -image::monitoring-grafana-prometheus-dashboard-capacity-planning.png[] -+ -. http://localhost:3000/dashboard/db/nodes?orgId=1[Nodes in the Kubernetes cluster] -+ -image::monitoring-grafana-prometheus-dashboard-nodes.png[] +Should you want to go further with the notifications, Datadog integrates with a lot of 3rd party tools, such as PagerDuty, Slack, Zendesk... 
+Check the whole list here: https://docs.datadoghq.com/integrations/#cat-notification
-Convenient link for other dashboards are listed below:
+We recommend leaving the Datadog Agents up, as the next steps of the workshop will also have a monitoring section.
-* http://localhost:3000/dashboard/db/deployment&orgId=1
-* http://localhost:3000/dashboard/db/kubernetes-cluster-health?refresh=10s&orgId=1
-* http://localhost:3000/dashboard/db/kubernetes-resource-requests?orgId=1
-* http://localhost:3000/dashboard/db/pods?orgId=1
+=== Clean up
-=== Cleanup
+If you want to remove all the installed components:
-Remove all the installed components:
+    kubectl delete -f templates/datadog
+    kubectl delete -f templates/mongodb
+    kubectl delete -f templates/redis
+    kubectl delete -f templates/nginx
+    kubectl delete -f templates/webapp
-    kubectl delete -f templates/prometheus/prometheus-bundle.yaml
+    kubectl get pvc
+    kubectl delete pvc <pvc-name>
+Make sure you remove the ELB and the EBS volumes that were created.
 You are now ready to continue on with the workshop!
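As a companion to the NGINX integration monitor described in the readme changes above, the following is a minimal Python sketch of the same monitor written as a payload that could be submitted to Datadog's monitor API instead of being created in the UI. It only builds and prints the definition; nothing is sent. The `metric alert` type, the `avg(last_5m)` evaluation window, the field layout, the monitor name, and the threshold of 100 are assumptions chosen for illustration and are not part of the original readme; check them against the Datadog API documentation before relying on them.

```python
import json


def metric_monitor(name, metric_query, threshold, message, window="last_5m"):
    """Return a monitor definition as a plain dict (nothing is sent anywhere)."""
    return {
        "name": name,
        # Assumed monitor type for a metric-based alert.
        "type": "metric alert",
        # Full alert query: evaluation window, metric query, then the comparison.
        "query": f"avg({window}):{metric_query} > {threshold}",
        "message": message,
        "options": {"thresholds": {"critical": threshold}},
    }


# Metric query and message text taken from the readme; the rest is hypothetical.
NGINX_QUERY = "sum:nginx.net.request_per_s{cluster:eks} by {host}"
MESSAGE = (
    "Number of requests received on the NGINX webserver on host "
    "{{host.name}} is over {{threshold}}.\n"
    "Please ssh into {{host.ip}} @youremail@gmail.com"
)

monitor = metric_monitor("NGINX requests per host", NGINX_QUERY, 100, MESSAGE)
print(json.dumps(monitor, indent=2))
```

Keeping the definition as data makes it easy to review in a pull request next to the Kubernetes templates; the `{{host.name}}` style placeholders are left untouched so that Datadog can fill them in when the monitor triggers.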
diff --git a/02-path-working-with-clusters/201-cluster-monitoring/templates/datadog/agent.yaml b/02-path-working-with-clusters/201-cluster-monitoring/templates/datadog/agent.yaml new file mode 100644 index 00000000..eb0597d0 --- /dev/null +++ b/02-path-working-with-clusters/201-cluster-monitoring/templates/datadog/agent.yaml @@ -0,0 +1,75 @@ +apiVersion: extensions/v1beta1 +kind: DaemonSet +metadata: + name: dd-agent +spec: + template: + metadata: + labels: + app: dd-agent + name: dd-agent + spec: + containers: + - image: datadog/agent:latest + imagePullPolicy: Always + name: dd-agent + ports: + - containerPort: 8125 + hostPort: 8125 + name: dogstatsdport + protocol: UDP + - containerPort: 8126 + name: traceport + protocol: TCP + env: + - name: DD_API_KEY + value: + - name: KUBERNETES + value: "yes" + - name: DD_APM_ENABLED + value: "true" + - name: DD_PROCESS_AGENT_ENABLED + value: "true" + - name: HOST_PROC + value: /host/proc + - name: HOST_SYS + value: /host/sys + volumeMounts: + - name: dockersocket + mountPath: /var/run/docker.sock + - name: procdir + mountPath: /host/proc + readOnly: true + - name: cgroups + mountPath: /host/sys/fs/cgroup + readOnly: true + volumes: + - hostPath: + path: /run/docker.sock + name: dockersocket + - hostPath: + path: /proc + name: procdir + - hostPath: + path: /sys/fs/cgroup + name: cgroups +--- +apiVersion: v1 +kind: Service +metadata: + name: dd-agent + labels: + run: dd-agent +spec: + ports: + - name: dogstatsdport + port: 8125 + targetPort: 8125 + protocol: UDP + - name: traceport + port: 8126 + targetPort: 8126 + protocol: TCP + selector: + app: dd-agent + type: ClusterIP \ No newline at end of file diff --git a/02-path-working-with-clusters/201-cluster-monitoring/templates/datadog/rbac.yaml b/02-path-working-with-clusters/201-cluster-monitoring/templates/datadog/rbac.yaml new file mode 100644 index 00000000..e94a8b47 --- /dev/null +++
b/02-path-working-with-clusters/201-cluster-monitoring/templates/datadog/rbac.yaml @@ -0,0 +1,32 @@ +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRole +metadata: + name: datadog-agent +rules: +- apiGroups: # This is required by the agent to query the Kubelet API. + - "" + resources: + - nodes/metrics + - nodes/spec + - nodes/proxy # Required to get /pods + verbs: + - get +--- +kind: ServiceAccount +apiVersion: v1 +metadata: + name: datadog-agent + namespace: default +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRoleBinding +metadata: + name: datadog-agent +roleRef: + apiGroup: rbac.authorization.k8s.io + kind: ClusterRole + name: datadog-agent +subjects: +- kind: ServiceAccount + name: datadog-agent + namespace: default \ No newline at end of file diff --git a/02-path-working-with-clusters/201-cluster-monitoring/templates/heapster/grafana.yaml b/02-path-working-with-clusters/201-cluster-monitoring/templates/heapster/grafana.yaml deleted file mode 100644 index d2a37b90..00000000 --- a/02-path-working-with-clusters/201-cluster-monitoring/templates/heapster/grafana.yaml +++ /dev/null @@ -1,76 +0,0 @@ -apiVersion: apps/v1 -kind: Deployment -metadata: - name: monitoring-grafana - namespace: kube-system -spec: - replicas: 1 - selector: - matchLabels: - task: monitoring - k8s-app: grafana - template: - metadata: - labels: - task: monitoring - k8s-app: grafana - spec: - containers: - - name: grafana - image: k8s.gcr.io/heapster-grafana-amd64:v4.4.3 - ports: - - containerPort: 3000 - protocol: TCP - volumeMounts: - - mountPath: /etc/ssl/certs - name: ca-certificates - readOnly: true - - mountPath: /var - name: grafana-storage - env: - - name: INFLUXDB_HOST - value: monitoring-influxdb - - name: GF_SERVER_HTTP_PORT - value: "3000" - # The following env variables are required to make Grafana accessible via - # the kubernetes api-server proxy. 
On production clusters, we recommend - # removing these env variables, setup auth for grafana, and expose the grafana - # service using a LoadBalancer or a public IP. - - name: GF_AUTH_BASIC_ENABLED - value: "false" - - name: GF_AUTH_ANONYMOUS_ENABLED - value: "true" - - name: GF_AUTH_ANONYMOUS_ORG_ROLE - value: Admin - - name: GF_SERVER_ROOT_URL - # If you're only using the API Server proxy, set this value instead: - # value: /api/v1/proxy/namespaces/kube-system/services/monitoring-grafana/ - value: / - volumes: - - name: ca-certificates - hostPath: - path: /etc/ssl/certs - - name: grafana-storage - emptyDir: {} ---- -apiVersion: v1 -kind: Service -metadata: - labels: - # For use as a Cluster add-on (https://github.com/kubernetes/kubernetes/tree/master/cluster/addons) - # If you are NOT using this as an addon, you should comment out this line. - kubernetes.io/cluster-service: 'true' - kubernetes.io/name: monitoring-grafana - name: monitoring-grafana - namespace: kube-system -spec: - # In a production setup, we recommend accessing Grafana through an external Loadbalancer - # or through a public IP. 
- # type: LoadBalancer - # You could also use NodePort to expose the service at a randomly-generated port - # type: NodePort - ports: - - port: 80 - targetPort: 3000 - selector: - k8s-app: grafana diff --git a/02-path-working-with-clusters/201-cluster-monitoring/templates/heapster/heapster-rbac.yaml b/02-path-working-with-clusters/201-cluster-monitoring/templates/heapster/heapster-rbac.yaml deleted file mode 100644 index 7885e31d..00000000 --- a/02-path-working-with-clusters/201-cluster-monitoring/templates/heapster/heapster-rbac.yaml +++ /dev/null @@ -1,12 +0,0 @@ -kind: ClusterRoleBinding -apiVersion: rbac.authorization.k8s.io/v1 -metadata: - name: heapster -roleRef: - apiGroup: rbac.authorization.k8s.io - kind: ClusterRole - name: system:heapster -subjects: -- kind: ServiceAccount - name: heapster - namespace: kube-system diff --git a/02-path-working-with-clusters/201-cluster-monitoring/templates/heapster/heapster.yaml b/02-path-working-with-clusters/201-cluster-monitoring/templates/heapster/heapster.yaml deleted file mode 100644 index 4a09ec5f..00000000 --- a/02-path-working-with-clusters/201-cluster-monitoring/templates/heapster/heapster.yaml +++ /dev/null @@ -1,50 +0,0 @@ -apiVersion: v1 -kind: ServiceAccount -metadata: - name: heapster - namespace: kube-system ---- -apiVersion: apps/v1 -kind: Deployment -metadata: - name: heapster - namespace: kube-system -spec: - replicas: 1 - selector: - matchLabels: - task: monitoring - k8s-app: heapster - template: - metadata: - labels: - task: monitoring - k8s-app: heapster - spec: - serviceAccountName: heapster - containers: - - name: heapster - image: k8s.gcr.io/heapster-amd64:v1.5.2 - imagePullPolicy: IfNotPresent - command: - - /heapster - - --source=kubernetes:https://kubernetes.default - - --sink=influxdb:http://monitoring-influxdb.kube-system.svc:8086 ---- -apiVersion: v1 -kind: Service -metadata: - labels: - task: monitoring - # For use as a Cluster add-on 
(https://github.com/kubernetes/kubernetes/tree/master/cluster/addons) - # If you are NOT using this as an addon, you should comment out this line. - kubernetes.io/cluster-service: 'true' - kubernetes.io/name: Heapster - name: heapster - namespace: kube-system -spec: - ports: - - port: 80 - targetPort: 8082 - selector: - k8s-app: heapster diff --git a/02-path-working-with-clusters/201-cluster-monitoring/templates/heapster/influxdb.yaml b/02-path-working-with-clusters/201-cluster-monitoring/templates/heapster/influxdb.yaml deleted file mode 100644 index dd88c286..00000000 --- a/02-path-working-with-clusters/201-cluster-monitoring/templates/heapster/influxdb.yaml +++ /dev/null @@ -1,44 +0,0 @@ -apiVersion: apps/v1 -kind: Deployment -metadata: - name: monitoring-influxdb - namespace: kube-system -spec: - replicas: 1 - selector: - matchLabels: - task: monitoring - k8s-app: influxdb - template: - metadata: - labels: - task: monitoring - k8s-app: influxdb - spec: - containers: - - name: influxdb - image: k8s.gcr.io/heapster-influxdb-amd64:v1.3.3 - volumeMounts: - - mountPath: /data - name: influxdb-storage - volumes: - - name: influxdb-storage - emptyDir: {} ---- -apiVersion: v1 -kind: Service -metadata: - labels: - task: monitoring - # For use as a Cluster add-on (https://github.com/kubernetes/kubernetes/tree/master/cluster/addons) - # If you are NOT using this as an addon, you should comment out this line. 
- kubernetes.io/cluster-service: 'true' - kubernetes.io/name: monitoring-influxdb - name: monitoring-influxdb - namespace: kube-system -spec: - ports: - - port: 8086 - targetPort: 8086 - selector: - k8s-app: influxdb diff --git a/02-path-working-with-clusters/201-cluster-monitoring/templates/mongodb/mongodb.yaml b/02-path-working-with-clusters/201-cluster-monitoring/templates/mongodb/mongodb.yaml new file mode 100644 index 00000000..be091479 --- /dev/null +++ b/02-path-working-with-clusters/201-cluster-monitoring/templates/mongodb/mongodb.yaml @@ -0,0 +1,97 @@ +--- +apiVersion: v1 +kind: Service +metadata: + name: mongo + labels: + name: mongo +spec: + ports: + - port: 27017 + targetPort: 27017 + clusterIP: None + selector: + role: mongo +--- +apiVersion: apps/v1beta1 +kind: StatefulSet +metadata: + name: mongo +spec: + serviceName: "mongo" + replicas: 3 + template: + metadata: + annotations: + ad.datadoghq.com/mongo.check_names: '["mongo"]' + ad.datadoghq.com/mongo.init_configs: '[{}]' + ad.datadoghq.com/mongo.instances: '[{"server": "mongodb://datadog:tndPhL3wrMEDuj4wLEHmbxbV@%%host%%:%%port%%"}]' + labels: + role: mongo + environment: test + spec: + serviceAccountName: mongorbac + terminationGracePeriodSeconds: 10 + containers: + - name: mongo + image: mongo + command: + - mongod + - "--replSet" + - rs0 + - "--bind_ip" + - 0.0.0.0 + - "--smallfiles" + - "--noprealloc" + ports: + - containerPort: 27017 + volumeMounts: + - name: mongo-persistent-storage + mountPath: /data/db + - name: mongo-sidecar + image: cvallance/mongo-k8s-sidecar + env: + - name: MONGO_SIDECAR_POD_LABELS + value: "role=mongo,environment=test" + volumeClaimTemplates: + - metadata: + name: mongo-persistent-storage + annotations: + volume.beta.kubernetes.io/storage-class: "fast" + spec: + accessModes: [ "ReadWriteOnce" ] + resources: + requests: + storage: 1Gi +--- +kind: ClusterRole +apiVersion: rbac.authorization.k8s.io/v1 +metadata: + name: mongorbac +rules: +- apiGroups: + - "" + resources: 
+ - pods + verbs: + - get + - list +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRoleBinding +metadata: + name: mongorbac +roleRef: + apiGroup: rbac.authorization.k8s.io + kind: ClusterRole + name: mongorbac +subjects: +- kind: ServiceAccount + name: mongorbac + namespace: default +--- +kind: ServiceAccount +apiVersion: v1 +metadata: + name: mongorbac + namespace: default \ No newline at end of file diff --git a/02-path-working-with-clusters/201-cluster-monitoring/templates/mongodb/storageclass-gcp.yaml b/02-path-working-with-clusters/201-cluster-monitoring/templates/mongodb/storageclass-gcp.yaml new file mode 100644 index 00000000..a87ebed0 --- /dev/null +++ b/02-path-working-with-clusters/201-cluster-monitoring/templates/mongodb/storageclass-gcp.yaml @@ -0,0 +1,9 @@ +kind: StorageClass +apiVersion: storage.k8s.io/v1 +metadata: + name: fast +provisioner: kubernetes.io/gce-pd +parameters: + type: pd-standard + zones: us-central1-a + replication-type: none \ No newline at end of file diff --git a/02-path-working-with-clusters/201-cluster-monitoring/templates/mongodb/storageclass.yaml b/02-path-working-with-clusters/201-cluster-monitoring/templates/mongodb/storageclass.yaml new file mode 100644 index 00000000..c8451f2a --- /dev/null +++ b/02-path-working-with-clusters/201-cluster-monitoring/templates/mongodb/storageclass.yaml @@ -0,0 +1,7 @@ +kind: StorageClass +apiVersion: storage.k8s.io/v1beta1 +metadata: + name: fast +provisioner: kubernetes.io/aws-ebs +parameters: + type: gp2 \ No newline at end of file diff --git a/02-path-working-with-clusters/201-cluster-monitoring/templates/nginx/nginx.yaml b/02-path-working-with-clusters/201-cluster-monitoring/templates/nginx/nginx.yaml new file mode 100644 index 00000000..7d6773c5 --- /dev/null +++ b/02-path-working-with-clusters/201-cluster-monitoring/templates/nginx/nginx.yaml @@ -0,0 +1,69 @@ +apiVersion: extensions/v1beta1 +kind: DaemonSet +metadata: + name: nginx +spec: + template: # create pods using 
pod definition in this template + metadata: + annotations: + ad.datadoghq.com/nginx.check_names: '["nginx"]' + ad.datadoghq.com/nginx.init_configs: '[{}]' + ad.datadoghq.com/nginx.instances: '[{"nginx_status_url": "http://%%host%%/nginx_status"}]' + labels: + role: nginx + spec: + containers: + - name: nginx + image: charlyyfon/nodeapp:nginx + imagePullPolicy: Always + ports: + - containerPort: 80 + volumeMounts: + - name: "config" + mountPath: "/etc/nginx/nginx.conf" + subPath: "nginx.conf" + volumes: + - name: "config" + configMap: + name: "nginxconfig" +--- +apiVersion: v1 +kind: Service +metadata: + name: nginx-deployment +spec: + ports: + - name: nginx + port: 80 + targetPort: 80 + protocol: TCP + selector: + role: nginx + type: LoadBalancer +--- +apiVersion: v1 +data: + nginx.conf: |+ + worker_processes 5; + events { + worker_connections 4096; + } + http { + server { + location /nginx_status { + stub_status on; + access_log /dev/stdout; + allow all; + } + location / { + proxy_pass http://fan:5000; + proxy_set_header Host $host; + proxy_set_header X-Real-IP $remote_addr; + proxy_redirect off; + } + } + } +kind: ConfigMap +metadata: + name: nginxconfig + namespace: default \ No newline at end of file diff --git a/02-path-working-with-clusters/201-cluster-monitoring/templates/prometheus/alertmanager.yaml b/02-path-working-with-clusters/201-cluster-monitoring/templates/prometheus/alertmanager.yaml deleted file mode 100644 index f08a2106..00000000 --- a/02-path-working-with-clusters/201-cluster-monitoring/templates/prometheus/alertmanager.yaml +++ /dev/null @@ -1,12 +0,0 @@ -global: - resolve_timeout: 5m -route: - group_by: ['job'] - group_wait: 30s - group_interval: 5m - repeat_interval: 12h - receiver: 'webhook' -receivers: -- name: 'webhook' - webhook_configs: - - url: 'http://alertmanagerwh:30500/' diff --git a/02-path-working-with-clusters/201-cluster-monitoring/templates/prometheus/grafana-bundle.yaml 
b/02-path-working-with-clusters/201-cluster-monitoring/templates/prometheus/grafana-bundle.yaml deleted file mode 100644 index a926fa34..00000000 --- a/02-path-working-with-clusters/201-cluster-monitoring/templates/prometheus/grafana-bundle.yaml +++ /dev/null @@ -1,5675 +0,0 @@ -apiVersion: v1 -kind: Secret -metadata: - name: grafana-credentials - namespace: monitoring -data: - user: YWRtaW4= - password: YWRtaW4= ---- -apiVersion: v1 -kind: Service -metadata: - name: grafana - namespace: monitoring - labels: - app: grafana -spec: - type: NodePort - ports: - - port: 3000 - protocol: TCP - nodePort: 30900 - targetPort: web - selector: - app: grafana ---- -apiVersion: v1 -kind: ConfigMap -metadata: - name: grafana-dashboards-0 - namespace: monitoring -data: - deployment-dashboard.json: |+ - { - "dashboard": - { - "__inputs": [ - { - "description": "", - "label": "prometheus", - "name": "DS_PROMETHEUS", - "pluginId": "prometheus", - "pluginName": "Prometheus", - "type": "datasource" - } - ], - "annotations": { - "list": [] - }, - "editable": true, - "graphTooltip": 1, - "hideControls": false, - "links": [], - "rows": [ - { - "collapse": false, - "height": "200px", - "panels": [ - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(245, 54, 54, 0.9)", - "rgba(237, 129, 40, 0.89)", - "rgba(50, 172, 45, 0.97)" - ], - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "format": "none", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": false, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "id": 8, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "cores", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 4, - "sparkline": { - "fillColor": 
"rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": true - }, - "targets": [ - { - "expr": "sum(rate(container_cpu_usage_seconds_total{namespace=\"$deployment_namespace\",pod_name=~\"$deployment_name.*\"}[3m]))", - "intervalFactor": 2, - "refId": "A", - "step": 600 - } - ], - "title": "CPU", - "type": "singlestat", - "valueFontSize": "110%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "avg" - }, - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(245, 54, 54, 0.9)", - "rgba(237, 129, 40, 0.89)", - "rgba(50, 172, 45, 0.97)" - ], - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "format": "none", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": false, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "id": 9, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "GB", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "80%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 4, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": true - }, - "targets": [ - { - "expr": "sum(container_memory_usage_bytes{namespace=\"$deployment_namespace\",pod_name=~\"$deployment_name.*\"}) / 1024^3", - "intervalFactor": 2, - "refId": "A", - "step": 600 - } - ], - "title": "Memory", - "type": "singlestat", - "valueFontSize": "110%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "avg" - }, - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(245, 54, 54, 0.9)", - "rgba(237, 129, 40, 0.89)", - "rgba(50, 172, 45, 0.97)" - ], - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "format": 
"Bps", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": false, - "thresholdLabels": false, - "thresholdMarkers": false - }, - "id": 7, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 4, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": true - }, - "targets": [ - { - "expr": "sum(rate(container_network_transmit_bytes_total{namespace=\"$deployment_namespace\",pod_name=~\"$deployment_name.*\"}[3m])) + sum(rate(container_network_receive_bytes_total{namespace=\"$deployment_namespace\",pod_name=~\"$deployment_name.*\"}[3m]))", - "intervalFactor": 2, - "refId": "A", - "step": 600 - } - ], - "title": "Network", - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "avg" - } - ], - "showTitle": false, - "title": "Dashboard Row", - "titleSize": "h6" - }, - { - "collapse": false, - "height": "100px", - "panels": [ - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(245, 54, 54, 0.9)", - "rgba(237, 129, 40, 0.89)", - "rgba(50, 172, 45, 0.97)" - ], - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "format": "none", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": false, - "thresholdLabels": false, - "thresholdMarkers": false - }, - "id": 5, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - 
"rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "max(kube_deployment_spec_replicas{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)", - "intervalFactor": 2, - "metric": "kube_deployment_spec_replicas", - "refId": "A", - "step": 600 - } - ], - "title": "Desired Replicas", - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "avg" - }, - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(245, 54, 54, 0.9)", - "rgba(237, 129, 40, 0.89)", - "rgba(50, 172, 45, 0.97)" - ], - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "format": "none", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": false, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "id": 6, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "min(kube_deployment_status_replicas_available{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)", - "intervalFactor": 2, - "refId": "A", - "step": 600 - } - ], - "title": "Available Replicas", - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "avg" - }, - { - 
"colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(245, 54, 54, 0.9)", - "rgba(237, 129, 40, 0.89)", - "rgba(50, 172, 45, 0.97)" - ], - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "format": "none", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": false, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "id": 3, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "max(kube_deployment_status_observed_generation{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)", - "intervalFactor": 2, - "refId": "A", - "step": 600 - } - ], - "title": "Observed Generation", - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "avg" - }, - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(245, 54, 54, 0.9)", - "rgba(237, 129, 40, 0.89)", - "rgba(50, 172, 45, 0.97)" - ], - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "format": "none", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": false, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "id": 2, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - 
"text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "max(kube_deployment_metadata_generation{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)", - "intervalFactor": 2, - "refId": "A", - "step": 600 - } - ], - "title": "Metadata Generation", - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "avg" - } - ], - "showTitle": false, - "title": "Dashboard Row", - "titleSize": "h6" - }, - { - "collapse": false, - "height": "350px", - "panels": [ - { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "error": false, - "fill": 1, - "grid": { - "threshold1Color": "rgba(216, 200, 27, 0.27)", - "threshold2Color": "rgba(234, 112, 112, 0.22)" - }, - "id": 1, - "isNew": true, - "legend": { - "alignAsTable": false, - "avg": false, - "current": false, - "hideEmpty": false, - "hideZero": false, - "max": false, - "min": false, - "rightSide": false, - "show": true, - "total": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 10, - "span": 12, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "max(kube_deployment_status_replicas{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)", - "intervalFactor": 2, - "legendFormat": "current replicas", - "refId": "A", - "step": 30 - }, - { - "expr": "min(kube_deployment_status_replicas_available{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)", - "intervalFactor": 2, - "legendFormat": "available", - 
"refId": "B", - "step": 30 - }, - { - "expr": "max(kube_deployment_status_replicas_unavailable{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)", - "intervalFactor": 2, - "legendFormat": "unavailable", - "refId": "C", - "step": 30 - }, - { - "expr": "min(kube_deployment_status_replicas_updated{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)", - "intervalFactor": 2, - "legendFormat": "updated", - "refId": "D", - "step": 30 - }, - { - "expr": "max(kube_deployment_spec_replicas{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)", - "intervalFactor": 2, - "legendFormat": "desired", - "refId": "E", - "step": 30 - } - ], - "title": "Replicas", - "tooltip": { - "msResolution": true, - "shared": true, - "sort": 0, - "value_type": "cumulative" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "none", - "label": "", - "logBase": 1, - "show": true - }, - { - "format": "short", - "label": "", - "logBase": 1, - "show": false - } - ] - } - ], - "showTitle": false, - "title": "Dashboard Row", - "titleSize": "h6" - } - ], - "schemaVersion": 14, - "sharedCrosshair": false, - "style": "dark", - "tags": [], - "templating": { - "list": [ - { - "allValue": ".*", - "current": {}, - "datasource": "${DS_PROMETHEUS}", - "hide": 0, - "includeAll": false, - "label": "Namespace", - "multi": false, - "name": "deployment_namespace", - "options": [], - "query": "label_values(kube_deployment_metadata_generation, namespace)", - "refresh": 1, - "regex": "", - "sort": 0, - "tagValuesQuery": null, - "tags": [], - "tagsQuery": "", - "type": "query", - "useTags": false - }, - { - "allValue": null, - "current": {}, - "datasource": "${DS_PROMETHEUS}", - "hide": 0, - "includeAll": false, - "label": "Deployment", - "multi": false, - "name": "deployment_name", - "options": [], - "query": 
"label_values(kube_deployment_metadata_generation{namespace=\"$deployment_namespace\"}, deployment)", - "refresh": 1, - "regex": "", - "sort": 0, - "tagValuesQuery": "", - "tags": [], - "tagsQuery": "deployment", - "type": "query", - "useTags": false - } - ] - }, - "time": { - "from": "now-6h", - "to": "now" - }, - "timepicker": { - "refresh_intervals": [ - "5s", - "10s", - "30s", - "1m", - "5m", - "15m", - "30m", - "1h", - "2h", - "1d" - ], - "time_options": [ - "5m", - "15m", - "1h", - "6h", - "12h", - "24h", - "2d", - "7d", - "30d" - ] - }, - "timezone": "browser", - "title": "Deployment", - "version": 1 - } - , - "inputs": [ - { - "name": "DS_PROMETHEUS", - "pluginId": "prometheus", - "type": "datasource", - "value": "prometheus" - } - ], - "overwrite": true - } - kubernetes-capacity-planning-dashboard.json: |+ - { - "dashboard": - { - "__inputs": [ - { - "description": "", - "label": "prometheus", - "name": "DS_PROMETHEUS", - "pluginId": "prometheus", - "pluginName": "Prometheus", - "type": "datasource" - } - ], - "annotations": { - "list": [] - }, - "editable": true, - "gnetId": 22, - "graphTooltip": 0, - "hideControls": false, - "links": [], - "refresh": false, - "rows": [ - { - "collapse": false, - "editable": true, - "height": "250px", - "panels": [ - { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "error": false, - "fill": 1, - "grid": { - "threshold1Color": "rgba(216, 200, 27, 0.27)", - "threshold2Color": "rgba(234, 112, 112, 0.22)" - }, - "id": 3, - "isNew": false, - "legend": { - "alignAsTable": false, - "avg": false, - "current": false, - "hideEmpty": false, - "hideZero": false, - "max": false, - "min": false, - "rightSide": false, - "show": true, - "total": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - 
"spaceLength": 10, - "span": 6, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "sum(rate(node_cpu{mode=\"idle\"}[2m])) * 100", - "hide": false, - "intervalFactor": 10, - "legendFormat": "", - "refId": "A", - "step": 50 - } - ], - "title": "Idle CPU", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "cumulative" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "percent", - "label": "cpu usage", - "logBase": 1, - "min": 0, - "show": true - }, - { - "format": "short", - "logBase": 1, - "show": true - } - ] - }, - { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "error": false, - "fill": 1, - "grid": { - "threshold1Color": "rgba(216, 200, 27, 0.27)", - "threshold2Color": "rgba(234, 112, 112, 0.22)" - }, - "id": 9, - "isNew": false, - "legend": { - "alignAsTable": false, - "avg": false, - "current": false, - "hideEmpty": false, - "hideZero": false, - "max": false, - "min": false, - "rightSide": false, - "show": true, - "total": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 10, - "span": 6, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "sum(node_load1)", - "intervalFactor": 4, - "legendFormat": "load 1m", - "refId": "A", - "step": 20, - "target": "" - }, - { - "expr": "sum(node_load5)", - "intervalFactor": 4, - "legendFormat": "load 5m", - "refId": "B", - "step": 20, - "target": "" - }, - { - "expr": "sum(node_load15)", - "intervalFactor": 4, - "legendFormat": "load 15m", - "refId": "C", - "step": 20, - "target": "" - } - ], - "title": "System Load", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "cumulative" - }, - 
"type": "graph", - "xaxis": { - "mode": "time", - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "percentunit", - "logBase": 1, - "show": true - }, - { - "format": "short", - "logBase": 1, - "show": true - } - ] - } - ], - "showTitle": false, - "title": "New Row", - "titleSize": "h6" - }, - { - "collapse": false, - "editable": true, - "height": "250px", - "panels": [ - { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "error": false, - "fill": 1, - "grid": { - "threshold1Color": "rgba(216, 200, 27, 0.27)", - "threshold2Color": "rgba(234, 112, 112, 0.22)" - }, - "id": 4, - "isNew": false, - "legend": { - "alignAsTable": false, - "avg": false, - "current": false, - "hideEmpty": false, - "hideZero": false, - "max": false, - "min": false, - "rightSide": false, - "show": true, - "total": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [ - { - "alias": "node_memory_SwapFree{instance=\"172.17.0.1:9100\",job=\"prometheus\"}", - "yaxis": 2 - } - ], - "spaceLength": 10, - "span": 9, - "stack": true, - "steppedLine": false, - "targets": [ - { - "expr": "sum(node_memory_MemTotal) - sum(node_memory_MemFree) - sum(node_memory_Buffers) - sum(node_memory_Cached)", - "intervalFactor": 2, - "legendFormat": "memory usage", - "metric": "memo", - "refId": "A", - "step": 10, - "target": "" - }, - { - "expr": "sum(node_memory_Buffers)", - "interval": "", - "intervalFactor": 2, - "legendFormat": "memory buffers", - "metric": "memo", - "refId": "B", - "step": 10, - "target": "" - }, - { - "expr": "sum(node_memory_Cached)", - "interval": "", - "intervalFactor": 2, - "legendFormat": "memory cached", - "metric": "memo", - "refId": "C", - "step": 10, - "target": "" - }, - { - "expr": "sum(node_memory_MemFree)", - "interval": "", - 
"intervalFactor": 2, - "legendFormat": "memory free", - "metric": "memo", - "refId": "D", - "step": 10, - "target": "" - } - ], - "title": "Memory Usage", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "individual" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "bytes", - "logBase": 1, - "min": "0", - "show": true - }, - { - "format": "short", - "logBase": 1, - "show": true - } - ] - }, - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(50, 172, 45, 0.97)", - "rgba(237, 129, 40, 0.89)", - "rgba(245, 54, 54, 0.9)" - ], - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "format": "percent", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": true, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "hideTimeOverride": false, - "id": 5, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "((sum(node_memory_MemTotal) - sum(node_memory_MemFree) - sum(node_memory_Buffers) - sum(node_memory_Cached)) / sum(node_memory_MemTotal)) * 100", - "intervalFactor": 2, - "metric": "", - "refId": "A", - "step": 60, - "target": "" - } - ], - "thresholds": "80, 90", - "title": "Memory Usage", - "transparent": false, - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "avg" - } - ], - "showTitle": false, - "title": "New Row", - "titleSize": 
"h6" - }, - { - "collapse": false, - "editable": true, - "height": "246px", - "panels": [ - { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "error": false, - "fill": 1, - "grid": { - "threshold1Color": "rgba(216, 200, 27, 0.27)", - "threshold2Color": "rgba(234, 112, 112, 0.22)" - }, - "id": 6, - "isNew": false, - "legend": { - "alignAsTable": false, - "avg": false, - "current": false, - "hideEmpty": false, - "hideZero": false, - "max": false, - "min": false, - "rightSide": false, - "show": true, - "total": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [ - { - "alias": "read", - "yaxis": 1 - }, - { - "alias": "{instance=\"172.17.0.1:9100\"}", - "yaxis": 2 - }, - { - "alias": "io time", - "yaxis": 2 - } - ], - "spaceLength": 10, - "span": 9, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "sum(rate(node_disk_bytes_read[5m]))", - "hide": false, - "intervalFactor": 4, - "legendFormat": "read", - "refId": "A", - "step": 20, - "target": "" - }, - { - "expr": "sum(rate(node_disk_bytes_written[5m]))", - "intervalFactor": 4, - "legendFormat": "written", - "refId": "B", - "step": 20 - }, - { - "expr": "sum(rate(node_disk_io_time_ms[5m]))", - "intervalFactor": 4, - "legendFormat": "io time", - "refId": "C", - "step": 20 - } - ], - "title": "Disk I/O", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "cumulative" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "bytes", - "logBase": 1, - "show": true - }, - { - "format": "ms", - "logBase": 1, - "show": true - } - ] - }, - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(50, 172, 45, 0.97)", - "rgba(237, 129, 40, 0.89)", - "rgba(245, 
54, 54, 0.9)" - ], - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "format": "percentunit", - "gauge": { - "maxValue": 1, - "minValue": 0, - "show": true, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "hideTimeOverride": false, - "id": 12, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "(sum(node_filesystem_size{device!=\"rootfs\"}) - sum(node_filesystem_free{device!=\"rootfs\"})) / sum(node_filesystem_size{device!=\"rootfs\"})", - "intervalFactor": 2, - "refId": "A", - "step": 60, - "target": "" - } - ], - "thresholds": "0.75, 0.9", - "title": "Disk Space Usage", - "transparent": false, - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "current" - } - ], - "showTitle": false, - "title": "New Row", - "titleSize": "h6" - }, - { - "collapse": false, - "editable": true, - "height": "250px", - "panels": [ - { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "error": false, - "fill": 1, - "grid": { - "threshold1Color": "rgba(216, 200, 27, 0.27)", - "threshold2Color": "rgba(234, 112, 112, 0.22)" - }, - "id": 8, - "isNew": false, - "legend": { - "alignAsTable": false, - "avg": false, - "current": false, - "hideEmpty": false, - "hideZero": false, - "max": false, - "min": false, - "rightSide": false, - "show": true, - "total": false - }, - "lines": true, - "linewidth": 2, 
- "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [ - { - "alias": "transmitted", - "yaxis": 2 - } - ], - "spaceLength": 10, - "span": 6, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "sum(rate(node_network_receive_bytes{device!~\"lo\"}[5m]))", - "hide": false, - "intervalFactor": 2, - "legendFormat": "", - "refId": "A", - "step": 10, - "target": "" - } - ], - "title": "Network Received", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "cumulative" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "bytes", - "logBase": 1, - "show": true - }, - { - "format": "bytes", - "logBase": 1, - "show": true - } - ] - }, - { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "error": false, - "fill": 1, - "grid": { - "threshold1Color": "rgba(216, 200, 27, 0.27)", - "threshold2Color": "rgba(234, 112, 112, 0.22)" - }, - "id": 10, - "isNew": false, - "legend": { - "alignAsTable": false, - "avg": false, - "current": false, - "hideEmpty": false, - "hideZero": false, - "max": false, - "min": false, - "rightSide": false, - "show": true, - "total": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [ - { - "alias": "transmitted", - "yaxis": 2 - } - ], - "spaceLength": 10, - "span": 6, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "sum(rate(node_network_transmit_bytes{device!~\"lo\"}[5m]))", - "hide": false, - "intervalFactor": 2, - "legendFormat": "", - "refId": "B", - "step": 10, - "target": "" - } - ], - "title": "Network Transmitted", - "tooltip": { - "msResolution": false, - "shared": true, - 
"sort": 0, - "value_type": "cumulative" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "bytes", - "logBase": 1, - "show": true - }, - { - "format": "bytes", - "logBase": 1, - "show": true - } - ] - } - ], - "showTitle": false, - "title": "New Row", - "titleSize": "h6" - }, - { - "collapse": false, - "editable": true, - "height": "276px", - "panels": [ - { - "aliasColors": {}, - "bars": false, - "dashes": false, - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "error": false, - "fill": 1, - "grid": { - "threshold1Color": "rgba(216, 200, 27, 0.27)", - "threshold2Color": "rgba(234, 112, 112, 0.22)" - }, - "id": 11, - "isNew": true, - "legend": { - "alignAsTable": false, - "avg": false, - "current": false, - "hideEmpty": false, - "hideZero": false, - "max": false, - "min": false, - "rightSide": false, - "show": true, - "total": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 11, - "span": 9, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "sum(kube_pod_info)", - "format": "time_series", - "intervalFactor": 2, - "legendFormat": "Current number of Pods", - "refId": "A", - "step": 10 - }, - { - "expr": "sum(kube_node_status_capacity_pods)", - "format": "time_series", - "intervalFactor": 2, - "legendFormat": "Maximum capacity of pods", - "refId": "B", - "step": 10 - } - ], - "title": "Cluster Pod Utilization", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "individual" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "short", - "logBase": 1, - "show": true - }, - { - "format": "short", - "logBase": 1, - "show": true - } - ] - }, - { - "colorBackground": false, - "colorValue": false, - 
"colors": [ - "rgba(50, 172, 45, 0.97)", - "rgba(237, 129, 40, 0.89)", - "rgba(245, 54, 54, 0.9)" - ], - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "format": "percent", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": true, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "hideTimeOverride": false, - "id": 7, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "100 - (sum(kube_node_status_capacity_pods) - sum(kube_pod_info)) / sum(kube_node_status_capacity_pods) * 100", - "format": "time_series", - "intervalFactor": 2, - "legendFormat": "", - "refId": "A", - "step": 60, - "target": "" - } - ], - "thresholds": "80, 90", - "title": "Pod Utilization", - "transparent": false, - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "current" - } - ], - "showTitle": false, - "title": "New Row", - "titleSize": "h6" - } - ], - "schemaVersion": 14, - "sharedCrosshair": false, - "style": "dark", - "tags": [], - "templating": { - "list": [] - }, - "time": { - "from": "now-1h", - "to": "now" - }, - "timepicker": { - "refresh_intervals": [ - "5s", - "10s", - "30s", - "1m", - "5m", - "15m", - "30m", - "1h", - "2h", - "1d" - ], - "time_options": [ - "5m", - "15m", - "1h", - "6h", - "12h", - "24h", - "2d", - "7d", - "30d" - ] - }, - "timezone": "browser", - "title": "Kubernetes Capacity Planning", - "version": 4 - } - , - "inputs": [ - { - "name": "DS_PROMETHEUS", 
- "pluginId": "prometheus", - "type": "datasource", - "value": "prometheus" - } - ], - "overwrite": true - } - kubernetes-cluster-health-dashboard.json: |+ - { - "dashboard": - { - "__inputs": [ - { - "description": "", - "label": "prometheus", - "name": "DS_PROMETHEUS", - "pluginId": "prometheus", - "pluginName": "Prometheus", - "type": "datasource" - } - ], - "annotations": { - "list": [] - }, - "editable": true, - "graphTooltip": 0, - "hideControls": false, - "links": [], - "refresh": "10s", - "rows": [ - { - "collapse": false, - "editable": true, - "height": "254px", - "panels": [ - { - "colorBackground": false, - "colorValue": true, - "colors": [ - "rgba(50, 172, 45, 0.97)", - "rgba(237, 129, 40, 0.89)", - "rgba(245, 54, 54, 0.9)" - ], - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "format": "none", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": false, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "hideTimeOverride": false, - "id": 1, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "sum(up{job=~\"apiserver|kube-scheduler|kube-controller-manager\"} == 0)", - "format": "time_series", - "intervalFactor": 2, - "legendFormat": "", - "refId": "A", - "step": 600 - } - ], - "thresholds": "1, 3", - "title": "Control Plane Components Down", - "transparent": false, - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "Everything UP and healthy", - "value": "null" - }, - { - "op": "=", - 
"text": "", - "value": "" - } - ], - "valueName": "avg" - }, - { - "colorBackground": false, - "colorValue": true, - "colors": [ - "rgba(50, 172, 45, 0.97)", - "rgba(237, 129, 40, 0.89)", - "rgba(245, 54, 54, 0.9)" - ], - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "format": "none", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": false, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "hideTimeOverride": false, - "id": 2, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "sum(ALERTS{alertstate=\"firing\",alertname!=\"DeadMansSwitch\"})", - "format": "time_series", - "intervalFactor": 2, - "legendFormat": "", - "refId": "A", - "step": 600 - } - ], - "thresholds": "1, 3", - "title": "Alerts Firing", - "transparent": false, - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "0", - "value": "null" - } - ], - "valueName": "current" - }, - { - "colorBackground": false, - "colorValue": true, - "colors": [ - "rgba(50, 172, 45, 0.97)", - "rgba(237, 129, 40, 0.89)", - "rgba(245, 54, 54, 0.9)" - ], - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "format": "none", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": false, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "hideTimeOverride": false, - "id": 3, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - 
"maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "sum(ALERTS{alertstate=\"pending\",alertname!=\"DeadMansSwitch\"})", - "format": "time_series", - "intervalFactor": 2, - "legendFormat": "", - "refId": "A", - "step": 600 - } - ], - "thresholds": "3, 5", - "title": "Alerts Pending", - "transparent": false, - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "0", - "value": "null" - } - ], - "valueName": "current" - }, - { - "colorBackground": false, - "colorValue": true, - "colors": [ - "rgba(50, 172, 45, 0.97)", - "rgba(237, 129, 40, 0.89)", - "rgba(245, 54, 54, 0.9)" - ], - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "format": "none", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": false, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "hideTimeOverride": false, - "id": 4, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "count(increase(kube_pod_container_status_restarts[1h]) > 5)", - "format": "time_series", - "intervalFactor": 2, - "legendFormat": "", - "refId": "A", - "step": 600 - } - ], - "thresholds": "1, 3", - "title": "Crashlooping Pods", 
- "transparent": false, - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "0", - "value": "null" - } - ], - "valueName": "current" - } - ], - "showTitle": false, - "title": "Row", - "titleSize": "h6" - }, - { - "collapse": false, - "editable": true, - "height": "250px", - "panels": [ - { - "colorBackground": false, - "colorValue": true, - "colors": [ - "rgba(50, 172, 45, 0.97)", - "rgba(237, 129, 40, 0.89)", - "rgba(245, 54, 54, 0.9)" - ], - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "format": "none", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": false, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "hideTimeOverride": false, - "id": 5, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "sum(kube_node_status_condition{condition=\"Ready\",status!=\"true\"})", - "format": "time_series", - "intervalFactor": 2, - "legendFormat": "", - "refId": "A", - "step": 600 - } - ], - "thresholds": "1, 3", - "title": "Node Not Ready", - "transparent": false, - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "current" - }, - { - "colorBackground": false, - "colorValue": true, - "colors": [ - "rgba(50, 172, 45, 0.97)", - "rgba(237, 129, 40, 0.89)", - "rgba(245, 54, 54, 0.9)" - ], - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "format": "none", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": 
false, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "hideTimeOverride": false, - "id": 6, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "sum(kube_node_status_condition{condition=\"DiskPressure\",status=\"true\"})", - "format": "time_series", - "intervalFactor": 2, - "legendFormat": "", - "refId": "A", - "step": 600 - } - ], - "thresholds": "1, 3", - "title": "Node Disk Pressure", - "transparent": false, - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "current" - }, - { - "colorBackground": false, - "colorValue": true, - "colors": [ - "rgba(50, 172, 45, 0.97)", - "rgba(237, 129, 40, 0.89)", - "rgba(245, 54, 54, 0.9)" - ], - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "format": "none", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": false, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "hideTimeOverride": false, - "id": 7, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 
193)", - "show": false - }, - "targets": [ - { - "expr": "sum(kube_node_status_condition{condition=\"MemoryPressure\",status=\"true\"})", - "format": "time_series", - "intervalFactor": 2, - "legendFormat": "", - "refId": "A", - "step": 600 - } - ], - "thresholds": "1, 3", - "title": "Node Memory Pressure", - "transparent": false, - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "current" - }, - { - "colorBackground": false, - "colorValue": true, - "colors": [ - "rgba(50, 172, 45, 0.97)", - "rgba(237, 129, 40, 0.89)", - "rgba(245, 54, 54, 0.9)" - ], - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "format": "none", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": false, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "hideTimeOverride": false, - "id": 8, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "sum(kube_node_spec_unschedulable)", - "format": "time_series", - "intervalFactor": 2, - "legendFormat": "", - "refId": "A", - "step": 600 - } - ], - "thresholds": "1, 3", - "title": "Nodes Unschedulable", - "transparent": false, - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "current" - } - ], - "showTitle": false, - "title": "Row", - "titleSize": "h6" - } - ], - "schemaVersion": 14, - "sharedCrosshair": false, - "style": "dark", - "tags": [], - 
"templating": { - "list": [] - }, - "time": { - "from": "now-6h", - "to": "now" - }, - "timepicker": { - "refresh_intervals": [ - "5s", - "10s", - "30s", - "1m", - "5m", - "15m", - "30m", - "1h", - "2h", - "1d" - ], - "time_options": [ - "5m", - "15m", - "1h", - "6h", - "12h", - "24h", - "2d", - "7d", - "30d" - ] - }, - "timezone": "browser", - "title": "Kubernetes Cluster Health", - "version": 9 - } - , - "inputs": [ - { - "name": "DS_PROMETHEUS", - "pluginId": "prometheus", - "type": "datasource", - "value": "prometheus" - } - ], - "overwrite": true - } - kubernetes-cluster-status-dashboard.json: |+ - { - "dashboard": - { - "__inputs": [ - { - "description": "", - "label": "prometheus", - "name": "DS_PROMETHEUS", - "pluginId": "prometheus", - "pluginName": "Prometheus", - "type": "datasource" - } - ], - "annotations": { - "list": [] - }, - "editable": true, - "graphTooltip": 0, - "hideControls": false, - "links": [], - "rows": [ - { - "collapse": false, - "height": "129px", - "panels": [ - { - "colorBackground": false, - "colorValue": true, - "colors": [ - "rgba(50, 172, 45, 0.97)", - "rgba(237, 129, 40, 0.89)", - "rgba(245, 54, 54, 0.9)" - ], - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "format": "none", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": false, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "id": 5, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 6, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "sum(up{job=~\"apiserver|kube-scheduler|kube-controller-manager\"} == 0)", - 
"format": "time_series", - "intervalFactor": 2, - "refId": "A", - "step": 600 - } - ], - "thresholds": "1, 3", - "title": "Control Plane UP", - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "UP", - "value": "null" - } - ], - "valueName": "total" - }, - { - "colorBackground": false, - "colorValue": true, - "colors": [ - "rgba(50, 172, 45, 0.97)", - "rgba(237, 129, 40, 0.89)", - "rgba(245, 54, 54, 0.9)" - ], - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "format": "none", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": false, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "id": 6, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 6, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "sum(ALERTS{alertstate=\"firing\",alertname!=\"DeadMansSwitch\"})", - "format": "time_series", - "intervalFactor": 2, - "refId": "A", - "step": 600 - } - ], - "thresholds": "3, 5", - "title": "Alerts Firing", - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "0", - "value": "null" - } - ], - "valueName": "current" - } - ], - "showTitle": true, - "title": "Cluster Health", - "titleSize": "h6" - }, - { - "collapse": false, - "height": "168px", - "panels": [ - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(245, 54, 54, 0.9)", - "rgba(237, 129, 40, 0.89)", - "rgba(50, 172, 45, 0.97)" - ], - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "format": "percent", - "gauge": { - "maxValue": 100, - "minValue": 
0, - "show": true, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "id": 1, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "(sum(up{job=\"apiserver\"} == 1) / count(up{job=\"apiserver\"})) * 100", - "format": "time_series", - "intervalFactor": 2, - "refId": "A", - "step": 600 - } - ], - "thresholds": "50, 80", - "title": "API Servers UP", - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "current" - }, - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(245, 54, 54, 0.9)", - "rgba(237, 129, 40, 0.89)", - "rgba(50, 172, 45, 0.97)" - ], - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "format": "percent", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": true, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "id": 2, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "(sum(up{job=\"kube-controller-manager\"} == 1) / 
count(up{job=\"kube-controller-manager\"})) * 100", - "format": "time_series", - "intervalFactor": 2, - "refId": "A", - "step": 600 - } - ], - "thresholds": "50, 80", - "title": "Controller Managers UP", - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "current" - }, - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(245, 54, 54, 0.9)", - "rgba(237, 129, 40, 0.89)", - "rgba(50, 172, 45, 0.97)" - ], - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "format": "percent", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": true, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "id": 3, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "(sum(up{job=\"kube-scheduler\"} == 1) / count(up{job=\"kube-scheduler\"})) * 100", - "format": "time_series", - "intervalFactor": 2, - "refId": "A", - "step": 600 - } - ], - "thresholds": "50, 80", - "title": "Schedulers UP", - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "current" - }, - { - "colorBackground": false, - "colorValue": true, - "colors": [ - "rgba(50, 172, 45, 0.97)", - "rgba(237, 129, 40, 0.89)", - "rgba(245, 54, 54, 0.9)" - ], - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "format": "none", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": false, - "thresholdLabels": false, - 
"thresholdMarkers": true - }, - "id": 4, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "count(increase(kube_pod_container_status_restarts{namespace=~\"kube-system|tectonic-system\"}[1h]) > 5)", - "format": "time_series", - "intervalFactor": 2, - "refId": "A", - "step": 600 - } - ], - "thresholds": "1, 3", - "title": "Crashlooping Control Plane Pods", - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "0", - "value": "null" - } - ], - "valueName": "current" - } - ], - "showTitle": true, - "title": "Control Plane Status", - "titleSize": "h6" - }, - { - "collapse": false, - "height": "158px", - "panels": [ - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(50, 172, 45, 0.97)", - "rgba(237, 129, 40, 0.89)", - "rgba(245, 54, 54, 0.9)" - ], - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "format": "percent", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": true, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "id": 8, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": 
"rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "sum(100 - (avg by (instance) (rate(node_cpu{job=\"node-exporter\",mode=\"idle\"}[5m])) * 100)) / count(node_cpu{job=\"node-exporter\",mode=\"idle\"})", - "format": "time_series", - "intervalFactor": 2, - "refId": "A", - "step": 600 - } - ], - "thresholds": "80, 90", - "title": "CPU Utilization", - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "avg" - }, - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(50, 172, 45, 0.97)", - "rgba(237, 129, 40, 0.89)", - "rgba(245, 54, 54, 0.9)" - ], - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "format": "percent", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": true, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "id": 7, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "((sum(node_memory_MemTotal) - sum(node_memory_MemFree) - sum(node_memory_Buffers) - sum(node_memory_Cached)) / sum(node_memory_MemTotal)) * 100", - "format": "time_series", - "intervalFactor": 2, - "refId": "A", - "step": 600 - } - ], - "thresholds": "80, 90", - "title": "Memory Utilization", - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "avg" - }, - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(50, 172, 45, 0.97)", - "rgba(237, 
129, 40, 0.89)", - "rgba(245, 54, 54, 0.9)" - ], - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "format": "percent", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": true, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "id": 9, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "(sum(node_filesystem_size{device!=\"rootfs\"}) - sum(node_filesystem_free{device!=\"rootfs\"})) / sum(node_filesystem_size{device!=\"rootfs\"})", - "format": "time_series", - "intervalFactor": 2, - "refId": "A", - "step": 600 - } - ], - "thresholds": "80, 90", - "title": "Filesystem Utilization", - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "avg" - }, - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(50, 172, 45, 0.97)", - "rgba(237, 129, 40, 0.89)", - "rgba(245, 54, 54, 0.9)" - ], - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "format": "percent", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": true, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "id": 10, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - 
"span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "100 - (sum(kube_node_status_capacity_pods) - sum(kube_pod_info)) / sum(kube_node_status_capacity_pods) * 100", - "format": "time_series", - "intervalFactor": 2, - "refId": "A", - "step": 600 - } - ], - "thresholds": "80, 90", - "title": "Pod Utilization", - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "avg" - } - ], - "showTitle": true, - "title": "Capacity Planning", - "titleSize": "h6" - } - ], - "schemaVersion": 14, - "sharedCrosshair": false, - "style": "dark", - "tags": [], - "templating": { - "list": [] - }, - "time": { - "from": "now-6h", - "to": "now" - }, - "timepicker": { - "refresh_intervals": [ - "5s", - "10s", - "30s", - "1m", - "5m", - "15m", - "30m", - "1h", - "2h", - "1d" - ], - "time_options": [ - "5m", - "15m", - "1h", - "6h", - "12h", - "24h", - "2d", - "7d", - "30d" - ] - }, - "timezone": "browser", - "title": "Kubernetes Cluster Status", - "version": 3 - } - , - "inputs": [ - { - "name": "DS_PROMETHEUS", - "pluginId": "prometheus", - "type": "datasource", - "value": "prometheus" - } - ], - "overwrite": true - } - kubernetes-control-plane-status-dashboard.json: |+ - { - "dashboard": - { - "__inputs": [ - { - "description": "", - "label": "prometheus", - "name": "DS_PROMETHEUS", - "pluginId": "prometheus", - "pluginName": "Prometheus", - "type": "datasource" - } - ], - "annotations": { - "list": [] - }, - "editable": true, - "graphTooltip": 0, - "hideControls": false, - "links": [], - "rows": [ - { - "collapse": false, - "editable": true, - "height": "250px", - "panels": [ - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(245, 54, 54, 0.9)", - "rgba(237, 129, 40, 0.89)", - "rgba(50, 172, 45, 0.97)" - ], - "datasource": "${DS_PROMETHEUS}", - "editable": 
true, - "format": "percent", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": true, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "hideTimeOverride": false, - "id": 1, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "(sum(up{job=\"apiserver\"} == 1) / sum(up{job=\"apiserver\"})) * 100", - "format": "time_series", - "intervalFactor": 2, - "refId": "A", - "step": 600 - } - ], - "thresholds": "50, 80", - "title": "API Servers UP", - "transparent": false, - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "avg" - }, - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(245, 54, 54, 0.9)", - "rgba(237, 129, 40, 0.89)", - "rgba(50, 172, 45, 0.97)" - ], - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "format": "percent", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": true, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "hideTimeOverride": false, - "id": 2, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 
0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "(sum(up{job=\"kube-controller-manager\"} == 1) / sum(up{job=\"kube-controller-manager\"})) * 100", - "format": "time_series", - "intervalFactor": 2, - "refId": "A", - "step": 600 - } - ], - "thresholds": "50, 80", - "title": "Controller Managers UP", - "transparent": false, - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "avg" - }, - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(245, 54, 54, 0.9)", - "rgba(237, 129, 40, 0.89)", - "rgba(50, 172, 45, 0.97)" - ], - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "format": "percent", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": true, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "hideTimeOverride": false, - "id": 3, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "(sum(up{job=\"kube-scheduler\"} == 1) / sum(up{job=\"kube-scheduler\"})) * 100", - "format": "time_series", - "intervalFactor": 2, - "refId": "A", - "step": 600 - } - ], - "thresholds": "50, 80", - "title": "Schedulers UP", - "transparent": false, - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "avg" - }, - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(50, 172, 45, 
0.97)", - "rgba(237, 129, 40, 0.89)", - "rgba(245, 54, 54, 0.9)" - ], - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "format": "percent", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": true, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "hideTimeOverride": false, - "id": 4, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "max(sum by(instance) (rate(apiserver_request_count{code=~\"5..\"}[5m])) / sum by(instance) (rate(apiserver_request_count[5m]))) * 100", - "format": "time_series", - "intervalFactor": 2, - "legendFormat": "", - "refId": "A", - "step": 600 - } - ], - "thresholds": "5, 10", - "title": "API Server Request Error Rate", - "transparent": false, - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "0", - "value": "null" - } - ], - "valueName": "avg" - } - ], - "showTitle": false, - "title": "Dashboard Row", - "titleSize": "h6" - }, - { - "collapse": false, - "editable": true, - "height": "250px", - "panels": [ - { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "error": false, - "fill": 1, - "grid": { - "threshold1Color": "rgba(216, 200, 27, 0.27)", - "threshold2Color": "rgba(234, 112, 112, 0.22)" - }, - "id": 7, - "isNew": false, - "legend": { - "alignAsTable": false, - "avg": false, - "current": false, - "hideEmpty": false, - "hideZero": false, - "max": false, - "min": false, - "rightSide": 
false, - "show": true, - "total": false - }, - "lines": true, - "linewidth": 1, - "links": [], - "nullPointMode": "null", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 10, - "span": 12, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "sum by(verb) (rate(apiserver_latency_seconds:quantile[5m]) >= 0)", - "format": "time_series", - "intervalFactor": 2, - "legendFormat": "", - "refId": "A", - "step": 30 - } - ], - "title": "API Server Request Latency", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "individual" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "short", - "logBase": 1, - "show": true - }, - { - "format": "short", - "logBase": 1, - "show": true - } - ] - } - ], - "showTitle": false, - "title": "Dashboard Row", - "titleSize": "h6" - }, - { - "collapse": false, - "editable": true, - "height": "250px", - "panels": [ - { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "error": false, - "fill": 1, - "grid": { - "threshold1Color": "rgba(216, 200, 27, 0.27)", - "threshold2Color": "rgba(234, 112, 112, 0.22)" - }, - "id": 5, - "isNew": false, - "legend": { - "alignAsTable": false, - "avg": false, - "current": false, - "hideEmpty": false, - "hideZero": false, - "max": false, - "min": false, - "rightSide": false, - "show": true, - "total": false - }, - "lines": true, - "linewidth": 1, - "links": [], - "nullPointMode": "null", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 10, - "span": 6, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "cluster:scheduler_e2e_scheduling_latency_seconds:quantile", - "format": "time_series", - "intervalFactor": 2, - "refId": "A", - "step": 
60 - } - ], - "title": "End to End Scheduling Latency", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "individual" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "short", - "logBase": 1, - "show": true - }, - { - "format": "dtdurations", - "logBase": 1, - "show": true - } - ] - }, - { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "error": false, - "fill": 1, - "grid": { - "threshold1Color": "rgba(216, 200, 27, 0.27)", - "threshold2Color": "rgba(234, 112, 112, 0.22)" - }, - "id": 6, - "isNew": false, - "legend": { - "alignAsTable": false, - "avg": false, - "current": false, - "hideEmpty": false, - "hideZero": false, - "max": false, - "min": false, - "rightSide": false, - "show": true, - "total": false - }, - "lines": true, - "linewidth": 1, - "links": [], - "nullPointMode": "null", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 10, - "span": 6, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "sum by(instance) (rate(apiserver_request_count{code!~\"2..\"}[5m]))", - "format": "time_series", - "intervalFactor": 2, - "legendFormat": "Error Rate", - "refId": "A", - "step": 60 - }, - { - "expr": "sum by(instance) (rate(apiserver_request_count[5m]))", - "format": "time_series", - "intervalFactor": 2, - "legendFormat": "Request Rate", - "refId": "B", - "step": 60 - } - ], - "title": "API Server Request Rates", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "individual" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "short", - "logBase": 1, - "show": true - }, - { - "format": "short", - "logBase": 1, - "show": true - } - ] - } - ], - "showTitle": false, - "title": 
"Dashboard Row", - "titleSize": "h6" - } - ], - "schemaVersion": 14, - "sharedCrosshair": false, - "style": "dark", - "tags": [], - "templating": { - "list": [] - }, - "time": { - "from": "now-6h", - "to": "now" - }, - "timepicker": { - "refresh_intervals": [ - "5s", - "10s", - "30s", - "1m", - "5m", - "15m", - "30m", - "1h", - "2h", - "1d" - ], - "time_options": [ - "5m", - "15m", - "1h", - "6h", - "12h", - "24h", - "2d", - "7d", - "30d" - ] - }, - "timezone": "browser", - "title": "Kubernetes Control Plane Status", - "version": 3 - } - , - "inputs": [ - { - "name": "DS_PROMETHEUS", - "pluginId": "prometheus", - "type": "datasource", - "value": "prometheus" - } - ], - "overwrite": true - } - kubernetes-resource-requests-dashboard.json: |+ - { - "dashboard": - { - "__inputs": [ - { - "description": "", - "label": "prometheus", - "name": "DS_PROMETHEUS", - "pluginId": "prometheus", - "pluginName": "Prometheus", - "type": "datasource" - } - ], - "annotations": { - "list": [] - }, - "editable": true, - "graphTooltip": 0, - "hideControls": false, - "links": [], - "refresh": false, - "rows": [ - { - "collapse": false, - "editable": true, - "height": "300px", - "panels": [ - { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "${DS_PROMETHEUS}", - "description": "This represents the total [CPU resource requests](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#meaning-of-cpu) in the cluster.\nFor comparison the total [allocatable CPU cores](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node-allocatable.md) is also shown.", - "editable": true, - "error": false, - "fill": 1, - "grid": { - "threshold1Color": "rgba(216, 200, 27, 0.27)", - "threshold2Color": "rgba(234, 112, 112, 0.22)" - }, - "id": 1, - "isNew": false, - "legend": { - "alignAsTable": false, - "avg": false, - "current": false, - "hideEmpty": false, - "hideZero": false, - "max": false, - "min": 
false, - "rightSide": false, - "show": true, - "total": false - }, - "lines": true, - "linewidth": 1, - "links": [], - "nullPointMode": "null", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 10, - "span": 9, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "min(sum(kube_node_status_allocatable_cpu_cores) by (instance))", - "hide": false, - "intervalFactor": 2, - "legendFormat": "Allocatable CPU Cores", - "refId": "A", - "step": 20 - }, - { - "expr": "max(sum(kube_pod_container_resource_requests_cpu_cores) by (instance))", - "hide": false, - "intervalFactor": 2, - "legendFormat": "Requested CPU Cores", - "refId": "B", - "step": 20 - } - ], - "title": "CPU Cores", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "individual" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "short", - "label": "CPU Cores", - "logBase": 1, - "show": true - }, - { - "format": "short", - "logBase": 1, - "show": true - } - ] - }, - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(50, 172, 45, 0.97)", - "rgba(237, 129, 40, 0.89)", - "rgba(245, 54, 54, 0.9)" - ], - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "format": "percent", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": true, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "hideTimeOverride": false, - "id": 2, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": 
false, - "lineColor": "rgb(31, 120, 193)", - "show": true - }, - "targets": [ - { - "expr": "max(sum(kube_pod_container_resource_requests_cpu_cores) by (instance)) / min(sum(kube_node_status_allocatable_cpu_cores) by (instance)) * 100", - "intervalFactor": 2, - "legendFormat": "", - "refId": "A", - "step": 240 - } - ], - "thresholds": "80, 90", - "title": "CPU Cores", - "transparent": false, - "type": "singlestat", - "valueFontSize": "110%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "avg" - } - ], - "showTitle": false, - "title": "CPU Cores", - "titleSize": "h6" - }, - { - "collapse": false, - "editable": true, - "height": "300px", - "panels": [ - { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "${DS_PROMETHEUS}", - "description": "This represents the total [memory resource requests](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#meaning-of-memory) in the cluster.\nFor comparison the total [allocatable memory](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node-allocatable.md) is also shown.", - "editable": true, - "error": false, - "fill": 1, - "grid": { - "threshold1Color": "rgba(216, 200, 27, 0.27)", - "threshold2Color": "rgba(234, 112, 112, 0.22)" - }, - "id": 3, - "isNew": false, - "legend": { - "alignAsTable": false, - "avg": false, - "current": false, - "hideEmpty": false, - "hideZero": false, - "max": false, - "min": false, - "rightSide": false, - "show": true, - "total": false - }, - "lines": true, - "linewidth": 1, - "links": [], - "nullPointMode": "null", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 10, - "span": 9, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "min(sum(kube_node_status_allocatable_memory_bytes) by (instance))", - "hide": false, - "intervalFactor": 2, - 
"legendFormat": "Allocatable Memory", - "refId": "A", - "step": 20 - }, - { - "expr": "max(sum(kube_pod_container_resource_requests_memory_bytes) by (instance))", - "hide": false, - "intervalFactor": 2, - "legendFormat": "Requested Memory", - "refId": "B", - "step": 20 - } - ], - "title": "Memory", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "individual" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "bytes", - "label": "Memory", - "logBase": 1, - "show": true - }, - { - "format": "short", - "logBase": 1, - "show": true - } - ] - }, - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(50, 172, 45, 0.97)", - "rgba(237, 129, 40, 0.89)", - "rgba(245, 54, 54, 0.9)" - ], - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "format": "percent", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": true, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "hideTimeOverride": false, - "id": 4, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": true - }, - "targets": [ - { - "expr": "max(sum(kube_pod_container_resource_requests_memory_bytes) by (instance)) / min(sum(kube_node_status_allocatable_memory_bytes) by (instance)) * 100", - "intervalFactor": 2, - "legendFormat": "", - "refId": "A", - "step": 240 - } - ], - "thresholds": "80, 90", - "title": "Memory", - "transparent": false, - "type": "singlestat", - "valueFontSize": "110%", - "valueMaps": [ - { - 
"op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "avg" - } - ], - "showTitle": false, - "title": "Memory", - "titleSize": "h6" - } - ], - "schemaVersion": 14, - "sharedCrosshair": false, - "style": "dark", - "tags": [], - "templating": { - "list": [] - }, - "time": { - "from": "now-3h", - "to": "now" - }, - "timepicker": { - "refresh_intervals": [ - "5s", - "10s", - "30s", - "1m", - "5m", - "15m", - "30m", - "1h", - "2h", - "1d" - ], - "time_options": [ - "5m", - "15m", - "1h", - "6h", - "12h", - "24h", - "2d", - "7d", - "30d" - ] - }, - "timezone": "browser", - "title": "Kubernetes Resource Requests", - "version": 2 - } - , - "inputs": [ - { - "name": "DS_PROMETHEUS", - "pluginId": "prometheus", - "type": "datasource", - "value": "prometheus" - } - ], - "overwrite": true - } - nodes-dashboard.json: |+ - { - "dashboard": - { - "__inputs": [ - { - "description": "", - "label": "prometheus", - "name": "DS_PROMETHEUS", - "pluginId": "prometheus", - "pluginName": "Prometheus", - "type": "datasource" - } - ], - "annotations": { - "list": [] - }, - "description": "Dashboard to get an overview of one server", - "editable": true, - "gnetId": 22, - "graphTooltip": 0, - "hideControls": false, - "links": [], - "refresh": false, - "rows": [ - { - "collapse": false, - "editable": true, - "height": "250px", - "panels": [ - { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "error": false, - "fill": 1, - "grid": { - "threshold1Color": "rgba(216, 200, 27, 0.27)", - "threshold2Color": "rgba(234, 112, 112, 0.22)" - }, - "id": 3, - "isNew": false, - "legend": { - "alignAsTable": false, - "avg": false, - "current": false, - "hideEmpty": false, - "hideZero": false, - "max": false, - "min": false, - "rightSide": false, - "show": true, - "total": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - 
"points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 10, - "span": 6, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "100 - (avg by (cpu) (irate(node_cpu{mode=\"idle\", instance=\"$server\"}[5m])) * 100)", - "hide": false, - "intervalFactor": 10, - "legendFormat": "{{cpu}}", - "refId": "A", - "step": 50 - } - ], - "title": "Idle CPU", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "cumulative" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "percent", - "label": "cpu usage", - "logBase": 1, - "max": 100, - "min": 0, - "show": true - }, - { - "format": "short", - "logBase": 1, - "show": true - } - ] - }, - { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "error": false, - "fill": 1, - "grid": { - "threshold1Color": "rgba(216, 200, 27, 0.27)", - "threshold2Color": "rgba(234, 112, 112, 0.22)" - }, - "id": 9, - "isNew": false, - "legend": { - "alignAsTable": false, - "avg": false, - "current": false, - "hideEmpty": false, - "hideZero": false, - "max": false, - "min": false, - "rightSide": false, - "show": true, - "total": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 10, - "span": 6, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "node_load1{instance=\"$server\"}", - "intervalFactor": 4, - "legendFormat": "load 1m", - "refId": "A", - "step": 20, - "target": "" - }, - { - "expr": "node_load5{instance=\"$server\"}", - "intervalFactor": 4, - "legendFormat": "load 5m", - "refId": "B", - "step": 20, - "target": "" - }, - { - "expr": "node_load15{instance=\"$server\"}", - "intervalFactor": 4, - "legendFormat": "load 15m", - "refId": 
"C", - "step": 20, - "target": "" - } - ], - "title": "System Load", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "cumulative" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "percentunit", - "logBase": 1, - "show": true - }, - { - "format": "short", - "logBase": 1, - "show": true - } - ] - } - ], - "showTitle": false, - "title": "New Row", - "titleSize": "h6" - }, - { - "collapse": false, - "editable": true, - "height": "250px", - "panels": [ - { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "error": false, - "fill": 1, - "grid": { - "threshold1Color": "rgba(216, 200, 27, 0.27)", - "threshold2Color": "rgba(234, 112, 112, 0.22)" - }, - "id": 4, - "isNew": false, - "legend": { - "alignAsTable": false, - "avg": false, - "current": false, - "hideEmpty": false, - "hideZero": false, - "max": false, - "min": false, - "rightSide": false, - "show": true, - "total": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [ - { - "alias": "node_memory_SwapFree{instance=\"172.17.0.1:9100\",job=\"prometheus\"}", - "yaxis": 2 - } - ], - "spaceLength": 10, - "span": 9, - "stack": true, - "steppedLine": false, - "targets": [ - { - "expr": "node_memory_MemTotal{instance=\"$server\"} - node_memory_MemFree{instance=\"$server\"} - node_memory_Buffers{instance=\"$server\"} - node_memory_Cached{instance=\"$server\"}", - "hide": false, - "interval": "", - "intervalFactor": 2, - "legendFormat": "memory used", - "metric": "", - "refId": "C", - "step": 10 - }, - { - "expr": "node_memory_Buffers{instance=\"$server\"}", - "interval": "", - "intervalFactor": 2, - "legendFormat": "memory buffers", - "metric": "", - "refId": "E", - "step": 10 - }, - { - 
"expr": "node_memory_Cached{instance=\"$server\"}", - "intervalFactor": 2, - "legendFormat": "memory cached", - "metric": "", - "refId": "F", - "step": 10 - }, - { - "expr": "node_memory_MemFree{instance=\"$server\"}", - "intervalFactor": 2, - "legendFormat": "memory free", - "metric": "", - "refId": "D", - "step": 10 - } - ], - "title": "Memory Usage", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "individual" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "bytes", - "logBase": 1, - "min": "0", - "show": true - }, - { - "format": "short", - "logBase": 1, - "show": true - } - ] - }, - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(50, 172, 45, 0.97)", - "rgba(237, 129, 40, 0.89)", - "rgba(245, 54, 54, 0.9)" - ], - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "format": "percent", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": true, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "hideTimeOverride": false, - "id": 5, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "((node_memory_MemTotal{instance=\"$server\"} - node_memory_MemFree{instance=\"$server\"} - node_memory_Buffers{instance=\"$server\"} - node_memory_Cached{instance=\"$server\"}) / node_memory_MemTotal{instance=\"$server\"}) * 100", - "intervalFactor": 2, - "refId": "A", - "step": 60, - "target": "" - } - ], - "thresholds": "80, 90", 
- "title": "Memory Usage", - "transparent": false, - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "avg" - } - ], - "showTitle": false, - "title": "New Row", - "titleSize": "h6" - }, - { - "collapse": false, - "editable": true, - "height": "250px", - "panels": [ - { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "error": false, - "fill": 1, - "grid": { - "threshold1Color": "rgba(216, 200, 27, 0.27)", - "threshold2Color": "rgba(234, 112, 112, 0.22)" - }, - "id": 6, - "isNew": true, - "legend": { - "alignAsTable": false, - "avg": false, - "current": false, - "hideEmpty": false, - "hideZero": false, - "max": false, - "min": false, - "rightSide": false, - "show": true, - "total": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [ - { - "alias": "read", - "yaxis": 1 - }, - { - "alias": "{instance=\"172.17.0.1:9100\"}", - "yaxis": 2 - }, - { - "alias": "io time", - "yaxis": 2 - } - ], - "spaceLength": 10, - "span": 9, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "sum by (instance) (rate(node_disk_bytes_read{instance=\"$server\"}[2m]))", - "hide": false, - "intervalFactor": 4, - "legendFormat": "read", - "refId": "A", - "step": 20, - "target": "" - }, - { - "expr": "sum by (instance) (rate(node_disk_bytes_written{instance=\"$server\"}[2m]))", - "intervalFactor": 4, - "legendFormat": "written", - "refId": "B", - "step": 20 - }, - { - "expr": "sum by (instance) (rate(node_disk_io_time_ms{instance=\"$server\"}[2m]))", - "intervalFactor": 4, - "legendFormat": "io time", - "refId": "C", - "step": 20 - } - ], - "title": "Disk I/O", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": 
"cumulative" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "bytes", - "logBase": 1, - "show": true - }, - { - "format": "ms", - "logBase": 1, - "show": true - } - ] - }, - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(50, 172, 45, 0.97)", - "rgba(237, 129, 40, 0.89)", - "rgba(245, 54, 54, 0.9)" - ], - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "format": "percentunit", - "gauge": { - "maxValue": 1, - "minValue": 0, - "show": true, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "hideTimeOverride": false, - "id": 7, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "(sum(node_filesystem_size{device!=\"rootfs\",instance=\"$server\"}) - sum(node_filesystem_free{device!=\"rootfs\",instance=\"$server\"})) / sum(node_filesystem_size{device!=\"rootfs\",instance=\"$server\"})", - "intervalFactor": 2, - "refId": "A", - "step": 60, - "target": "" - } - ], - "thresholds": "0.75, 0.9", - "title": "Disk Space Usage", - "transparent": false, - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "current" - } - ], - "showTitle": false, - "title": "New Row", - "titleSize": "h6" - }, - { - "collapse": false, - "editable": true, - "height": "250px", - "panels": [ - { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": 
"${DS_PROMETHEUS}", - "editable": true, - "error": false, - "fill": 1, - "grid": { - "threshold1Color": "rgba(216, 200, 27, 0.27)", - "threshold2Color": "rgba(234, 112, 112, 0.22)" - }, - "id": 8, - "isNew": false, - "legend": { - "alignAsTable": false, - "avg": false, - "current": false, - "hideEmpty": false, - "hideZero": false, - "max": false, - "min": false, - "rightSide": false, - "show": true, - "total": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [ - { - "alias": "transmitted", - "yaxis": 2 - } - ], - "spaceLength": 10, - "span": 6, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "rate(node_network_receive_bytes{instance=\"$server\",device!~\"lo\"}[5m])", - "hide": false, - "intervalFactor": 2, - "legendFormat": "{{device}}", - "refId": "A", - "step": 10, - "target": "" - } - ], - "title": "Network Received", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "cumulative" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "bytes", - "logBase": 1, - "show": true - }, - { - "format": "bytes", - "logBase": 1, - "show": true - } - ] - }, - { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "error": false, - "fill": 1, - "grid": { - "threshold1Color": "rgba(216, 200, 27, 0.27)", - "threshold2Color": "rgba(234, 112, 112, 0.22)" - }, - "id": 10, - "isNew": false, - "legend": { - "alignAsTable": false, - "avg": false, - "current": false, - "hideEmpty": false, - "hideZero": false, - "max": false, - "min": false, - "rightSide": false, - "show": true, - "total": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": 
false, - "renderer": "flot", - "seriesOverrides": [ - { - "alias": "transmitted", - "yaxis": 2 - } - ], - "spaceLength": 10, - "span": 6, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "rate(node_network_transmit_bytes{instance=\"$server\",device!~\"lo\"}[5m])", - "hide": false, - "intervalFactor": 2, - "legendFormat": "{{device}}", - "refId": "B", - "step": 10, - "target": "" - } - ], - "title": "Network Transmitted", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "cumulative" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "bytes", - "logBase": 1, - "show": true - }, - { - "format": "bytes", - "logBase": 1, - "show": true - } - ] - } - ], - "showTitle": false, - "title": "New Row", - "titleSize": "h6" - } - ], - "schemaVersion": 14, - "sharedCrosshair": false, - "style": "dark", - "tags": [], - "templating": { - "list": [ - { - "allValue": null, - "current": {}, - "datasource": "${DS_PROMETHEUS}", - "hide": 0, - "includeAll": false, - "label": null, - "multi": false, - "name": "server", - "options": [], - "query": "label_values(node_boot_time, instance)", - "refresh": 1, - "regex": "", - "sort": 0, - "tagValuesQuery": "", - "tags": [], - "tagsQuery": "", - "type": "query", - "useTags": false - } - ] - }, - "time": { - "from": "now-1h", - "to": "now" - }, - "timepicker": { - "refresh_intervals": [ - "5s", - "10s", - "30s", - "1m", - "5m", - "15m", - "30m", - "1h", - "2h", - "1d" - ], - "time_options": [ - "5m", - "15m", - "1h", - "6h", - "12h", - "24h", - "2d", - "7d", - "30d" - ] - }, - "timezone": "browser", - "title": "Nodes", - "version": 2 - } - , - "inputs": [ - { - "name": "DS_PROMETHEUS", - "pluginId": "prometheus", - "type": "datasource", - "value": "prometheus" - } - ], - "overwrite": true - } - pods-dashboard.json: |+ - { - "dashboard": - { - "__inputs": [ - { - "description": "", - "label": "prometheus", - "name": 
"DS_PROMETHEUS", - "pluginId": "prometheus", - "pluginName": "Prometheus", - "type": "datasource" - } - ], - "annotations": { - "list": [] - }, - "editable": true, - "graphTooltip": 1, - "hideControls": false, - "links": [], - "refresh": false, - "rows": [ - { - "collapse": false, - "editable": true, - "height": "250px", - "panels": [ - { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "error": false, - "fill": 1, - "grid": { - "threshold1Color": "rgba(216, 200, 27, 0.27)", - "threshold2Color": "rgba(234, 112, 112, 0.22)" - }, - "id": 1, - "isNew": false, - "legend": { - "alignAsTable": true, - "avg": true, - "current": true, - "hideEmpty": false, - "hideZero": false, - "max": false, - "min": false, - "rightSide": true, - "show": true, - "total": false, - "values": true - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 10, - "span": 12, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "sum by(container_name) (container_memory_usage_bytes{pod_name=\"$pod\", container_name=~\"$container\", container_name!=\"POD\"})", - "interval": "10s", - "intervalFactor": 1, - "legendFormat": "Current: {{ container_name }}", - "metric": "container_memory_usage_bytes", - "refId": "A", - "step": 15 - }, - { - "expr": "kube_pod_container_resource_requests_memory_bytes{pod=\"$pod\", container=~\"$container\"}", - "interval": "10s", - "intervalFactor": 2, - "legendFormat": "Requested: {{ container }}", - "metric": "kube_pod_container_resource_requests_memory_bytes", - "refId": "B", - "step": 20 - } - ], - "title": "Memory Usage", - "tooltip": { - "msResolution": true, - "shared": true, - "sort": 0, - "value_type": "cumulative" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "show": true, - "values": [] - }, - 
"yaxes": [ - { - "format": "bytes", - "logBase": 1, - "show": true - }, - { - "format": "short", - "logBase": 1, - "show": true - } - ] - } - ], - "showTitle": false, - "title": "Row", - "titleSize": "h6" - }, - { - "collapse": false, - "editable": true, - "height": "250px", - "panels": [ - { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "${DS_PROMETHEUS}", - "editable": true, - "error": false, - "fill": 1, - "grid": { - "threshold1Color": "rgba(216, 200, 27, 0.27)", - "threshold2Color": "rgba(234, 112, 112, 0.22)" - }, - "id": 2, - "isNew": false, - "legend": { - "alignAsTable": true, - "avg": true, - "current": true, - "hideEmpty": false, - "hideZero": false, - "max": false, - "min": false, - "rightSide": true, - "show": true, - "total": false, - "values": true - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 10, - "span": 12, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "sum by (container_name)(rate(container_cpu_usage_seconds_total{image!=\"\",container_name!=\"POD\",pod_name=\"$pod\"}[1m]))", - "intervalFactor": 2, - "legendFormat": "{{ container_name }}", - "refId": "A", - "step": 30 - } - ], - "title": "CPU Usage", - "tooltip": { - "msResolution": true, - "shared": true, - "sort": 0, - "value_type": "cumulative" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "short", - "logBase": 1, - "show": true - }, - { - "format": "short", - "logBase": 1, - "show": true - } - ] - } - ], - "showTitle": false, - "title": "Row", - "titleSize": "h6" - }, - { - "collapse": false, - "editable": true, - "height": "250px", - "panels": [ - { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "${DS_PROMETHEUS}", - "editable": true, - 
"error": false, - "fill": 1, - "grid": { - "threshold1Color": "rgba(216, 200, 27, 0.27)", - "threshold2Color": "rgba(234, 112, 112, 0.22)" - }, - "id": 3, - "isNew": false, - "legend": { - "alignAsTable": true, - "avg": true, - "current": true, - "hideEmpty": false, - "hideZero": false, - "max": false, - "min": false, - "rightSide": true, - "show": true, - "total": false, - "values": true - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 10, - "span": 12, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "sort_desc(sum by (pod_name) (rate(container_network_receive_bytes_total{pod_name=\"$pod\"}[1m])))", - "intervalFactor": 2, - "legendFormat": "{{ pod_name }}", - "refId": "A", - "step": 30 - } - ], - "title": "Network I/O", - "tooltip": { - "msResolution": true, - "shared": true, - "sort": 0, - "value_type": "cumulative" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "bytes", - "logBase": 1, - "show": true - }, - { - "format": "short", - "logBase": 1, - "show": true - } - ] - } - ], - "showTitle": false, - "title": "New Row", - "titleSize": "h6" - } - ], - "schemaVersion": 14, - "sharedCrosshair": false, - "style": "dark", - "tags": [], - "templating": { - "list": [ - { - "allValue": ".*", - "current": {}, - "datasource": "${DS_PROMETHEUS}", - "hide": 0, - "includeAll": true, - "label": "Namespace", - "multi": false, - "name": "namespace", - "options": [], - "query": "label_values(kube_pod_info, namespace)", - "refresh": 1, - "regex": "", - "sort": 0, - "tagValuesQuery": "", - "tags": [], - "tagsQuery": "", - "type": "query", - "useTags": false - }, - { - "allValue": null, - "current": {}, - "datasource": "${DS_PROMETHEUS}", - "hide": 0, - "includeAll": false, - "label": "Pod", - "multi": false, - "name": "pod", 
- "options": [], - "query": "label_values(kube_pod_info{namespace=~\"$namespace\"}, pod)", - "refresh": 1, - "regex": "", - "sort": 0, - "tagValuesQuery": "", - "tags": [], - "tagsQuery": "", - "type": "query", - "useTags": false - }, - { - "allValue": ".*", - "current": {}, - "datasource": "${DS_PROMETHEUS}", - "hide": 0, - "includeAll": true, - "label": "Container", - "multi": false, - "name": "container", - "options": [], - "query": "label_values(kube_pod_container_info{namespace=\"$namespace\", pod=\"$pod\"}, container)", - "refresh": 1, - "regex": "", - "sort": 0, - "tagValuesQuery": "", - "tags": [], - "tagsQuery": "", - "type": "query", - "useTags": false - } - ] - }, - "time": { - "from": "now-6h", - "to": "now" - }, - "timepicker": { - "refresh_intervals": [ - "5s", - "10s", - "30s", - "1m", - "5m", - "15m", - "30m", - "1h", - "2h", - "1d" - ], - "time_options": [ - "5m", - "15m", - "1h", - "6h", - "12h", - "24h", - "2d", - "7d", - "30d" - ] - }, - "timezone": "browser", - "title": "Pods", - "version": 1 - } - , - "inputs": [ - { - "name": "DS_PROMETHEUS", - "pluginId": "prometheus", - "type": "datasource", - "value": "prometheus" - } - ], - "overwrite": true - } - prometheus-datasource.json: |+ - { - "access": "proxy", - "basicAuth": false, - "name": "prometheus", - "type": "prometheus", - "url": "http://prometheus-operated.monitoring.svc:9090" - } ---- -apiVersion: apps/v1 -kind: Deployment -metadata: - name: grafana - namespace: monitoring -spec: - replicas: 1 - selector: - matchLabels: - app: grafana - template: - metadata: - labels: - app: grafana - spec: - containers: - - name: grafana - image: grafana/grafana:4.5.2 - env: - - name: GF_AUTH_BASIC_ENABLED - value: "true" - - name: GF_AUTH_ANONYMOUS_ENABLED - value: "true" - - name: GF_SECURITY_ADMIN_USER - valueFrom: - secretKeyRef: - name: grafana-credentials - key: user - - name: GF_SECURITY_ADMIN_PASSWORD - valueFrom: - secretKeyRef: - name: grafana-credentials - key: password - volumeMounts: - - 
name: grafana-storage - mountPath: /var/grafana-storage - ports: - - name: web - containerPort: 3000 - resources: - requests: - memory: 100Mi - cpu: 100m - limits: - memory: 200Mi - cpu: 200m - - name: grafana-watcher - image: quay.io/coreos/grafana-watcher:v0.0.8 - args: - - '--watch-dir=/var/grafana-dashboards-0' - - '--grafana-url=http://localhost:3000' - env: - - name: GRAFANA_USER - valueFrom: - secretKeyRef: - name: grafana-credentials - key: user - - name: GRAFANA_PASSWORD - valueFrom: - secretKeyRef: - name: grafana-credentials - key: password - resources: - requests: - memory: "16Mi" - cpu: "50m" - limits: - memory: "32Mi" - cpu: "100m" - volumeMounts: - - name: grafana-dashboards-0 - mountPath: /var/grafana-dashboards-0 - volumes: - - name: grafana-storage - emptyDir: {} - - name: grafana-dashboards-0 - configMap: - name: grafana-dashboards-0 diff --git a/02-path-working-with-clusters/201-cluster-monitoring/templates/prometheus/prometheus-bundle.yaml b/02-path-working-with-clusters/201-cluster-monitoring/templates/prometheus/prometheus-bundle.yaml deleted file mode 100644 index 3f435f47..00000000 --- a/02-path-working-with-clusters/201-cluster-monitoring/templates/prometheus/prometheus-bundle.yaml +++ /dev/null @@ -1,112 +0,0 @@ -apiVersion: v1 -kind: Namespace -metadata: - name: monitoring ---- -apiVersion: rbac.authorization.k8s.io/v1 -kind: ClusterRoleBinding -metadata: - name: prometheus-operator -roleRef: - apiGroup: rbac.authorization.k8s.io - kind: ClusterRole - name: prometheus-operator -subjects: -- kind: ServiceAccount - name: prometheus-operator - namespace: monitoring ---- -apiVersion: rbac.authorization.k8s.io/v1 -kind: ClusterRole -metadata: - name: prometheus-operator - namespace: monitoring -rules: -- apiGroups: - - extensions - resources: - - thirdpartyresources - verbs: - - "*" -- apiGroups: - - apiextensions.k8s.io - resources: - - customresourcedefinitions - verbs: - - "*" -- apiGroups: - - monitoring.coreos.com - resources: - - 
alertmanagers - - prometheuses - - servicemonitors - verbs: - - "*" -- apiGroups: - - apps - resources: - - statefulsets - verbs: ["*"] -- apiGroups: [""] - resources: - - configmaps - - secrets - verbs: ["*"] -- apiGroups: [""] - resources: - - pods - verbs: ["list", "delete"] -- apiGroups: [""] - resources: - - services - - endpoints - verbs: ["get", "create", "update"] -- apiGroups: [""] - resources: - - nodes - verbs: ["list", "watch"] -- apiGroups: [""] - resources: - - namespaces - verbs: ["list"] ---- -apiVersion: v1 -kind: ServiceAccount -metadata: - name: prometheus-operator - namespace: monitoring ---- -apiVersion: apps/v1 -kind: Deployment -metadata: - labels: - k8s-app: prometheus-operator - name: prometheus-operator - namespace: monitoring -spec: - replicas: 1 - selector: - matchLabels: - k8s-app: prometheus-operator - template: - metadata: - labels: - k8s-app: prometheus-operator - spec: - containers: - - args: - - --kubelet-service=kube-system/kubelet - - --config-reloader-image=quay.io/coreos/configmap-reload:v0.0.1 - image: quay.io/coreos/prometheus-operator:v0.14.1 - name: prometheus-operator - ports: - - containerPort: 8080 - name: http - resources: - limits: - cpu: 200m - memory: 100Mi - requests: - cpu: 100m - memory: 50Mi - serviceAccountName: prometheus-operator diff --git a/02-path-working-with-clusters/201-cluster-monitoring/templates/prometheus/prometheus.yaml b/02-path-working-with-clusters/201-cluster-monitoring/templates/prometheus/prometheus.yaml deleted file mode 100644 index 6d6c0255..00000000 --- a/02-path-working-with-clusters/201-cluster-monitoring/templates/prometheus/prometheus.yaml +++ /dev/null @@ -1,408 +0,0 @@ -apiVersion: v1 -kind: ServiceAccount -metadata: - name: kube-state-metrics - namespace: monitoring ---- -apiVersion: rbac.authorization.k8s.io/v1 -kind: ClusterRole -metadata: - name: kube-state-metrics - namespace: monitoring -rules: -- apiGroups: [""] - resources: - - nodes - - pods - - resourcequotas - verbs: 
["list", "watch"] -- apiGroups: ["extensions"] - resources: - - daemonsets - - deployments - - replicasets - verbs: ["list", "watch"] ---- -apiVersion: rbac.authorization.k8s.io/v1 -kind: ClusterRoleBinding -metadata: - name: kube-state-metrics -roleRef: - apiGroup: rbac.authorization.k8s.io - kind: ClusterRole - name: kube-state-metrics -subjects: -- kind: ServiceAccount - name: kube-state-metrics - namespace: monitoring ---- -apiVersion: v1 -kind: Service -metadata: - namespace: kube-system - name: kube-scheduler-prometheus-discovery - labels: - k8s-app: kube-scheduler -spec: - selector: - k8s-app: kube-scheduler - type: ClusterIP - clusterIP: None - ports: - - name: http-metrics - port: 10251 - targetPort: 10251 - protocol: TCP ---- -apiVersion: v1 -kind: Service -metadata: - namespace: kube-system - name: kube-controller-manager-prometheus-discovery - labels: - k8s-app: kube-controller-manager -spec: - selector: - k8s-app: kube-controller-manager - type: ClusterIP - clusterIP: None - ports: - - name: http-metrics - port: 10252 - targetPort: 10252 - protocol: TCP ---- -apiVersion: apps/v1 -kind: DaemonSet -metadata: - name: node-exporter - namespace: monitoring -spec: - selector: - matchLabels: - app: node-exporter - template: - metadata: - labels: - app: node-exporter - name: node-exporter - spec: - hostNetwork: true - hostPID: true - containers: - - image: quay.io/prometheus/node-exporter:v0.15.0 - args: - - "--path.procfs=/host/proc" - - "--path.sysfs=/host/sys" - name: node-exporter - ports: - - containerPort: 9100 - hostPort: 9100 - name: scrape - resources: - requests: - memory: 30Mi - cpu: 100m - limits: - memory: 50Mi - cpu: 200m - volumeMounts: - - name: proc - readOnly: true - mountPath: /host/proc - - name: sys - readOnly: true - mountPath: /host/sys - tolerations: - - effect: NoSchedule - operator: Exists - volumes: - - name: proc - hostPath: - path: /proc - - name: sys - hostPath: - path: /sys ---- -apiVersion: v1 -kind: Service -metadata: - labels: 
- app: node-exporter - k8s-app: node-exporter - name: node-exporter - namespace: monitoring -spec: - type: ClusterIP - clusterIP: None - ports: - - name: http-metrics - port: 9100 - protocol: TCP - selector: - app: node-exporter ---- -apiVersion: apps/v1 -kind: Deployment -metadata: - name: kube-state-metrics - namespace: monitoring -spec: - replicas: 1 - selector: - matchLabels: - app: kube-state-metrics - template: - metadata: - labels: - app: kube-state-metrics - spec: - serviceAccountName: kube-state-metrics - containers: - - name: kube-state-metrics - image: quay.io/coreos/kube-state-metrics:v1.0.1 - ports: - - name: metrics - containerPort: 8080 - readinessProbe: - httpGet: - path: /healthz - port: 8080 - initialDelaySeconds: 5 - timeoutSeconds: 5 - - name: addon-resizer - image: k8s.gcr.io/addon-resizer:1.0 - resources: - limits: - cpu: 100m - memory: 30Mi - requests: - cpu: 100m - memory: 30Mi - env: - - name: MY_POD_NAME - valueFrom: - fieldRef: - fieldPath: metadata.name - - name: MY_POD_NAMESPACE - valueFrom: - fieldRef: - fieldPath: metadata.namespace - command: - - /pod_nanny - - --container=kube-state-metrics - - --cpu=100m - - --extra-cpu=1m - - --memory=100Mi - - --extra-memory=2Mi - - --threshold=5 - - --deployment=kube-state-metrics ---- -apiVersion: v1 -kind: Service -metadata: - labels: - app: kube-state-metrics - k8s-app: kube-state-metrics - name: kube-state-metrics - namespace: monitoring -spec: - ports: - - name: http-metrics - port: 8080 - targetPort: metrics - protocol: TCP - selector: - app: kube-state-metrics ---- -apiVersion: monitoring.coreos.com/v1 -kind: Prometheus -metadata: - name: prometheus - namespace: monitoring - labels: - prometheus: prometheus -spec: - replicas: 2 - version: v2.0.0-rc.1 - serviceAccountName: prometheus-operator - serviceMonitorSelector: - matchExpressions: - - {key: k8s-app, operator: Exists} - ruleSelector: - matchLabels: - role: prometheus-rulefiles - prometheus: prometheus - resources: - requests: - # 2Gi 
is default, but won't schedule if you don't have a node with >2Gi - # memory. Modify based on your target and time-series count for - # production use. This value is mainly meant for demonstration/testing - # purposes. - memory: 400Mi - alerting: - alertmanagers: - - namespace: monitoring - name: alertmanager-main - port: web ---- -apiVersion: monitoring.coreos.com/v1 -kind: ServiceMonitor -metadata: - name: prometheus-operator - namespace: monitoring - labels: - k8s-app: prometheus-operator -spec: - endpoints: - - port: http - selector: - matchLabels: - k8s-app: prometheus-operator - namespaceSelector: - matchNames: - - monitoring ---- -apiVersion: monitoring.coreos.com/v1 -kind: ServiceMonitor -metadata: - name: kube-apiserver - namespace: monitoring - labels: - k8s-app: apiserver -spec: - jobLabel: component - selector: - matchLabels: - component: apiserver - provider: kubernetes - namespaceSelector: - matchNames: - - default - endpoints: - - port: https - interval: 30s - scheme: https - tlsConfig: - caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt - serverName: kubernetes - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token ---- -apiVersion: monitoring.coreos.com/v1 -kind: ServiceMonitor -metadata: - name: kubelet - namespace: monitoring - labels: - k8s-app: kubelet -spec: - jobLabel: k8s-app - endpoints: - - port: http-metrics - interval: 30s - - port: cadvisor - interval: 30s - honorLabels: true - selector: - matchLabels: - k8s-app: kubelet - namespaceSelector: - matchNames: - - kube-system ---- -apiVersion: monitoring.coreos.com/v1 -kind: ServiceMonitor -metadata: - name: kube-controller-manager - namespace: monitoring - labels: - k8s-app: kube-controller-manager -spec: - jobLabel: k8s-app - endpoints: - - port: http-metrics - interval: 30s - selector: - matchLabels: - k8s-app: kube-controller-manager - namespaceSelector: - matchNames: - - kube-system ---- -apiVersion: monitoring.coreos.com/v1 -kind: ServiceMonitor -metadata: - 
name: kube-scheduler - namespace: monitoring - labels: - k8s-app: kube-scheduler -spec: - jobLabel: k8s-app - endpoints: - - port: http-metrics - interval: 30s - selector: - matchLabels: - k8s-app: kube-scheduler - namespaceSelector: - matchNames: - - kube-system ---- -apiVersion: monitoring.coreos.com/v1 -kind: ServiceMonitor -metadata: - name: kube-state-metrics - namespace: monitoring - labels: - k8s-app: kube-state-metrics -spec: - jobLabel: k8s-app - selector: - matchLabels: - k8s-app: kube-state-metrics - namespaceSelector: - matchNames: - - monitoring - endpoints: - - port: http-metrics - interval: 30s - honorLabels: true ---- -apiVersion: monitoring.coreos.com/v1 -kind: ServiceMonitor -metadata: - name: node-exporter - namespace: monitoring - labels: - k8s-app: node-exporter -spec: - jobLabel: k8s-app - selector: - matchLabels: - k8s-app: node-exporter - namespaceSelector: - matchNames: - - monitoring - endpoints: - - port: http-metrics - interval: 30s ---- -apiVersion: monitoring.coreos.com/v1 -kind: Alertmanager -metadata: - name: main - namespace: monitoring - labels: - alertmanager: main -spec: - replicas: 3 - version: v0.9.1 ---- -apiVersion: v1 -data: - alertmanager.yaml: Z2xvYmFsOgogIHJlc29sdmVfdGltZW91dDogNW0Kcm91dGU6CiAgZ3JvdXBfYnk6IFsnam9iJ10KICBncm91cF93YWl0OiAzMHMKICBncm91cF9pbnRlcnZhbDogNW0KICByZXBlYXRfaW50ZXJ2YWw6IDEyaAogIHJlY2VpdmVyOiAnd2ViaG9vaycKcmVjZWl2ZXJzOgotIG5hbWU6ICd3ZWJob29rJwogIHdlYmhvb2tfY29uZmlnczoKICAtIHVybDogJ2h0dHA6Ly9hbGVydG1hbmFnZXJ3aDozMDUwMC8nCg== -kind: Secret -metadata: - name: alertmanager-main - namespace: monitoring -type: Opaque diff --git a/02-path-working-with-clusters/201-cluster-monitoring/templates/redis/redis.yaml b/02-path-working-with-clusters/201-cluster-monitoring/templates/redis/redis.yaml new file mode 100644 index 00000000..30e14d19 --- /dev/null +++ b/02-path-working-with-clusters/201-cluster-monitoring/templates/redis/redis.yaml @@ -0,0 +1,35 @@ +apiVersion: apps/v1beta1 +kind: Deployment +metadata: + 
name: redis +spec: + replicas: 1 # tells deployment to run 1 pod matching the template + template: # create pods using pod definition in this template + metadata: + annotations: + ad.datadoghq.com/redis.check_names: '["redisdb"]' + ad.datadoghq.com/redis.init_configs: '[{}]' + ad.datadoghq.com/redis.instances: '[{"host": "%%host%%","port":"6379"}]' + labels: + role: redis + spec: + containers: + - name: redis + image: charlyyfon/nodeapp:redis + imagePullPolicy: Always + ports: + - name: redis + containerPort: 6379 +--- +apiVersion: v1 +kind: Service +metadata: + name: redis + labels: + role: redis +spec: + ports: + - port: 6379 + targetPort: 6379 + selector: + role: redis \ No newline at end of file diff --git a/02-path-working-with-clusters/201-cluster-monitoring/templates/webapp/webapp.yaml b/02-path-working-with-clusters/201-cluster-monitoring/templates/webapp/webapp.yaml new file mode 100644 index 00000000..90f5ccd8 --- /dev/null +++ b/02-path-working-with-clusters/201-cluster-monitoring/templates/webapp/webapp.yaml @@ -0,0 +1,36 @@ +apiVersion: apps/v1beta1 +kind: Deployment +metadata: + name: fan +spec: + replicas: 1 # tells deployment to run 1 pod matching the template + template: # create pods using pod definition in this template + metadata: + labels: + role: fan + spec: + containers: + - name: fan + image: charlyyfon/nodeapp:fetch + imagePullPolicy: Always + ports: + - name: fan + containerPort: 5000 + env: + - name: API_KEY + value: DD_API_KEY +--- +apiVersion: v1 +kind: Service +metadata: + name: fan + labels: + role: fan +spec: + ports: + - port: 5000 + targetPort: 5000 + protocol: TCP + selector: + role: fan + type: ClusterIP \ No newline at end of file diff --git a/resources/images/coffeehouse.png b/resources/images/coffeehouse.png new file mode 100644 index 00000000..395d65a4 Binary files /dev/null and b/resources/images/coffeehouse.png differ diff --git a/resources/images/container-map.png b/resources/images/container-map.png new file mode 100644
index 00000000..df81c72c
Binary files /dev/null and b/resources/images/container-map.png differ
diff --git a/resources/images/container-view.png b/resources/images/container-view.png
new file mode 100644
index 00000000..42e4b4c6
Binary files /dev/null and b/resources/images/container-view.png differ
diff --git a/resources/images/datadog-logo.png b/resources/images/datadog-logo.png
new file mode 100644
index 00000000..dbead5b9
Binary files /dev/null and b/resources/images/datadog-logo.png differ
diff --git a/resources/images/datadogdashboards.png b/resources/images/datadogdashboards.png
new file mode 100644
index 00000000..0c17f7bb
Binary files /dev/null and b/resources/images/datadogdashboards.png differ
diff --git a/resources/images/full-trace.png b/resources/images/full-trace.png
new file mode 100644
index 00000000..3c7752a4
Binary files /dev/null and b/resources/images/full-trace.png differ
diff --git a/resources/images/go-to-redis-traces.png b/resources/images/go-to-redis-traces.png
new file mode 100644
index 00000000..6d9340ed
Binary files /dev/null and b/resources/images/go-to-redis-traces.png differ
diff --git a/resources/images/hostmap.png b/resources/images/hostmap.png
new file mode 100644
index 00000000..ad5c4d4b
Binary files /dev/null and b/resources/images/hostmap.png differ
diff --git a/resources/images/infinite-demo.png b/resources/images/infinite-demo.png
new file mode 100644
index 00000000..970aea6b
Binary files /dev/null and b/resources/images/infinite-demo.png differ
diff --git a/resources/images/kubernetes-dashboard-default.png b/resources/images/kubernetes-dashboard-default.png
deleted file mode 100644
index c4ce62a6..00000000
Binary files a/resources/images/kubernetes-dashboard-default.png and /dev/null differ
diff --git a/resources/images/logmonitor.png b/resources/images/logmonitor.png
new file mode 100644
index 00000000..39aff51c
Binary files /dev/null and b/resources/images/logmonitor.png differ
diff --git a/resources/images/minikube-dashboard.png b/resources/images/minikube-dashboard.png
deleted file mode 100644
index eb46b839..00000000
Binary files a/resources/images/minikube-dashboard.png and /dev/null differ
diff --git a/resources/images/monitoring-grafana-dashboards-cluster.png b/resources/images/monitoring-grafana-dashboards-cluster.png
deleted file mode 100644
index 1b118afc..00000000
Binary files a/resources/images/monitoring-grafana-dashboards-cluster.png and /dev/null differ
diff --git a/resources/images/monitoring-grafana-dashboards-pods.png b/resources/images/monitoring-grafana-dashboards-pods.png
deleted file mode 100644
index 8858d5cf..00000000
Binary files a/resources/images/monitoring-grafana-dashboards-pods.png and /dev/null differ
diff --git a/resources/images/monitoring-grafana-dashboards.png b/resources/images/monitoring-grafana-dashboards.png
deleted file mode 100644
index d8118198..00000000
Binary files a/resources/images/monitoring-grafana-dashboards.png and /dev/null differ
diff --git a/resources/images/monitoring-grafana-prometheus-dashboard-1.png b/resources/images/monitoring-grafana-prometheus-dashboard-1.png
deleted file mode 100644
index 8f0342c3..00000000
Binary files a/resources/images/monitoring-grafana-prometheus-dashboard-1.png and /dev/null differ
diff --git a/resources/images/monitoring-grafana-prometheus-dashboard-2.png b/resources/images/monitoring-grafana-prometheus-dashboard-2.png
deleted file mode 100644
index 8208ec24..00000000
Binary files a/resources/images/monitoring-grafana-prometheus-dashboard-2.png and /dev/null differ
diff --git a/resources/images/monitoring-grafana-prometheus-dashboard-3.png b/resources/images/monitoring-grafana-prometheus-dashboard-3.png
deleted file mode 100644
index bd930bfd..00000000
Binary files a/resources/images/monitoring-grafana-prometheus-dashboard-3.png and /dev/null differ
diff --git a/resources/images/monitoring-grafana-prometheus-dashboard-capacity-planning.png b/resources/images/monitoring-grafana-prometheus-dashboard-capacity-planning.png
deleted file mode 100644
index 0d8f3283..00000000
Binary files a/resources/images/monitoring-grafana-prometheus-dashboard-capacity-planning.png and /dev/null differ
diff --git a/resources/images/monitoring-grafana-prometheus-dashboard-cluster-status.png b/resources/images/monitoring-grafana-prometheus-dashboard-cluster-status.png
deleted file mode 100644
index 3727241a..00000000
Binary files a/resources/images/monitoring-grafana-prometheus-dashboard-cluster-status.png and /dev/null differ
diff --git a/resources/images/monitoring-grafana-prometheus-dashboard-control-plane-status.png b/resources/images/monitoring-grafana-prometheus-dashboard-control-plane-status.png
deleted file mode 100644
index dd8fb9cf..00000000
Binary files a/resources/images/monitoring-grafana-prometheus-dashboard-control-plane-status.png and /dev/null differ
diff --git a/resources/images/monitoring-grafana-prometheus-dashboard-dashboard-home.png b/resources/images/monitoring-grafana-prometheus-dashboard-dashboard-home.png
deleted file mode 100644
index aee044d0..00000000
Binary files a/resources/images/monitoring-grafana-prometheus-dashboard-dashboard-home.png and /dev/null differ
diff --git a/resources/images/monitoring-grafana-prometheus-dashboard-nodes.png b/resources/images/monitoring-grafana-prometheus-dashboard-nodes.png
deleted file mode 100644
index 0e57e59e..00000000
Binary files a/resources/images/monitoring-grafana-prometheus-dashboard-nodes.png and /dev/null differ
diff --git a/resources/images/monitoring-nodes-after.png b/resources/images/monitoring-nodes-after.png
deleted file mode 100644
index 72555fc0..00000000
Binary files a/resources/images/monitoring-nodes-after.png and /dev/null differ
diff --git a/resources/images/monitoring-nodes-before.png b/resources/images/monitoring-nodes-before.png
deleted file mode 100644
index 7cfa316b..00000000
Binary files a/resources/images/monitoring-nodes-before.png and /dev/null differ
diff --git a/resources/images/monitoring-pods-after.png b/resources/images/monitoring-pods-after.png
deleted file mode 100644
index 1a3231fe..00000000
Binary files a/resources/images/monitoring-pods-after.png and /dev/null differ
diff --git a/resources/images/monitoring-pods-before.png b/resources/images/monitoring-pods-before.png
deleted file mode 100644
index 0bb15b67..00000000
Binary files a/resources/images/monitoring-pods-before.png and /dev/null differ
diff --git a/resources/images/redis-apm-monitor.png b/resources/images/redis-apm-monitor.png
new file mode 100644
index 00000000..55b32ee4
Binary files /dev/null and b/resources/images/redis-apm-monitor.png differ
diff --git a/resources/images/redis-dashboard.png b/resources/images/redis-dashboard.png
new file mode 100644
index 00000000..ecf77cd4
Binary files /dev/null and b/resources/images/redis-dashboard.png differ
diff --git a/resources/images/redis-logs.png b/resources/images/redis-logs.png
new file mode 100644
index 00000000..deee9f9e
Binary files /dev/null and b/resources/images/redis-logs.png differ
diff --git a/resources/images/redis-traces.png b/resources/images/redis-traces.png
new file mode 100644
index 00000000..cf7b9f73
Binary files /dev/null and b/resources/images/redis-traces.png differ
diff --git a/resources/images/traces.png b/resources/images/traces.png
new file mode 100644
index 00000000..c3eaf00b
Binary files /dev/null and b/resources/images/traces.png differ
diff --git a/resources/images/webapp.png b/resources/images/webapp.png
new file mode 100644
index 00000000..e3f4d70d
Binary files /dev/null and b/resources/images/webapp.png differ