Commit 9faec3d

joshulyne, mrbobbytables, and onlydole authored
K8s KPIs with Kuberhealthy blog post (#21153)
* Kuberhealthy 2.1.0 blog post
* initial rewrite
* Add images
* Add k8s kpis
* edit
* edit
* edit
* Address comments
* Add promql label
* Update date
* fix typo in kuberhealthy post

Co-authored-by: Taylor Dolezal <[email protected]>
Co-authored-by: Bob Killen <[email protected]>
Co-authored-by: Taylor Dolezal <[email protected]>
1 parent 6b60ca8 commit 9faec3d

1 file changed

+201
-0
lines changed

@@ -0,0 +1,201 @@
1+
---
2+
layout: blog
3+
title: "K8s KPIs with Kuberhealthy"
4+
date: 2020-05-29
5+
---
6+
7+
**Authors:** Joshulyne Park (Comcast), Eric Greer (Comcast)
8+
9+
### Building Onward from Kuberhealthy v2.0.0
10+
11+
Last November at KubeCon San Diego 2019, we announced the release of
12+
[Kuberhealthy 2.0.0](https://www.youtube.com/watch?v=aAJlWhBtzqY) - transforming Kuberhealthy into a Kubernetes operator
13+
for synthetic monitoring. This new ability granted developers the means to create their own Kuberhealthy check
14+
containers to synthetically monitor their applications and clusters. The community was quick to adopt this new feature and we're grateful for everyone who implemented and tested Kuberhealthy 2.0.0 in their clusters. Thanks to all of you who reported
15+
issues and contributed to discussions on the #kuberhealthy Slack channel. We quickly set to work to address all your feedback
16+
with a newer version of Kuberhealthy. Additionally, we created a guide on how to easily install and use Kuberhealthy in order to capture some helpful synthetic [KPIs](https://kpi.org/KPI-Basics).
17+
18+
### Deploying Kuberhealthy
19+
20+
To install Kuberhealthy, make sure you have [Helm 3](https://helm.sh/docs/intro/install/) installed. If not, you can use the generated flat spec files located
21+
in this [deploy folder](https://github.com/Comcast/kuberhealthy/tree/master/deploy). You should use [kuberhealthy-prometheus.yaml](https://github.com/Comcast/kuberhealthy/blob/master/deploy/kuberhealthy-prometheus.yaml) if you don't use the [Prometheus Operator](https://github.com/coreos/prometheus-operator), and [kuberhealthy-prometheus-operator.yaml](https://github.com/Comcast/kuberhealthy/blob/master/deploy/kuberhealthy-prometheus-operator.yaml) if you do. If you don't use Prometheus at all, you can still use Kuberhealthy with a JSON status page and/or InfluxDB integration using [this spec](https://github.com/Comcast/kuberhealthy/blob/master/deploy/kuberhealthy.yaml).
22+
23+
#### To install using Helm 3:
24+
##### 1. Create namespace "kuberhealthy" in the desired Kubernetes cluster/context:
25+
```
kubectl create namespace kuberhealthy
```
28+
##### 2. Set your current namespace to "kuberhealthy":
29+
```
kubectl config set-context --current --namespace=kuberhealthy
```
32+
##### 3. Add the kuberhealthy repo to Helm:
33+
```
helm repo add kuberhealthy https://comcast.github.io/kuberhealthy/helm-repos
```
36+
##### 4. Depending on your Prometheus implementation, install Kuberhealthy using the appropriate command for your cluster:
37+
38+
- If you use the [Prometheus Operator](https://github.com/coreos/prometheus-operator):
39+
```
helm install kuberhealthy kuberhealthy/kuberhealthy --set prometheus.enabled=true,prometheus.enableAlerting=true,prometheus.enableScraping=true,prometheus.serviceMonitor=true
```
42+
43+
- If you use Prometheus, but NOT Prometheus Operator:
44+
```
helm install kuberhealthy kuberhealthy/kuberhealthy --set prometheus.enabled=true,prometheus.enableAlerting=true,prometheus.enableScraping=true
```
47+
See additional details about configuring the appropriate scrape annotations in the section [Prometheus Integration Details](#prometheus-integration-details) below.
48+
49+
- Finally, if you don't use Prometheus:
50+
```
helm install kuberhealthy kuberhealthy/kuberhealthy
```
53+
54+
Running the Helm command should automatically install the newest version of Kuberhealthy (v2.2.0) along with a few basic checks. If you run `kubectl get pods`, you should see two Kuberhealthy pods. These are the pods that create, coordinate, and track test pods. These two Kuberhealthy pods also serve a JSON status page as well as a `/metrics` endpoint. Every other pod you see created is a checker pod designed to execute and shut down when done.
55+
56+
### Configuring Additional Checks
57+
58+
Next, you can run `kubectl get khchecks`. You should see three Kuberhealthy checks installed by default:
59+
- [daemonset](https://github.com/Comcast/kuberhealthy/tree/master/cmd/daemonset-check): Deploys and tears down a daemonset to ensure all nodes in the cluster are functional.
60+
- [deployment](https://github.com/Comcast/kuberhealthy/tree/master/cmd/deployment-check): Creates a deployment and then triggers a rolling update. Tests that the deployment is reachable via a service and then deletes everything. Any problem in this process will cause this check to report a failure.
61+
- [dns-status-internal](https://github.com/Comcast/kuberhealthy/tree/master/cmd/dns-resolution-check): Validates that internal cluster DNS is functioning as expected.
62+
63+
To view other available external checks, see the [external checks registry](https://github.com/Comcast/kuberhealthy/blob/master/docs/EXTERNAL_CHECKS_REGISTRY.md), which lists additional YAML files you can apply to your cluster to enable various checks.
64+
65+
Kuberhealthy check pods should start running shortly after Kuberhealthy starts running (1-2 minutes). Additionally, the check-reaper cronjob runs every few minutes to ensure there are no more than 5 completed checker pods left lying around at a time.
66+
67+
To get a status page view of these checks, either expose the `kuberhealthy` service externally by editing the service and setting `type: LoadBalancer`, or run `kubectl port-forward service/kuberhealthy 8080:80`. When viewed, the service endpoint displays a JSON status page that looks like this:
68+
69+
```json
{
    "OK": true,
    "Errors": [],
    "CheckDetails": {
        "kuberhealthy/daemonset": {
            "OK": true,
            "Errors": [],
            "RunDuration": "22.512278967s",
            "Namespace": "kuberhealthy",
            "LastRun": "2020-04-06T23:20:31.7176964Z",
            "AuthoritativePod": "kuberhealthy-67bf8c4686-mbl2j",
            "uuid": "9abd3ec0-b82f-44f0-b8a7-fa6709f759cd"
        },
        "kuberhealthy/deployment": {
            "OK": true,
            "Errors": [],
            "RunDuration": "29.142295647s",
            "Namespace": "kuberhealthy",
            "LastRun": "2020-04-06T23:20:31.7176964Z",
            "AuthoritativePod": "kuberhealthy-67bf8c4686-mbl2j",
            "uuid": "5f0d2765-60c9-47e8-b2c9-8bc6e61727b2"
        },
        "kuberhealthy/dns-status-internal": {
            "OK": true,
            "Errors": [],
            "RunDuration": "2.43940936s",
            "Namespace": "kuberhealthy",
            "LastRun": "2020-04-06T23:20:44.6294547Z",
            "AuthoritativePod": "kuberhealthy-67bf8c4686-mbl2j",
            "uuid": "c85f95cb-87e2-4ff5-b513-e02b3d25973a"
        }
    },
    "CurrentMaster": "kuberhealthy-7cf79bdc86-m78qr"
}
```
105+
106+
This JSON page displays all Kuberhealthy checks running in your cluster. If you have Kuberhealthy checks running in different namespaces, you can filter them by appending the `namespace` query parameter to the status page URL, for example `?namespace=kuberhealthy,kube-system`.
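If you want to consume the status page from a script, the following is a minimal Python sketch rather than an official client. It assumes the service is reachable at `http://localhost:8080/` (for example through the port-forward shown earlier); the helper names `status_url` and `failing_checks` and the sample error text are made up for illustration. It builds the namespace-filtered URL and lists the checks whose `OK` field is false.

```python
import json
from urllib.parse import urlencode


def status_url(base_url, namespaces=None):
    """Build the status page URL, optionally filtered by namespace."""
    if not namespaces:
        return base_url
    # safe="," keeps the comma-separated namespace list unescaped.
    query = urlencode({"namespace": ",".join(namespaces)}, safe=",")
    return base_url + "?" + query


def failing_checks(status):
    """Return {check name: errors} for every check whose OK field is false."""
    return {
        name: detail.get("Errors", [])
        for name, detail in status.get("CheckDetails", {}).items()
        if not detail.get("OK", False)
    }


# Hypothetical payload in the same shape as the status page above.
sample = json.loads("""
{
  "OK": false,
  "Errors": ["example failure"],
  "CheckDetails": {
    "kuberhealthy/daemonset": {"OK": true, "Errors": []},
    "kuberhealthy/deployment": {"OK": false, "Errors": ["example failure"]}
  }
}
""")

print(status_url("http://localhost:8080/", ["kuberhealthy", "kube-system"]))
print(failing_checks(sample))
```

In a real script you would fetch the URL with your HTTP client of choice and pass the decoded JSON to `failing_checks`.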
107+
108+
109+
### Writing Your Own Checks
110+
111+
Kuberhealthy is designed to be extended with custom check containers that can be written by anyone to check anything. These checks can be written in any language as long as they are packaged in a container. This makes Kuberhealthy an excellent platform for creating your own synthetic checks!
112+
113+
Creating your own check is a great way to validate your client library, simulate real user workflow, and create a high level of confidence in your service or system uptime.
114+
115+
To learn more about writing your own checks, along with simple examples, check the [custom check creation](https://github.com/Comcast/kuberhealthy/blob/master/docs/EXTERNAL_CHECK_CREATION.md) documentation.
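To give a feel for how small a check can be, here is a hedged Python sketch. It assumes the reporting contract described in the linked documentation as we understand it: the check pod reads the `KH_REPORTING_URL` environment variable and sends a POST with a JSON body containing `OK` and `Errors`. The function names are hypothetical, and your Kuberhealthy version may require more than this (for example additional request headers), so treat the linked documentation as the source of truth.

```python
import json
import os
import urllib.request


def build_report(errors):
    """Build the report body: OK is true only when there are no errors."""
    errors = list(errors)
    return {"OK": not errors, "Errors": errors}


def send_report(report, reporting_url=None):
    """POST the report to Kuberhealthy.

    Assumption: the URL comes from the KH_REPORTING_URL environment
    variable that Kuberhealthy sets on the checker pod.
    """
    url = reporting_url or os.environ["KH_REPORTING_URL"]
    request = urllib.request.Request(
        url,
        data=json.dumps(report).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def run_check():
    """Placeholder for your own logic; return a list of error strings."""
    return []


if __name__ == "__main__" and "KH_REPORTING_URL" in os.environ:
    send_report(build_report(run_check()))
```

Packaged in a container image and referenced from a `KuberhealthyCheck` resource, a script along these lines is all a basic check needs.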
116+
117+
118+
### Prometheus Integration Details
119+
120+
When enabling Prometheus (not the operator), the Kuberhealthy service gets the following annotations added:
121+
```yaml
prometheus.io/path: /metrics
prometheus.io/port: "80"
prometheus.io/scrape: "true"
```
126+
127+
In your Prometheus configuration, add the following example `scrape_config`, which scrapes the Kuberhealthy service based on the `prometheus.io/scrape` annotation:
128+
129+
```yaml
- job_name: 'kuberhealthy'
  scrape_interval: 1m
  honor_labels: true
  metrics_path: /metrics
  kubernetes_sd_configs:
    - role: service
      namespaces:
        names:
          - kuberhealthy
  relabel_configs:
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
      action: keep
      regex: true
```
144+
145+
You can also specify the target endpoint to be scraped using this example job:
146+
```yaml
- job_name: kuberhealthy
  scrape_interval: 1m
  honor_labels: true
  metrics_path: /metrics
  static_configs:
    - targets:
        - kuberhealthy.kuberhealthy.svc.cluster.local:80
```
155+
156+
Once the appropriate Prometheus configurations are applied, you should be able to see the following Kuberhealthy metrics:
157+
- `kuberhealthy_check`
158+
- `kuberhealthy_check_duration_seconds`
159+
- `kuberhealthy_cluster_states`
160+
- `kuberhealthy_running`
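
One way to confirm that these metrics are arriving is to ask Prometheus for them through its HTTP API. The sketch below is illustrative only: the base URL `http://prometheus:9090` and the helper names are assumptions, and the sample response is hand-written in the shape of an instant query result. It builds an `/api/v1/query` URL and extracts the label set and value of each returned series.

```python
from urllib.parse import urlencode


def instant_query_url(prometheus_base_url, promql):
    """Build a Prometheus instant query URL for the given PromQL expression."""
    base = prometheus_base_url.rstrip("/")
    return base + "/api/v1/query?" + urlencode({"query": promql})


def vector_values(response):
    """Extract (labels, value) pairs from a decoded instant query response."""
    if response.get("status") != "success":
        raise ValueError("query failed: %r" % response.get("error"))
    data = response["data"]
    if data["resultType"] != "vector":
        raise ValueError("expected an instant vector, got " + data["resultType"])
    # Each sample value is a [timestamp, "string value"] pair.
    return [(series["metric"], float(series["value"][1])) for series in data["result"]]


# Hand-written sample response, not real output from a cluster.
sample_response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "kuberhealthy_running"}, "value": [1590710400, "1"]}
        ],
    },
}

print(instant_query_url("http://prometheus:9090", "kuberhealthy_running"))
print(vector_values(sample_response))
```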
161+
162+
### Creating Key Performance Indicators
163+
164+
Using these Kuberhealthy metrics, our team has been able to collect KPIs based on the following definitions, calculations, and PromQL queries.
165+
166+
*Availability*
167+
168+
We define availability as the K8s cluster control plane being up and functioning as expected. This is measured by our ability to create a deployment, do a rolling update, and delete the deployment within a set period of time.
169+
170+
We calculate this by measuring Kuberhealthy's [deployment check](https://github.com/Comcast/kuberhealthy/tree/master/cmd/deployment-check) successes and failures.
171+
- Availability = Uptime / (Uptime + Downtime)
172+
- Uptime = Number of Deployment Check Passes * Check Run Interval
173+
- Downtime = Number of Deployment Check Fails * Check Run Interval
174+
- Check Run Interval = how often the check runs (`runInterval` set in your KuberhealthyCheck Spec)
175+
176+
- PromQL Query (Availability % over the past 30 days):
177+
```promql
(1 - (sum(count_over_time(kuberhealthy_check{check="kuberhealthy/deployment", status="0"}[30d])) OR vector(0)) / sum(count_over_time(kuberhealthy_check{check="kuberhealthy/deployment"}[30d]))) * 100
```
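
As a quick sanity check on the arithmetic, here is a small Python sketch that takes availability as uptime divided by total observed time (uptime plus downtime). The function name and the pass/fail counts are hypothetical, and the 10-minute run interval is only an example value for `runInterval`.

```python
def availability_percent(passes, fails, run_interval_seconds):
    """Availability as a percentage of observed time.

    Uptime and downtime are each the number of check runs multiplied
    by the run interval, as in the definitions above.
    """
    uptime = passes * run_interval_seconds
    downtime = fails * run_interval_seconds
    total = uptime + downtime
    if total == 0:
        raise ValueError("no check runs recorded")
    return 100.0 * uptime / total


# Hypothetical month: a check every 10 minutes, with 3 failed runs.
print(round(availability_percent(passes=4317, fails=3, run_interval_seconds=600), 4))
```

Because every run covers the same interval, the interval cancels out and the result depends only on the ratio of passes to total runs.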
180+
181+
*Utilization*
182+
183+
We define utilization as user uptake of product (k8s) and its resources (pods, services, etc.). This is measured by how many nodes, deployments, statefulsets, persistent volumes, services, pods, and jobs are being utilized by our customers.
184+
We calculate this by counting the total number of nodes, deployments, statefulsets, persistent volumes, services, pods, and jobs.
185+
186+
*Duration (Latency)*
187+
188+
We define duration as the control plane's capacity and utilization of throughput. We calculate this by capturing the average run duration of a Kuberhealthy [deployment check](https://github.com/Comcast/kuberhealthy/tree/master/cmd/deployment-check) run.
189+
190+
- PromQL Query (Deployment check average run duration):
191+
```promql
avg(kuberhealthy_check_duration_seconds{check="kuberhealthy/deployment"})
```
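
If you do not run Prometheus, a rough equivalent can be computed from the `RunDuration` strings on the JSON status page. The sketch below is an assumption-laden illustration: it treats `RunDuration` as a Go-style duration string (such as `29.142295647s` or `1m2.5s`), the helper names are our own, and the sample readings are invented.

```python
import re

# Seconds per unit for Go-style duration strings.
_UNIT_SECONDS = {
    "h": 3600.0, "m": 60.0, "s": 1.0,
    "ms": 1e-3, "us": 1e-6, "µs": 1e-6, "ns": 1e-9,
}
# Multi-character units come first so "ms" is not read as "m".
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ns|us|µs|ms|s|m|h)")


def parse_duration(text):
    """Convert a duration string such as '1m2.5s' into seconds."""
    position = 0
    total = 0.0
    for match in _TOKEN.finditer(text):
        if match.start() != position:
            raise ValueError("unexpected text in duration: %r" % text)
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        position = match.end()
    if position == 0 or position != len(text):
        raise ValueError("not a duration: %r" % text)
    return total


def average_duration_seconds(durations):
    """Average a list of duration strings, in seconds."""
    seconds = [parse_duration(d) for d in durations]
    if not seconds:
        raise ValueError("no durations given")
    return sum(seconds) / len(seconds)


# Invented readings collected from several deployment check runs.
print(average_duration_seconds(["29.142295647s", "31.5s", "1m2.5s"]))
```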
194+
195+
*Errors / Alerts*
196+
197+
We define errors as all k8s cluster and Kuberhealthy related alerts. Every time one of our Kuberhealthy checks fails, we are alerted to the failure.
198+
199+
### Thank You!
200+
201+
Thanks again to everyone in the community for all of your contributions and help! We are excited to see what you build. As always, if you find an issue, have a feature request, or need to open a pull request, please [open an issue](https://github.com/Comcast/kuberhealthy/issues) on the GitHub project.
