Commit 3736c9a

docs: Add best practices for metrics
1 parent 17151ac commit 3736c9a

1 file changed: +70 -0 lines changed

docs/design/metrics-best-practices.md

Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
1+
# Kube-State-Metrics - Timeseries best practices
2+
3+
---
4+
5+
Author: Manuel Rüger (<[email protected]>)
6+
7+
Date: October 17th 2024
8+
9+
---
10+
11+
## Introduction
12+
13+
Kube-State-Metrics' goal is to provide insights into the state of Kubernetes objects by exposing them as metrics.
14+
This document provides guidelines intended to create a good user experience for anyone using these metrics.
15+
16+
Please be aware that this document was introduced at a later stage of the project, so some existing metrics might not follow these best practices.
17+
You are encouraged to report such metrics and to open a pull request to improve them.
18+
19+
## General best practices
20+
21+
We follow [Prometheus](https://prometheus.io/docs/practices/naming/) best practices in terms of naming and labeling.
22+
23+
## Best practices for kube-state-metrics
24+
25+
### Avoid pre-computation
26+
27+
kube-state-metrics should expose metrics at the level of individual objects and avoid any sort of pre-computation unless it is required, for example because of high cardinality on objects.
28+
By exposing raw per-object metrics instead of pre-computed counts, kube-state-metrics gives users full control over how they want to use the metrics.
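
As a rough illustration of this idea (a sketch, not kube-state-metrics code), the hypothetical metric `example_pod_ready` below is emitted once per pod, and the per-namespace total is derived afterwards by the consumer rather than by the exporter. The metric name, label names, and sample data are all assumptions made for this example.

```python
# Hypothetical per-object samples: (namespace, pod, ready as 0/1).
SAMPLES = [
    ("default", "web-0", 1),
    ("default", "web-1", 0),
    ("kube-system", "dns-0", 1),
]


def render_per_object(samples):
    """Render one series per pod instead of a pre-computed total."""
    return [
        f'example_pod_ready{{namespace="{ns}",pod="{pod}"}} {ready}'
        for ns, pod, ready in samples
    ]


def ready_by_namespace(samples):
    """Aggregation that a consumer could perform later on the raw series."""
    totals = {}
    for ns, _pod, ready in samples:
        totals[ns] = totals.get(ns, 0) + ready
    return totals


for line in render_per_object(SAMPLES):
    print(line)
```

With series like these, a consumer could compute the same total in PromQL with something like `sum by (namespace) (example_pod_ready)`, or slice the data differently without any change to the exporter.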
29+
30+
### Static object properties
31+
32+
An object usually has a stable set of properties that do not change during its lifecycle in Kubernetes.
33+
This includes properties such as name, namespace, and UID.
34+
It is a good practice to group those together into an `_info` metric.
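
A minimal sketch of what such an `_info` series could look like in the Prometheus text format: the static properties become labels and the sample value is the constant 1. The metric name `example_pod_info`, the label names, and the helper function are hypothetical, and label-value escaping is left out for brevity.

```python
def render_info_metric(name, labels):
    """Render an `_info` series: static properties as labels, value fixed at 1.

    Note: label values are not escaped here; a real exporter must escape
    backslashes, double quotes, and newlines.
    """
    label_str = ",".join(f'{key}="{value}"' for key, value in labels.items())
    return f"{name}{{{label_str}}} 1"


# Hypothetical static properties of a single object.
static_properties = {"namespace": "default", "pod": "web-0", "uid": "abc-123"}
print(render_info_metric("example_pod_info", static_properties))
```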
35+
36+
### Dynamic object properties
37+
38+
An object can also have a dynamic set of properties, which are usually part of the status field.
39+
These change during the lifecycle of the object.
40+
For example, a pod can be in different states such as "Pending" or "Running".
41+
These should be exposed in a separate metric whose labels identify the object as well as the dynamic property.
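
One possible shape for such a metric, sketched under assumptions: the identifying labels (`namespace`, `pod`) are combined with a `phase` label, and each known state gets its own series whose value is 1 for the current state and 0 otherwise. The 0/1 encoding, the metric name `example_pod_status_phase`, and the short state list are choices made for this example, not something this document prescribes.

```python
# States named in the text above; a real list would contain more entries.
PHASES = ["Pending", "Running"]


def render_phase_metric(namespace, pod, current_phase):
    """One series per known state: 1 for the current state, 0 for the others."""
    return [
        f'example_pod_status_phase{{namespace="{namespace}",pod="{pod}",phase="{phase}"}} '
        f"{1 if phase == current_phase else 0}"
        for phase in PHASES
    ]


for line in render_phase_metric("default", "web-0", "Running"):
    print(line)
```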
42+
43+
### Linked properties
44+
45+
If an object contains a substructure that links multiple properties together (e.g. endpoint address and port), those should be reported in the same metric.
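
To make the endpoint example concrete, the sketch below keeps each address and its port together as labels on the same series, so the pairing is not lost. The `EndpointAddress` type, the metric name, and the label names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EndpointAddress:
    """Hypothetical substructure that links an IP to a port."""
    ip: str
    port: int


def render_endpoint_metric(namespace, endpoint, addresses):
    """Emit one series per (ip, port) pair so the link between them is kept."""
    return [
        f'example_endpoint_address{{namespace="{namespace}",endpoint="{endpoint}",'
        f'ip="{addr.ip}",port="{addr.port}"}} 1'
        for addr in addresses
    ]


addresses = [EndpointAddress("10.0.0.1", 8080), EndpointAddress("10.0.0.2", 8443)]
for line in render_endpoint_metric("default", "web", addresses):
    print(line)
```

Splitting `ip` and `port` into two separate metrics would make it impossible to tell which port belongs to which address.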
46+
47+
### Optional properties
48+
49+
Some Kubernetes objects have optional fields. When an optional field is not set, it is better to omit the label entirely than to expose a "nil" value or an empty string.
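
A small sketch of this rule, assuming optional fields arrive as `None` or as an empty string: such labels are dropped before the series is rendered. The helper names, the metric name, and the `priority_class` label are hypothetical.

```python
def render_labels(labels):
    """Build the label string, dropping labels whose value is None or empty."""
    present = {key: value for key, value in labels.items() if value not in (None, "")}
    return ",".join(f'{key}="{value}"' for key, value in present.items())


def render_metric(name, labels, value):
    """Render a single series in the Prometheus text format (no escaping)."""
    return f"{name}{{{render_labels(labels)}}} {value}"


# The optional field is unset, so its label is omitted from the output.
print(render_metric("example_pod_info", {"pod": "web-0", "priority_class": None}, 1))
```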
50+
51+
### Timestamps
52+
53+
Timestamps like creation time or modification time should be exposed as the metric's value. The metric name should end with `_timestamp_seconds`.
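
As an illustration, the sketch below converts a timezone-aware creation time into whole seconds since the Unix epoch and uses that number as the sample value. The metric name `example_pod_created_timestamp_seconds` and the function are hypothetical.

```python
from datetime import datetime, timezone


def render_created_metric(namespace, pod, created_at):
    """Expose a timestamp as the sample value, in seconds since the Unix epoch."""
    seconds = int(created_at.timestamp())
    return (
        f'example_pod_created_timestamp_seconds{{namespace="{namespace}",pod="{pod}"}} '
        f"{seconds}"
    )


created = datetime(2024, 10, 17, 12, 0, 0, tzinfo=timezone.utc)
print(render_created_metric("default", "web-0", created))
```

Keeping the timestamp in the value rather than in a label avoids creating a new series for every distinct time and lets consumers do arithmetic on it, for example computing an object's age.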
54+
55+
### Cardinality
56+
57+
Some object properties can cause cardinality issues if they can take many different values, or if they are combined with other properties that also change frequently.
58+
In this case it is better to limit the number of values that kube-state-metrics exposes by allowing only a few of them and using a default for all others.
59+
If, for example, the Kubernetes object has a status field containing an error message that can change a lot, it would be better to expose a boolean `error="true"` label when there is an error.
60+
If some error messages are worth exposing, those could be exposed individually, with a default value provided for any other message.
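
The allowlist-with-default approach could be sketched as follows. The allowlisted values, the default value `other`, and the metric and label names are hypothetical and only serve to show how an unbounded value is mapped onto a small fixed set.

```python
# Hypothetical allowlist of values considered worth exposing as-is.
ALLOWED_REASONS = {"ImagePullBackOff", "CrashLoopBackOff"}
DEFAULT_REASON = "other"


def bounded_reason(reason):
    """Map any value outside the allowlist to a single default value."""
    return reason if reason in ALLOWED_REASONS else DEFAULT_REASON


def render_error_metric(namespace, pod, reason):
    """Render an error series whose `reason` label has bounded cardinality."""
    return (
        f'example_pod_error{{namespace="{namespace}",pod="{pod}",'
        f'reason="{bounded_reason(reason)}"}} 1'
    )


print(render_error_metric("default", "web-0", "CrashLoopBackOff"))
print(render_error_metric("default", "web-1", "disk /dev/sda3 is 97.3% full"))
```

However many distinct messages appear, the `reason` label can only take as many values as the allowlist contains, plus the default.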
61+
62+
## Stability
63+
64+
We follow the stability framework derived from Kubernetes, in which we expose experimental and stable metrics.
65+
Experimental metrics are recently introduced or expose alpha/beta resources in the Kubernetes API.
66+
They can change at any time and should be used with caution.
67+
They can be promoted to a stable metric once the object has stabilized in the Kubernetes API, or once they have been part of two consecutive releases without any changes.
68+
69+
Stable metrics are considered frozen with the exception of new labels being added.
70+
A stable metric or a label on a stable metric can be deprecated in release Major.Minor; the earliest release in which it can be removed is Major.Minor+2.
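
The deprecation window can be expressed as simple arithmetic on the release number. The helper below is a hypothetical sketch that assumes releases are written as `Major.Minor` strings.

```python
def earliest_removal(deprecated_in):
    """Given the 'Major.Minor' release that deprecates a metric or label,
    return the earliest release that may remove it (Major.Minor+2)."""
    major, minor = (int(part) for part in deprecated_in.split("."))
    return f"{major}.{minor + 2}"


# Something deprecated in 2.13 could be removed in 2.15 at the earliest.
print(earliest_removal("2.13"))
```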
