
KubeVirt Monitoring

KubeVirt metric names should align with Kubernetes metric names.

KubeVirt users should have the same experience when searching for node, container, pod, and virtual machine metrics.

Naming requirements:

- Check whether a similar Kubernetes metric exists for a node, container, or pod, and try to align with it.
- A KubeVirt metric for a running VM should have the `kubevirt_vmi_` prefix.
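
The naming rule can be sketched in Go, the language KubeVirt is written in. This is a minimal illustration, not KubeVirt code: `alignedVMIName` and `hasVMIPrefix` are hypothetical helpers, and the list of Kubernetes prefixes (`container_`, `pod_`, `node_`) is an assumption. The helper keeps the suffix of an existing Kubernetes metric name and swaps in the `kubevirt_vmi_` prefix, so a VMI metric reads like its container or node counterpart.

```go
package main

import (
	"fmt"
	"strings"
)

// vmiPrefix is the prefix the guidelines require for metrics of a running VM.
const vmiPrefix = "kubevirt_vmi_"

// k8sPrefixes lists Kubernetes metric prefixes whose suffix a VMI metric may
// reuse. This list is an illustrative assumption, not an official mapping.
var k8sPrefixes = []string{"container_", "pod_", "node_"}

// alignedVMIName is a hypothetical helper: it keeps the suffix of a Kubernetes
// metric name and replaces its prefix with kubevirt_vmi_.
func alignedVMIName(k8sName string) (string, error) {
	for _, p := range k8sPrefixes {
		if strings.HasPrefix(k8sName, p) {
			return vmiPrefix + strings.TrimPrefix(k8sName, p), nil
		}
	}
	return "", fmt.Errorf("no known Kubernetes prefix in %q", k8sName)
}

// hasVMIPrefix reports whether a metric name follows the kubevirt_vmi_ rule.
func hasVMIPrefix(name string) bool {
	return strings.HasPrefix(name, vmiPrefix)
}

func main() {
	name, err := alignedVMIName("container_network_receive_bytes_total")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(name, hasVMIPrefix(name))
}
```

Running it prints `kubevirt_vmi_network_receive_bytes_total true`. A plain prefix swap will not always yield a sensible name, so treat the result as a starting point for review rather than the final metric name.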

For example, see the following Kubernetes network metrics:

The corresponding KubeVirt metrics for a VMI should be:

Prometheus recording rules appear in Prometheus as metrics.

To make KubeVirt recording rules easy to identify, they should have the `kubevirt_` prefix.
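
A hypothetical lint check for this convention might look like the following Go sketch. `RecordingRule` and `checkRecordingRule` are illustrative names, not part of KubeVirt or the Prometheus client libraries, and the rule names and expressions in `main` are made up. Besides the `kubevirt_` prefix, it checks the recorded name against the Prometheus metric-name syntax, since the recorded series is exposed as a regular metric.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// RecordingRule mirrors the two fields of a Prometheus recording rule that
// matter here: the recorded series name and its PromQL expression.
type RecordingRule struct {
	Record string
	Expr   string
}

// metricNameRE is the Prometheus metric-name syntax: [a-zA-Z_:][a-zA-Z0-9_:]*.
var metricNameRE = regexp.MustCompile(`^[a-zA-Z_:][a-zA-Z0-9_:]*$`)

// checkRecordingRule is a hypothetical lint helper: the recorded name must be
// a valid metric name and start with kubevirt_, so the resulting series is
// easy to recognize as a KubeVirt recording rule.
func checkRecordingRule(r RecordingRule) error {
	if !metricNameRE.MatchString(r.Record) {
		return fmt.Errorf("%q is not a valid Prometheus metric name", r.Record)
	}
	if !strings.HasPrefix(r.Record, "kubevirt_") {
		return fmt.Errorf("recording rule %q must start with kubevirt_", r.Record)
	}
	return nil
}

func main() {
	rules := []RecordingRule{
		// Both names and expressions are made up for illustration.
		{Record: "kubevirt_example_running_vmis", Expr: "count(example_vmi_metric)"},
		{Record: "example_running_vmis", Expr: "count(example_vmi_metric)"},
	}
	for _, r := range rules {
		if err := checkRecordingRule(r); err != nil {
			fmt.Println("FAIL:", err)
		} else {
			fmt.Println("ok:", r.Record)
		}
	}
}
```

Here the first rule passes and the second is rejected because it lacks the `kubevirt_` prefix.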

When creating a KubeVirt alert rule, follow these requirements:

1. Use recording rules when doing calculations.
2. Create a runbook for the alert in KubeVirt runbooks.
3. The alert rule must include `runbook_url` with the link to your runbook from step #2.
4. The alert rule must include a severity, one of: `critical`, `warning`, `info`.

   NOTE:
   - Critical alerts: the service is down and you lose critical functionality; action is required immediately.
   - Warning alerts: the alert requires user intervention; a more serious issue may develop if it is not resolved soon.
   - Info alerts: a minor problem has been detected; it should be resolved relatively soon and not ignored.
5. The alert message must be verbose, since it is propagated to the metrics.md file when running `make generate`.
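
The alert rule requirements above could be enforced by a small check such as this Go sketch. It assumes, as is common in Prometheus alerting rules, that `severity` is a label and that `runbook_url` and `description` are annotations; `lintAlert`, the example alert, its placeholder runbook URL, and the five-word threshold for a "verbose" description are all hypothetical. It does not check the rule about using recording rules for calculations, since that would need a PromQL parser.

```go
package main

import (
	"fmt"
	"strings"
)

// AlertRule holds the parts of a Prometheus alerting rule the guidelines
// constrain. Treating severity as a label and runbook_url/description as
// annotations is an assumption based on common Prometheus practice.
type AlertRule struct {
	Alert       string
	Expr        string
	Labels      map[string]string
	Annotations map[string]string
}

// allowedSeverities are the three levels the guidelines permit.
var allowedSeverities = map[string]bool{"critical": true, "warning": true, "info": true}

// minDescriptionWords is a hypothetical threshold for a "verbose" message.
const minDescriptionWords = 5

// lintAlert is a hypothetical checker that returns every violation it finds.
func lintAlert(r AlertRule) []string {
	var problems []string
	if r.Annotations["runbook_url"] == "" {
		problems = append(problems, "missing runbook_url annotation")
	}
	if sev := r.Labels["severity"]; !allowedSeverities[sev] {
		problems = append(problems, fmt.Sprintf("severity %q is not one of critical, warning, info", sev))
	}
	if n := len(strings.Fields(r.Annotations["description"])); n < minDescriptionWords {
		problems = append(problems, fmt.Sprintf("description has %d words, want at least %d", n, minDescriptionWords))
	}
	return problems
}

func main() {
	rule := AlertRule{
		Alert:  "ExampleVMIAlert",                    // hypothetical alert name
		Expr:   "kubevirt_example_running_vmis == 0", // assumes a hypothetical recording rule
		Labels: map[string]string{"severity": "warning"},
		Annotations: map[string]string{
			"runbook_url": "https://example.com/runbooks/ExampleVMIAlert.md", // placeholder URL
			"description": "No virtual machine instances have been running for the last 10 minutes.",
		},
	}
	if problems := lintAlert(rule); len(problems) == 0 {
		fmt.Println("ok:", rule.Alert)
	} else {
		fmt.Println("FAIL:", strings.Join(problems, "; "))
	}
}
```

With the example rule as written, it prints `ok: ExampleVMIAlert`. Collecting all problems instead of stopping at the first one lets a contributor fix every violation in a single pass.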