Commit 3a77da5

Merge pull request #30368 from aburdenthehand/cnv-8882-pql-metrics
CNV-8882 - Viewing Virt metrics with Prometheus
2 parents 03ca906 + 8ab62e2 commit 3a77da5

6 files changed

+140
-0
lines changed

_topic_map.yml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2865,6 +2865,8 @@ Topics:
28652865
File: virt-using-dashboard-to-get-cluster-info
28662866
- Name: OpenShift cluster monitoring, logging, and Telemetry
28672867
File: virt-openshift-cluster-monitoring
2868+
- Name: Prometheus queries for virtual resources
2869+
File: virt-prometheus-queries
28682870
- Name: Collecting OpenShift Virtualization data for Red Hat Support
28692871
File: virt-collecting-virt-data
28702872
---

modules/monitoring-querying-metrics-for-all-projects-as-an-administrator.adoc

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,7 @@
11
// Module included in the following assemblies:
22
//
33
// * monitoring/managing-metrics.adoc
4+
// * virt/logging_events_monitoring/virt-prometheus-queries.adoc
45

56
[id="querying-metrics-for-all-projects-as-an-administrator_{context}"]
67
= Querying metrics for all projects as a cluster administrator

modules/monitoring-querying-metrics-for-user-defined-projects-as-a-developer.adoc

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,7 @@
11
// Module included in the following assemblies:
22
//
33
// * monitoring/managing-metrics.adoc
4+
// * virt/logging_events_monitoring/virt-prometheus-queries.adoc
45

56
[id="querying-metrics-for-user-defined-projects-as-a-developer_{context}"]
67
= Querying metrics for user-defined projects as a developer

modules/monitoring-querying-metrics.adoc

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,7 @@
11
// Module included in the following assemblies:
22
//
33
// * monitoring/managing-metrics.adoc
4+
// * virt/logging_events_monitoring/virt-prometheus-queries.adoc
45

56
[id="querying-metrics_{context}"]
67
= Querying metrics

modules/virt-querying-metrics.adoc

Lines changed: 106 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,106 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * virt/logging_events_monitoring/virt-prometheus-queries.adoc
4+
5+
[id="virt-querying-metrics_{context}"]
6+
= Virtualization metrics
7+
8+
The following metric descriptions include example Prometheus Query Language (PromQL) queries.
9+
10+
[NOTE]
11+
====
12+
These metrics are not an API and might change between versions.
13+
====
14+
15+
16+
[id="virt-promql-vcpu-metrics_{context}"]
17+
== vCPU metrics
18+
19+
`kubevirt_vmi_vcpu_wait_seconds`::
20+
Returns the wait time (in seconds) for a virtual machine's vCPU.
21+
22+
A value above `0` means that the vCPU wants to run, but the host scheduler cannot run it yet. This inability to run indicates that there is an issue with I/O.
23+
24+
.Example query
25+
[source,promql]
26+
----
27+
topk(3, sum by (name, namespace) (round(irate(kubevirt_vmi_vcpu_wait_seconds[6m]), 0.1))) > 0
28+
----
29+
The above query returns the top 3 VMs waiting for I/O at every given moment in time over a six-minute time period.
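
To make the evaluation order of the example query concrete, the following Python sketch mimics each stage on in-memory data: `irate` over the last two samples, `round` to the nearest 0.1, `sum by (name, namespace)`, `topk(3, ...)`, and finally the `> 0` filter. It is an illustration of the query semantics under simplifying assumptions, not how Prometheus is implemented. The sample series, the `id` label, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def irate(samples):
    """Per-second rate computed from the last two (timestamp, value) samples."""
    if len(samples) < 2:
        return None  # a series with fewer than two samples yields no result
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    # If the counter went down, treat it as a reset and count from zero.
    delta = v1 if v1 < v0 else v1 - v0
    return delta / (t1 - t0)


def round_to(value, step):
    """Round to the nearest multiple of step, with ties rounded up."""
    inverse = 1.0 / step
    return math.floor(value * inverse + 0.5) / inverse


def top_waiting(series, k=3, step=0.1):
    """Mimic: topk(k, sum by (name, namespace) (round(irate(...), step))) > 0."""
    totals = defaultdict(float)
    for entry in series:
        rate = irate(entry["samples"])
        if rate is None:
            continue
        key = (entry["labels"]["name"], entry["labels"]["namespace"])
        totals[key] += round_to(rate, step)
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:k]
    # The comparison is applied after topk, so zero-valued entries are dropped last.
    return [(key, value) for key, value in ranked if value > 0]


# Hypothetical samples: (timestamp in seconds, cumulative wait time in seconds).
sample_series = [
    {"labels": {"name": "vm-a", "namespace": "demo", "id": "0"},
     "samples": [(0, 10.0), (30, 13.0)]},
    {"labels": {"name": "vm-a", "namespace": "demo", "id": "1"},
     "samples": [(0, 5.0), (30, 11.0)]},
    {"labels": {"name": "vm-b", "namespace": "demo", "id": "0"},
     "samples": [(0, 7.0), (30, 7.0)]},
]
print(top_waiting(sample_series))
```

Because the per-vCPU rates are summed by `name` and `namespace`, a VM with several vCPUs is ranked by its combined wait rate.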
30+
31+
[id="virt-promql-network-metrics_{context}"]
32+
== Network metrics
33+
34+
`kubevirt_vmi_network_receive_bytes_total`::
35+
Returns the total amount of traffic received (in bytes) on the virtual machine's network.
36+
37+
`kubevirt_vmi_network_transmit_bytes_total`::
38+
Returns the total amount of traffic transmitted (in bytes) on the virtual machine's network.
39+
40+
These queries can be used to identify virtual machines that are saturating the network.
41+
42+
.Example query
43+
[source,promql]
44+
----
45+
topk(3, sum by (name, namespace) (round(irate(kubevirt_vmi_network_receive_bytes_total[6m]), 0.1)) + sum by (name, namespace) (round(irate(kubevirt_vmi_network_transmit_bytes_total[6m]), 0.1))) > 0
46+
----
47+
The above query returns the top 3 VMs transmitting and receiving the most network traffic at every given moment in time over a six-minute time period.
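
Outside the web console, a query like this can also be sent to the Prometheus HTTP API instant-query endpoint, `/api/v1/query`. The following Python sketch only builds the request URL and parses an instant-vector response. The base URL is a hypothetical placeholder, and authentication, which a cluster normally requires, is deliberately left out.

```python
import json
from urllib.parse import urlencode


def instant_query_url(base_url, promql):
    """Build a URL for the Prometheus instant-query endpoint."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(body):
    """Return (name, namespace, value) rows from an instant-vector response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"unexpected result type: {data['resultType']}")
    rows = []
    for item in data["result"]:
        labels = item["metric"]
        _timestamp, value = item["value"]  # the sample value arrives as a string
        rows.append((labels.get("name"), labels.get("namespace"), float(value)))
    return rows


# Hypothetical base URL; substitute the query endpoint of your own cluster.
query = (
    "topk(3, sum by (name, namespace) "
    "(round(irate(kubevirt_vmi_network_receive_bytes_total[6m]), 0.1))) > 0"
)
print(instant_query_url("https://prometheus.example.com", query))
```

Keeping URL construction and response parsing separate from the network call makes both easy to check without a running cluster.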
48+
49+
[id="virt-promql-storage-metrics_{context}"]
50+
== Storage metrics
51+
52+
`kubevirt_vmi_storage_read_traffic_bytes_total`::
53+
Returns the total amount of storage reads (in bytes) of the virtual machine's storage-related traffic.
54+
55+
`kubevirt_vmi_storage_write_traffic_bytes_total`::
56+
Returns the total amount of storage writes (in bytes) of the virtual machine's storage-related traffic.
57+
58+
These queries can be used to identify virtual machines that are reading or writing large amounts of data.
59+
60+
.Example query
61+
[source,promql]
62+
----
63+
topk(3, sum by (name, namespace) (round(irate(kubevirt_vmi_storage_read_traffic_bytes_total[6m]), 0.1))
64+
+ sum by (name, namespace) (round(irate(kubevirt_vmi_storage_write_traffic_bytes_total[6m]), 0.1))) > 0
65+
----
66+
The above query returns the top 3 VMs performing the most storage traffic at every given moment in time over a six-minute time period.
67+
68+
`kubevirt_vmi_storage_iops_read_total`::
69+
Returns the amount of read I/O operations the virtual machine is performing per second.
70+
71+
`kubevirt_vmi_storage_iops_write_total`::
72+
Returns the amount of write I/O operations the virtual machine is performing per second.
73+
74+
These queries can be used to determine the I/O performance of storage devices.
75+
76+
.Example query
77+
[source,promql]
78+
----
79+
topk(3, sum by (name, namespace) (round(irate(kubevirt_vmi_storage_iops_read_total[6m]), 0.1))
80+
+ sum by (name, namespace) (round(irate(kubevirt_vmi_storage_iops_write_total[6m]), 0.1))) > 0
81+
----
82+
The above query returns the top 3 VMs performing the most I/O operations per second at every given moment in time over a six-minute time period.
83+
84+
[id="virt-promql-guest-memory-metrics_{context}"]
85+
== Guest memory swapping metrics
86+
`kubevirt_vmi_memory_swap_in_traffic_bytes_total`::
87+
Returns the total amount (in bytes) of memory the virtual guest is swapping in.
88+
89+
`kubevirt_vmi_memory_swap_out_traffic_bytes_total`::
90+
Returns the total amount (in bytes) of memory the virtual guest is swapping out.
91+
92+
Memory swapping indicates that the virtual machine is under memory pressure. Increasing the memory allocation of the virtual machine can mitigate this issue.
93+
94+
[NOTE]
95+
====
96+
These queries only return data for virtual guests that have memory swapping enabled.
97+
====
98+
99+
.Example query
100+
[source,promql]
101+
----
102+
topk(3, sum by (name, namespace) (round(irate(kubevirt_vmi_memory_swap_in_traffic_bytes_total[6m]), 0.1))
103+
+ sum by (name, namespace) (round(irate(kubevirt_vmi_memory_swap_out_traffic_bytes_total[6m]), 0.1))) > 0
104+
----
105+
The above query returns the top 3 VMs where the guest is performing the most memory swapping at every given moment in time over a six-minute time period.
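
The network, storage, and memory swapping example queries in this module all share one shape: two counters are each converted to a rounded per-second rate, summed by `name` and `namespace`, added together, and ranked with `topk`. As a sketch of that pattern, the hypothetical Python helper below builds such a query string from any pair of metric names. The function name and its parameters are illustrative and not part of any product API.

```python
def top_combined_rate_query(metric_a, metric_b, k=3, window="6m", step=0.1):
    """Build a 'top k VMs by combined rate' PromQL query for two counters."""

    def summed(metric):
        # Per-VM rate: instant rate over the window, rounded, summed per VM.
        return (
            f"sum by (name, namespace) "
            f"(round(irate({metric}[{window}]), {step}))"
        )

    return f"topk({k}, {summed(metric_a)} + {summed(metric_b)}) > 0"


print(
    top_combined_rate_query(
        "kubevirt_vmi_memory_swap_in_traffic_bytes_total",
        "kubevirt_vmi_memory_swap_out_traffic_bytes_total",
    )
)
```

Changing `k` or `window` adjusts how many VMs are returned and how far back `irate` looks for its last two samples.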
106+

virt/logging_events_monitoring/virt-prometheus-queries.adoc

Lines changed: 29 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,29 @@
1+
[id="virt-prometheus-queries"]
2+
= Prometheus queries for virtual resources
3+
include::modules/virt-document-attributes.adoc[]
4+
:context: virt-prometheus-queries
5+
toc::[]
6+
7+
{VirtProductName} provides metrics for monitoring how infrastructure resources are consumed in the cluster. The metrics cover the following resources:
8+
9+
* vCPU
10+
* Network
11+
* Storage
12+
* Guest memory swapping
13+
14+
Use the {product-title} monitoring dashboard to query virtualization metrics.
15+
16+
.Prerequisites
17+
18+
* To use the vCPU metric, the `schedstats=enable` kernel argument must be applied to the `MachineConfig` object. This kernel argument enables scheduler statistics used for debugging and performance tuning and adds a minor additional load to the scheduler. See the xref:../../post_installation_configuration/machine-configuration-tasks.adoc#nodes-nodes-kernel-arguments_post-install-machine-configuration-tasks[{product-title} machine configuration tasks] documentation for more information on applying a kernel argument.
19+
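As a rough illustration of the prerequisite, the following Python sketch generates a `MachineConfig` manifest in JSON form that sets the `schedstats=enable` kernel argument. The object name and the `worker` role are assumptions chosen for the example. Depending on the cluster version, the manifest might also need a `spec.config.ignition.version` field, which is omitted here; treat the linked machine configuration documentation as authoritative.

```python
import json


def schedstats_machine_config(role="worker", name=None):
    """Return a MachineConfig manifest that adds the schedstats kernel argument."""
    # The default object name is a hypothetical naming choice for this example.
    name = name or f"05-{role}-kernelarg-schedstats"
    return {
        "apiVersion": "machineconfiguration.openshift.io/v1",
        "kind": "MachineConfig",
        "metadata": {
            "name": name,
            # Selects the machine config pool that the argument applies to.
            "labels": {"machineconfiguration.openshift.io/role": role},
        },
        "spec": {"kernelArguments": ["schedstats=enable"]},
    }


print(json.dumps(schedstats_machine_config(), indent=2))
```

The printed JSON can be saved to a file and reviewed before it is applied to a cluster.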
20+
include::modules/monitoring-querying-metrics.adoc[leveloffset=+1]
21+
include::modules/monitoring-querying-metrics-for-all-projects-as-an-administrator.adoc[leveloffset=+2]
22+
include::modules/monitoring-querying-metrics-for-user-defined-projects-as-a-developer.adoc[leveloffset=+2]
23+
24+
include::modules/virt-querying-metrics.adoc[leveloffset=+1]
25+
26+
[id="{context}-additional-resources"]
27+
== Additional resources
28+
29+
* xref:../../monitoring/understanding-the-monitoring-stack.adoc#understanding-the-monitoring-stack[Understanding the {product-title} monitoring stack]
