Missing CPU Metrics for Libvirt Instances #402
-
I'm using ceems version 0.10.2 to collect metrics from Libvirt, but I'm unable to get CPU usage metrics for the virtual machines. While other metrics, such as memory and some CPU-related counters (ceems_compute_unit_cpu_user_seconds_total, ceems_compute_unit_cpu_system_seconds_total), are being scraped, key metrics like CPU utilization are missing. Additionally, I'm seeing the following error in the logs, which seems to be related to the issue:
Steps Taken to Troubleshoot
ceems_compute_unit_cpu_psi_seconds
Analyzed the error: The log error clearly states that the instance-0000004f.xml file is not found at /etc/libvirt/qemu/.
Attempted a symlink fix: To address the "no such file or directory" error, I created a symbolic link for the directory. However, this did not resolve the problem. Instead, after creating the symlink, the ceems exporter stopped finding any instances at all.

Environment Details
Operating System: Ubuntu 24.04
Libvirt version: 10.0.0
Metrics:
-
Hello @vurmil, thanks for the detailed report. CPU usage is a derived metric that can be estimated from the following three metrics:
It is generally a good idea to always export raw metrics and use Prometheus recording rules to estimate the derived metrics. To estimate the CPU usage percentage, you will need to use the following query:
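As a rough illustration of the idea (not the exact query from this reply), the sketch below derives a CPU usage percentage from two scrapes of the cumulative user and system CPU-time counters. It assumes the third input is the number of vCPUs allocated to the VM; the function name, parameter names, and sample values are hypothetical.

```python
def cpu_usage_percent(user_t0, system_t0, user_t1, system_t1,
                      interval_seconds, num_cpus):
    """Estimate CPU usage (%) between two scrapes of cumulative counters.

    user_* / system_* are readings of counters such as
    ceems_compute_unit_cpu_user_seconds_total and
    ceems_compute_unit_cpu_system_seconds_total.
    num_cpus is ASSUMED to be the number of vCPUs allocated to the VM.
    """
    if interval_seconds <= 0 or num_cpus <= 0:
        raise ValueError("interval_seconds and num_cpus must be positive")
    # CPU seconds consumed during the interval (user + system).
    busy_seconds = (user_t1 - user_t0) + (system_t1 - system_t0)
    # CPU seconds available during the interval across all vCPUs.
    available_seconds = interval_seconds * num_cpus
    return 100.0 * busy_seconds / available_seconds


# Hypothetical samples taken 30 s apart on a 4-vCPU VM.
usage = cpu_usage_percent(120.0, 30.0, 126.0, 33.0, 30.0, 4)
print(f"{usage:.2f}%")
```

A Prometheus recording rule performs the same kind of arithmetic on the server side with a rate function over the counters, so the exporter only needs to expose the raw values.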
Or you can add this rule to your Prometheus instance, which will estimate the CPU and memory usage of virtual machines in real time and create a new metric.

Regarding the error about the XML file:

Cheers!
-
Hello @mahendrapaipuri Regarding the error about the XML file, I can confirm that the file does not exist in the location you mentioned. The instance-0000004f.xml file is located at a different path: /run/libvirt/qemu/instance-0000004f.xml. Despite the file not being in the expected location, we are successfully collecting metrics for this virtual machine and data is being stored in Prometheus. We are getting a CPU usage value of 5.78175166667279 from the following expression:
However, I've noticed an inconsistency in the collected data. The metric labels show uuid="instance-0000004f", which appears to be the instance name, not the UUID. The instance_id in the log error also uses the instance name. This is an issue as the metric should ideally be collected and labeled with the VM's true UUID for proper identification and tracking. Could you please assist me in understanding why the uuid label is being populated with the instance name instead of the actual UUID?
-
Hello, @mahendrapaipuri Thank you for your guidance. I have added the --collector.libvirt.xml-dir=/run/libvirt/qemu flag to the ceems_exporter CLI arguments. I can confirm that the ceems-exporter has stopped logging the XML file error. However, the uuid field is now completely empty in the collected metrics. There are no labels containing uuid at all.
I am using OpenStack 2025.1 as my Virtual Machine Manager. Here is the content of the XML file: cat /run/libvirt/qemu/instance-0000004f.xml
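For context, here is a minimal sketch of how a name and UUID could be read from such a file. It assumes, as I understand the libvirt layout, that runtime files under /run/libvirt/qemu wrap the `<domain>` element in a `<domstatus>` root, while files under /etc/libvirt/qemu have `<domain>` as the root. The helper name and the all-zero UUID are hypothetical, and this is not how ceems itself parses the file.

```python
import xml.etree.ElementTree as ET


def read_domain_identity(xml_text):
    """Return (name, uuid) from a libvirt domain XML document.

    Handles both layouts ASSUMED here:
      - persistent definition: <domain> is the root element
      - runtime status file:   <domstatus> root wrapping a <domain> child
    """
    root = ET.fromstring(xml_text)
    domain = root if root.tag == "domain" else root.find("domain")
    if domain is None:
        raise ValueError("no <domain> element found")
    return domain.findtext("name"), domain.findtext("uuid")


# Hypothetical, heavily trimmed documents; the UUID is a placeholder.
PERSISTENT = (
    "<domain type='kvm'>"
    "<name>instance-0000004f</name>"
    "<uuid>00000000-0000-0000-0000-000000000000</uuid>"
    "</domain>"
)
RUNTIME = f"<domstatus state='running'>{PERSISTENT}</domstatus>"

print(read_domain_identity(PERSISTENT))
print(read_domain_identity(RUNTIME))
```

A parser that only looks for `<uuid>` directly under the root element would find nothing in the runtime layout, which is one possible explanation for an empty uuid label.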
-
Sorry, I have two environments and I got them mixed up. Of course, I have the Prometheus rule added. When I inspect the data in Grafana, it doesn't show any data. Do you know why?
More info: inspecting the "⚙️ Average Utilization by admin" panel returns {"status":"success","data":[]} (see the API logs). If I remove all parameters from the request (http://198.19.1.21:9020/api/v1/usage/current/admin), it starts returning data:
However, the CPU columns are empty. I see that this data is missing in the returned JSON:
-
@vurmil A new release, v0.11.0, has been published with support for libvirt runtime XML files. There are some breaking changes in metric labeling, so please consult the changelog.