Merge pull request #41475 from lmandavi/CNV-8788-expose-vm-metrics

sjhala-ccs · web-flow · commit e6fc2f6eba18 · 2022-02-23T17:26:59.000-05:00
CNV-8788: Exposing custom metrics for virtual machines
diff --git a/_topic_maps/_topic_map.yml b/_topic_maps/_topic_map.yml
@@ -3220,6 +3220,8 @@ Topics:
     File: virt-openshift-cluster-monitoring
   - Name: Prometheus queries for virtual resources
     File: virt-prometheus-queries
+  - Name: Exposing custom metrics for virtual machines
+    File: virt-exposing-custom-metrics-for-vms
   - Name: Collecting OpenShift Virtualization data for Red Hat Support
     File: virt-collecting-virt-data
     Distros: openshift-enterprise
diff --git a/modules/virt-accessing-node-exporter-outside-cluster.adoc b/modules/virt-accessing-node-exporter-outside-cluster.adoc
@@ -0,0 +1,51 @@
+// Module included in the following assemblies:
+//
+// * virt/logging_events_monitoring/virt-exposing-custom-metrics-for-vms.adoc
+
+:_content-type: PROCEDURE
+[id="virt-accessing-node-exporter-outside-cluster_{context}"]
+= Accessing the node exporter service outside the cluster
+
+You can access the node-exporter service from outside the cluster and view the exposed metrics.
+
+.Prerequisites
+* You have access to the cluster as a user with `cluster-admin` privileges or the `monitoring-edit` role.
+* You have enabled monitoring for the user-defined project by configuring the node-exporter service.
+
+.Procedure
+
+. Expose the node-exporter service.
++
+[source,terminal]
+----
+$ oc expose service -n <namespace> <node_exporter_service_name>
+----
+. Obtain the fully qualified domain name (FQDN) for the route.
++
+[source,terminal]
+----
+$ oc get route -o=custom-columns=NAME:.metadata.name,DNS:.spec.host
+----
++
+.Example output
+[source,terminal]
+----
+NAME                    DNS
+node-exporter-service   node-exporter-service-dynamation.apps.cluster.example.org
+----
+. Use the `curl` command to display metrics for the node-exporter service.
++
+[source,terminal]
+----
+$ curl -s http://node-exporter-service-dynamation.apps.cluster.example.org/metrics
+----
++
+.Example output
+[source,terminal]
+----
+go_gc_duration_seconds{quantile="0"} 1.5382e-05
+go_gc_duration_seconds{quantile="0.25"} 3.1163e-05
+go_gc_duration_seconds{quantile="0.5"} 3.8546e-05
+go_gc_duration_seconds{quantile="0.75"} 4.9139e-05
+go_gc_duration_seconds{quantile="1"} 0.000189423
+----
diff --git a/modules/virt-configuring-node-exporter-service.adoc b/modules/virt-configuring-node-exporter-service.adoc
@@ -0,0 +1,54 @@
+// Module included in the following assemblies:
+//
+// * virt/logging_events_monitoring/virt-exposing-custom-metrics-for-vms.adoc
+
+:_content-type: PROCEDURE
+[id="virt-configuring-node-exporter-service_{context}"]
+= Configuring the node exporter service
+
+The node-exporter agent is deployed on every virtual machine in the cluster from which you want to collect metrics. Configure the node-exporter agent as a service to expose internal metrics and processes that are associated with virtual machines.
+
+.Prerequisites
+
+* Install the {product-title} CLI `oc`.
+* Log in to the cluster as a user with `cluster-admin` privileges.
+* Create the `cluster-monitoring-config` `ConfigMap` object in the `openshift-monitoring` project.
+* Enable monitoring for user-defined projects by setting `enableUserWorkload` to `true` in the `cluster-monitoring-config` `ConfigMap` object.
+
+.Procedure
+
+. Create the `Service` YAML file. In the following example, the file is called `node-exporter-service.yaml`.
++
+[source,yaml]
+----
+kind: Service
+apiVersion: v1
+metadata:
+  name: node-exporter-service <1>
+  namespace: dynamation <2>
+  labels:
+    servicetype: metrics <3>
+spec:
+  ports:
+    - name: exmet <4>
+      protocol: TCP
+      port: 9100 <5>
+      targetPort: 9100 <6>
+  type: ClusterIP
+  selector:
+    monitor: metrics <7>
+----
+<1> The node-exporter service that exposes the metrics from the virtual machines.
+<2> The namespace where the service is created.
+<3> The label for the service. The `ServiceMonitor` uses this label to match this service.
+<4> The name given to the port that exposes metrics on port 9100 for the `ClusterIP` service.
+<5> The port on which `node-exporter-service` listens for requests.
+<6> The TCP port number of the virtual machine that is configured with the `monitor` label.
+<7> The label used to match the virtual machine's pods. In this example, any virtual machine's pod with the label `monitor` and a value of `metrics` is matched.
+
+. Create the node-exporter service:
++
+[source,terminal]
+----
+$ oc create -f node-exporter-service.yaml
+----
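+
+.Verification
+* One way to confirm that the service exists, assuming the example name `node-exporter-service` and the example namespace `dynamation` used in this procedure, is to query it with `oc get`:
++
+[source,terminal]
+----
+$ oc get service -n dynamation node-exporter-service
+----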
diff --git a/modules/virt-configuring-vm-with-node-exporter-service.adoc b/modules/virt-configuring-vm-with-node-exporter-service.adoc
@@ -0,0 +1,76 @@
+// Module included in the following assemblies:
+//
+// * virt/logging_events_monitoring/virt-exposing-custom-metrics-for-vms.adoc
+
+:_content-type: PROCEDURE
+[id="virt-configuring-vm-with-node-exporter-service_{context}"]
+= Configuring a virtual machine with the node exporter service
+
+Download the `node-exporter` file onto the virtual machine. Then, create a `systemd` service that runs the node-exporter service when the virtual machine boots.
+
+.Prerequisites
+* The pods for the monitoring components are running in the `openshift-user-workload-monitoring` project.
+* Grant the `monitoring-edit` role to users who need to monitor this user-defined project.
+
+.Procedure
+
+. Log on to the virtual machine.
+
+. Download the `node-exporter` file onto the virtual machine by using the download URL that applies to the version of `node-exporter` that you want to install.
++
+[source,terminal]
+----
+$ wget https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz
+----
+
+. Extract the executable and place it in the `/usr/bin` directory.
++
+[source,terminal]
+----
+$ sudo tar xvf node_exporter-1.3.1.linux-amd64.tar.gz \
+    --directory /usr/bin --strip 1 "*/node_exporter"
+----
+
+. Create a `node_exporter.service` file in the `/etc/systemd/system` directory. This `systemd` service file runs the node-exporter service when the virtual machine reboots.
++
+[source,terminal]
+----
+[Unit]
+Description=Prometheus Metrics Exporter
+After=network.target
+StartLimitIntervalSec=0
+
+[Service]
+Type=simple
+Restart=always
+RestartSec=1
+User=root
+ExecStart=/usr/bin/node_exporter
+
+[Install]
+WantedBy=multi-user.target
+----
+
+. Enable and start the `systemd` service.
++
+[source,terminal]
+----
+$ sudo systemctl enable node_exporter.service
+$ sudo systemctl start node_exporter.service
+----
+
+.Verification
+* Verify that the node-exporter agent is reporting metrics from the virtual machine.
++
+[source,terminal]
+----
+$ curl http://localhost:9100/metrics
+----
++
+.Example output
+[source,terminal]
+----
+go_gc_duration_seconds{quantile="0"} 1.5244e-05
+go_gc_duration_seconds{quantile="0.25"} 3.0449e-05
+go_gc_duration_seconds{quantile="0.5"} 3.7913e-05
+----
diff --git a/modules/virt-creating-custom-monitoring-label-for-vms.adoc b/modules/virt-creating-custom-monitoring-label-for-vms.adoc
@@ -0,0 +1,29 @@
+// Module included in the following assemblies:
+//
+// * virt/logging_events_monitoring/virt-exposing-custom-metrics-for-vms.adoc
+
+:_content-type: PROCEDURE
+[id="virt-creating-custom-monitoring-label-for-vms_{context}"]
+= Creating a custom monitoring label for virtual machines
+
+To enable queries to multiple virtual machines from a single service, add a custom label in the virtual machine's YAML file.
+
+.Prerequisites
+
+* Install the {product-title} CLI `oc`.
+* Log in as a user with `cluster-admin` privileges.
+* You have access to the web console to stop and restart a virtual machine.
+
+.Procedure
+. Edit the `template` spec of your virtual machine configuration file. In this example, the label `monitor` has the value `metrics`.
++
+[source,yaml]
+----
+spec:
+  template:
+    metadata:
+      labels:
+        monitor: metrics
+----
+
+. Stop and restart the virtual machine to create a new pod that has the `monitor` label with the value that you assigned.
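+
+.Verification
+* As a minimal check, assuming the example label `monitor: metrics` and a placeholder `<namespace>` for the project that contains the virtual machine, you can list the pods that carry the label. The virtual machine's pod should be listed after the restart completes.
++
+[source,terminal]
+----
+$ oc get pods -n <namespace> -l monitor=metrics
+----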
diff --git a/modules/virt-creating-servicemonitor-resource-for-node-exporter.adoc b/modules/virt-creating-servicemonitor-resource-for-node-exporter.adoc
@@ -0,0 +1,49 @@
+// Module included in the following assemblies:
+//
+// * virt/logging_events_monitoring/virt-exposing-custom-metrics-for-vms.adoc
+
+:_content-type: PROCEDURE
+[id="virt-creating-servicemonitor-resource-for-node-exporter_{context}"]
+= Creating a ServiceMonitor resource for the node exporter service
+
+You can use a Prometheus client library and scrape metrics from the `/metrics` endpoint to access and view the metrics exposed by the node-exporter service. Use a `ServiceMonitor` custom resource (CR) to monitor the node-exporter service.
+
+.Prerequisites
+
+* You have access to the cluster as a user with `cluster-admin` privileges or the `monitoring-edit` role.
+* You have enabled monitoring for the user-defined project by configuring the node-exporter service.
+
+.Procedure
+. Create a YAML file for the `ServiceMonitor` resource configuration. In this example, the file is called `node-exporter-metrics-monitor.yaml`. The service monitor matches any service with the label `servicetype: metrics` and queries the `exmet` port every 30 seconds.
+
++
+[source,yaml]
+----
+apiVersion: monitoring.coreos.com/v1
+kind: ServiceMonitor
+metadata:
+  labels:
+    k8s-app: node-exporter-metrics-monitor
+  name: node-exporter-metrics-monitor <1>
+  namespace: dynamation <2>
+spec:
+  endpoints:
+  - interval: 30s <3>
+    port: exmet <4>
+    scheme: http
+  selector:
+    matchLabels:
+      servicetype: metrics
+
+----
+<1> The name of the `ServiceMonitor`.
+<2> The namespace where the `ServiceMonitor` is created.
+<3> The interval at which the port will be queried.
+<4> The name of the port that is queried every 30 seconds.
+
+. Create the `ServiceMonitor` configuration for the node-exporter service.
++
+[source,terminal]
+----
+$ oc create -f node-exporter-metrics-monitor.yaml
+----
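+
+.Verification
+* One way to confirm that the resource was created, assuming the example name `node-exporter-metrics-monitor` and the example namespace `dynamation` used in this procedure, is to query it with `oc get`:
++
+[source,terminal]
+----
+$ oc get servicemonitor -n dynamation node-exporter-metrics-monitor
+----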
diff --git a/modules/virt-querying-the-node-exporter-service-for-metrics.adoc b/modules/virt-querying-the-node-exporter-service-for-metrics.adoc
@@ -0,0 +1,80 @@
+// Module included in the following assemblies:
+//
+// * virt/logging_events_monitoring/virt-exposing-custom-metrics-for-vms.adoc
+
+:_content-type: PROCEDURE
+[id="virt-querying-the-node-exporter-service-for-metrics-_{context}"]
+= Querying the node-exporter service for metrics
+
+Metrics are exposed for virtual machines through an HTTP service endpoint under the `/metrics` canonical name. When you query for metrics, Prometheus directly scrapes the metrics from the metrics endpoint exposed by the virtual machines and presents these metrics for viewing.
+
+.Prerequisites
+* You have access to the cluster as a user with `cluster-admin` privileges or the `monitoring-edit` role.
+* You have enabled monitoring for the user-defined project by configuring the node-exporter service.
+
+.Procedure
+. Obtain the HTTP service endpoint by specifying the namespace for the service:
++
+[source,terminal]
+----
+$ oc get service -n <namespace> <node_exporter_service>
+----
+
+. To list all available metrics for the node-exporter service, query the `/metrics` endpoint at the IP address and port of the service.
++
+[source,terminal]
+----
+$ curl http://172.30.226.162:9100/metrics | grep -vE "^#|^$"
+----
++
+.Example output
+[source,terminal]
+----
+node_arp_entries{device="eth0"} 1
+node_boot_time_seconds 1.643153218e+09
+node_context_switches_total 4.4938158e+07
+node_cooling_device_cur_state{name="0",type="Processor"} 0
+node_cooling_device_max_state{name="0",type="Processor"} 0
+node_cpu_guest_seconds_total{cpu="0",mode="nice"} 0
+node_cpu_guest_seconds_total{cpu="0",mode="user"} 0
+node_cpu_seconds_total{cpu="0",mode="idle"} 1.10586485e+06
+node_cpu_seconds_total{cpu="0",mode="iowait"} 37.61
+node_cpu_seconds_total{cpu="0",mode="irq"} 233.91
+node_cpu_seconds_total{cpu="0",mode="nice"} 551.47
+node_cpu_seconds_total{cpu="0",mode="softirq"} 87.3
+node_cpu_seconds_total{cpu="0",mode="steal"} 86.12
+node_cpu_seconds_total{cpu="0",mode="system"} 464.15
+node_cpu_seconds_total{cpu="0",mode="user"} 1075.2
+node_disk_discard_time_seconds_total{device="vda"} 0
+node_disk_discard_time_seconds_total{device="vdb"} 0
+node_disk_discarded_sectors_total{device="vda"} 0
+node_disk_discarded_sectors_total{device="vdb"} 0
+node_disk_discards_completed_total{device="vda"} 0
+node_disk_discards_completed_total{device="vdb"} 0
+node_disk_discards_merged_total{device="vda"} 0
+node_disk_discards_merged_total{device="vdb"} 0
+node_disk_info{device="vda",major="252",minor="0"} 1
+node_disk_info{device="vdb",major="252",minor="16"} 1
+node_disk_io_now{device="vda"} 0
+node_disk_io_now{device="vdb"} 0
+node_disk_io_time_seconds_total{device="vda"} 174
+node_disk_io_time_seconds_total{device="vdb"} 0.054
+node_disk_io_time_weighted_seconds_total{device="vda"} 259.79200000000003
+node_disk_io_time_weighted_seconds_total{device="vdb"} 0.039
+node_disk_read_bytes_total{device="vda"} 3.71867136e+08
+node_disk_read_bytes_total{device="vdb"} 366592
+node_disk_read_time_seconds_total{device="vda"} 19.128
+node_disk_read_time_seconds_total{device="vdb"} 0.039
+node_disk_reads_completed_total{device="vda"} 5619
+node_disk_reads_completed_total{device="vdb"} 96
+node_disk_reads_merged_total{device="vda"} 5
+node_disk_reads_merged_total{device="vdb"} 0
+node_disk_write_time_seconds_total{device="vda"} 240.66400000000002
+node_disk_write_time_seconds_total{device="vdb"} 0
+node_disk_writes_completed_total{device="vda"} 71584
+node_disk_writes_completed_total{device="vdb"} 0
+node_disk_writes_merged_total{device="vda"} 19761
+node_disk_writes_merged_total{device="vdb"} 0
+node_disk_written_bytes_total{device="vda"} 2.007924224e+09
+node_disk_written_bytes_total{device="vdb"} 0
+----
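+
+. Optional: To narrow the output to a single metric family, one option is to filter on the metric name prefix. The following sketch assumes the example address from the previous step and uses the `node_cpu_seconds_total` metric that appears in the example output:
++
+[source,terminal]
+----
+$ curl -s http://172.30.226.162:9100/metrics | grep "^node_cpu_seconds_total"
+----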
diff --git a/virt/logging_events_monitoring/virt-exposing-custom-metrics-for-vms.adoc b/virt/logging_events_monitoring/virt-exposing-custom-metrics-for-vms.adoc
@@ -0,0 +1,35 @@
+:_content-type: ASSEMBLY
+[id="virt-exposing-custom-metrics-for-vms"]
+= Exposing custom metrics for virtual machines
+include::modules/virt-document-attributes.adoc[]
+:context: virt-exposing-custom-metrics-for-vms
+
+toc::[]
+
+{product-title} includes a pre-configured, pre-installed, and self-updating monitoring stack that provides monitoring for core platform components. This monitoring stack is based on the Prometheus monitoring system. Prometheus is a time-series database and a rule evaluation engine for metrics.
+
+In addition to using the {product-title} monitoring stack, you can enable monitoring for user-defined projects by using the CLI and query custom metrics that are exposed for virtual machines through the `node-exporter` service.
+
+include::modules/virt-configuring-node-exporter-service.adoc[leveloffset=+1]
+include::modules/virt-configuring-vm-with-node-exporter-service.adoc[leveloffset=+1]
+include::modules/virt-creating-custom-monitoring-label-for-vms.adoc[leveloffset=+1]
+include::modules/virt-querying-the-node-exporter-service-for-metrics.adoc[leveloffset=+2]
+include::modules/virt-creating-servicemonitor-resource-for-node-exporter.adoc[leveloffset=+1]
+include::modules/virt-accessing-node-exporter-outside-cluster.adoc[leveloffset=+2]
+
+[role="_additional-resources"]
+[id="additional-resources_virt-exposing-custom-metrics-for-vms"]
+== Additional resources
+* xref:../../monitoring/configuring-the-monitoring-stack.adoc#configuring-the-monitoring-stack[Configuring the monitoring stack]
+
+* xref:../../monitoring/enabling-monitoring-for-user-defined-projects.adoc#enabling-monitoring-for-user-defined-projects[Enabling monitoring for user-defined projects]
+
+* xref:../../monitoring/managing-metrics.adoc#managing-metrics[Managing metrics]
+
+* xref:../../monitoring/reviewing-monitoring-dashboards.adoc#reviewing-monitoring-dashboards[Reviewing monitoring dashboards]
+
+* xref:../../applications/application-health.adoc#application-health[Monitoring application health by using health checks]
+
+* xref:../../nodes/pods/nodes-pods-configmaps.adoc#nodes-pods-configmaps[Creating and using config maps]
+
+* xref:../../virt/virtual_machines/virt-controlling-vm-states.adoc#virt-controlling-vm-states[Controlling virtual machine states]