
Commit 8c4e6bd

Merge pull request #74365 from sjhala-ccs/cnv-13710
CNV-13710: Configuring cluster for real-time workloads
2 parents 534d0ab + e4e00f0 commit 8c4e6bd

4 files changed: 267 additions and 0 deletions

_topic_maps/_topic_map.yml

Lines changed: 2 additions & 0 deletions
@@ -4084,6 +4084,8 @@ Topics:
     File: virt-vm-control-plane-tuning
   - Name: Assigning compute resources
     File: virt-assigning-compute-resources
+  - Name: Running real-time workloads
+    File: virt-configuring-cluster-realtime-workloads
   - Name: VM disks
     Dir: virtual_disks
     Topics:

modules/virt-configuring-cluster-real-time.adoc

Lines changed: 140 additions & 0 deletions
@@ -0,0 +1,140 @@
// Module included in the following assemblies:
//
// * virt/virtual_machines/advanced_vm_management/virt-configuring-cluster-realtime-workloads.adoc

:_mod-docs-content-type: PROCEDURE
[id="virt-configuring-cluster-real-time_{context}"]
= Configuring a cluster for real-time workloads

You can configure an {product-title} cluster to run real-time workloads.

.Prerequisites
* You have access to the cluster as a user with `cluster-admin` permissions.
* You have installed the OpenShift CLI (`oc`).
* You have installed the Node Tuning Operator.

.Procedure

. Label a subset of the compute nodes with a custom role, for example, `worker-realtime`:
+
[source,terminal]
----
$ oc label node <node_name> node-role.kubernetes.io/worker-realtime=""
----
+
[NOTE]
====
You must use the default `master` role for {sno} and compact clusters.
====
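+
Optionally, you can confirm that the label is in place before you continue. The following command is one way to list the labeled nodes; it assumes the `worker-realtime` role name used in the previous command:
+
[source,terminal]
----
$ oc get nodes -l node-role.kubernetes.io/worker-realtime
----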

. Create a new `MachineConfigPool` manifest that contains the `worker-realtime` label in the `spec.machineConfigSelector` object:
+
.Example `MachineConfigPool` manifest
[source,yaml]
----
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-realtime
  labels:
    machineconfiguration.openshift.io/role: worker-realtime
spec:
  machineConfigSelector:
    matchExpressions:
      - key: machineconfiguration.openshift.io/role
        operator: In
        values:
          - worker
          - worker-realtime
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-realtime: ""
----
+
[NOTE]
====
You do not need to create a new `MachineConfigPool` manifest for {sno} and compact clusters.
====

. If you created a new `MachineConfigPool` manifest in step 2, apply it to the cluster by using the following command:
+
[source,terminal]
----
$ oc apply -f <real_time_mcp>.yaml
----

. Create a `PerformanceProfile` manifest that applies to the labeled nodes and the machine config pool that you created in the previous steps:
+
.Example `PerformanceProfile` manifest
[source,yaml]
----
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: profile-1
spec:
  cpu:
    isolated: 4-39,44-79
    reserved: 0-3,40-43
  globallyDisableIrqLoadBalancing: true
  hugepages:
    defaultHugepagesSize: 1G
    pages:
      - count: 8
        size: 1G
  realTimeKernel:
    enabled: true
  workloadHints:
    highPowerConsumption: true
    realTime: true
  nodeSelector:
    node-role.kubernetes.io/worker-realtime: ""
  numa:
    topologyPolicy: single-numa-node
----

. Apply the `PerformanceProfile` manifest:
+
[source,terminal]
----
$ oc apply -f <real_time_pp>.yaml
----
+
[NOTE]
====
The compute nodes automatically reboot twice after you apply the `MachineConfigPool` and `PerformanceProfile` manifests. This process might take a long time.
====
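+
Because the nodes reboot, the rollout can take a while to complete. One way to track progress, assuming the `worker-realtime` machine config pool created in the earlier step, is to watch the pool until its `UPDATED` column reports `True`:
+
[source,terminal]
----
$ oc get mcp worker-realtime --watch
----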

. Retrieve the name of the generated `RuntimeClass` resource from the `status.runtimeClass` field of the `PerformanceProfile` object:
+
[source,terminal]
----
$ oc get performanceprofiles.performance.openshift.io profile-1 -o=jsonpath='{.status.runtimeClass}{"\n"}'
----
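+
The generated runtime class name is typically derived from the profile name, so for the `profile-1` example the output might look similar to the following:
+
.Example output
[source,terminal]
----
performance-profile-1
----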

. Set the previously obtained `RuntimeClass` name as the default container runtime class for the `virt-launcher` pods by editing the `HyperConverged` custom resource (CR):
+
[source,terminal,subs="attributes+"]
----
$ oc patch hyperconverged kubevirt-hyperconverged -n {CNVNamespace} \
  --type='json' -p='[{"op": "add", "path": "/spec/defaultRuntimeClass", "value":"<runtimeclass_name>"}]'
----
+
[NOTE]
====
Editing the `HyperConverged` CR changes a global setting that affects all VMs that are created after the change is applied.
====
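+
If you want to confirm the change, one way is to read the setting back from the `HyperConverged` CR:
+
[source,terminal,subs="attributes+"]
----
$ oc get hyperconverged kubevirt-hyperconverged -n {CNVNamespace} -o jsonpath='{.spec.defaultRuntimeClass}{"\n"}'
----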

. If your real-time-enabled compute nodes use simultaneous multithreading (SMT), enable the `alignCPUs` feature gate by editing the `HyperConverged` CR:
+
[source,terminal,subs="attributes+"]
----
$ oc patch hyperconverged kubevirt-hyperconverged -n {CNVNamespace} \
  --type='json' -p='[{"op": "replace", "path": "/spec/featureGates/alignCPUs", "value": true}]'
----
+
[NOTE]
====
Enabling `alignCPUs` allows {VirtProductName} to request up to two additional dedicated CPUs to bring the total CPU count to an even parity when using emulator thread isolation.
====
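
.Verification
* Optionally, confirm that the labeled nodes are now running the real-time kernel. One way, assuming the node name that you labeled earlier, is to check whether the reported kernel version contains an `rt` identifier:
+
[source,terminal]
----
$ oc get node <node_name> -o jsonpath='{.status.nodeInfo.kernelVersion}{"\n"}'
----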

modules/virt-configuring-vm-real-time.adoc

Lines changed: 104 additions & 0 deletions
@@ -0,0 +1,104 @@
// Module included in the following assemblies:
//
// * virt/virtual_machines/advanced_vm_management/virt-configuring-cluster-realtime-workloads.adoc

:_mod-docs-content-type: PROCEDURE
[id="virt-configuring-vm-real-time_{context}"]
= Configuring a virtual machine for real-time workloads

You can configure a virtual machine (VM) to run real-time workloads.

.Prerequisites
* Your cluster is configured to run real-time workloads.
* You have installed the `virtctl` tool.

.Procedure
. Create a `VirtualMachine` manifest to include information about CPU topology, CRI-O annotations, and huge pages:
+
.Example `VirtualMachine` manifest
[source,yaml]
----
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: realtime-vm
spec:
  template:
    metadata:
      annotations:
        cpu-load-balancing.crio.io: disable <1>
        cpu-quota.crio.io: disable <2>
        irq-load-balancing.crio.io: disable <3>
    spec:
      domain:
        cpu:
          dedicatedCpuPlacement: true
          isolateEmulatorThread: true
          model: host-passthrough
          numa:
            guestMappingPassthrough: {}
          realtime: {}
          sockets: 1 <4>
          cores: 4 <5>
          threads: 1
        devices:
          autoattachGraphicsDevice: false
          autoattachMemBalloon: false
          autoattachSerialConsole: true
        ioThreadsPolicy: auto
        memory:
          guest: 4Gi
          hugepages:
            pageSize: 1Gi <6>
      terminationGracePeriodSeconds: 0
# ...
----
<1> This annotation specifies that load balancing is disabled for CPUs that are used by the container.
<2> This annotation specifies that the CPU quota is disabled for CPUs that are used by the container.
<3> This annotation specifies that interrupt request (IRQ) load balancing is disabled for CPUs that are used by the container.
<4> The number of sockets inside the VM.
<5> The number of cores inside the VM. This must be a value greater than or equal to `1`.
<6> The size of the huge pages. The possible values for x86-64 architectures are `1Gi` and `2Mi`. In this example, the request is for 4 huge pages of size 1 Gi.

. Apply the `VirtualMachine` manifest:
+
[source,terminal]
----
$ oc apply -f <file_name>.yaml
----
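+
Optionally, you can start the VM and check that it reaches the `Running` phase. A minimal check, using the `virtctl` tool listed in the prerequisites and the `realtime-vm` name from the example manifest:
+
[source,terminal]
----
$ virtctl start realtime-vm
----
+
[source,terminal]
----
$ oc get vmi realtime-vm
----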

. Configure the guest operating system. The following example shows the configuration steps for a {op-system-base} 8 operating system:
.. Run the following command to connect to the VM console:
+
[source,terminal]
----
$ virtctl console <vm_name>
----

.. Configure huge pages by using the GRUB boot loader command-line interface. In the following example, eight 1 GB huge pages are specified.
+
[source,terminal]
----
$ grubby --update-kernel=ALL --args="default_hugepagesz=1GB hugepagesz=1G hugepages=8"
----

.. To achieve low-latency tuning by using the `cpu-partitioning` profile in the TuneD application, run the following commands:
+
[source,terminal]
----
$ dnf install -y tuned-profiles-cpu-partitioning
----
+
[source,terminal]
----
$ echo isolated_cores=2-9 > /etc/tuned/cpu-partitioning-variables.conf
----
+
The first two CPUs (0 and 1) are set aside for housekeeping tasks and the rest are isolated for the real-time application.
+
[source,terminal]
----
$ tuned-adm profile cpu-partitioning
----

. Restart the VM to apply the changes.
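+
After the VM restarts, you can optionally confirm the guest tuning from the VM console. For example, the following commands, run inside the guest, show the active TuneD profile and the huge page allocation:
+
[source,terminal]
----
$ tuned-adm active
----
+
[source,terminal]
----
$ grep -i hugepages /proc/meminfo
----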

virt/virtual_machines/advanced_vm_management/virt-configuring-cluster-realtime-workloads.adoc

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
:_mod-docs-content-type: ASSEMBLY
[id="virt-configuring-cluster-realtime-workloads"]
= Running real-time workloads
include::_attributes/common-attributes.adoc[]
:context: virt-configuring-cluster-realtime-workloads

toc::[]

You can configure an {product-title} cluster to run real-time virtual machine (VM) workloads that require low and predictable latency. {product-title} provides the Node Tuning Operator to implement automatic tuning for real-time and low-latency workloads.

include::modules/virt-configuring-cluster-real-time.adoc[leveloffset=+1]

include::modules/virt-configuring-vm-real-time.adoc[leveloffset=+1]

[role="_additional-resources"]
[id="additional-resources_configuring-cluster-real-time"]
== Additional resources

* xref:../../../scalability_and_performance/using-node-tuning-operator.adoc#using-node-tuning-operator[Using the Node Tuning Operator]
* xref:../../../scalability_and_performance/cnf-low-latency-tuning.adoc#node-tuning-operator-known-limitations-for-real-time_cnf-master[Known limitations for real-time]
* xref:../../../scalability_and_performance/cnf-low-latency-tuning.adoc#reducing-nic-queues-using-the-node-tuning-operator_cnf-master[Reducing NIC queues using the Node Tuning Operator]
