
Commit 8c4e6bd

Merge pull request #74365 from sjhala-ccs/cnv-13710
CNV-13710: Configuring cluster for real-time workloads
2 parents 534d0ab + e4e00f0 commit 8c4e6bd

4 files changed: 267 additions and 0 deletions

_topic_maps/_topic_map.yml

Lines changed: 2 additions & 0 deletions
@@ -4084,6 +4084,8 @@ Topics:
     File: virt-vm-control-plane-tuning
   - Name: Assigning compute resources
     File: virt-assigning-compute-resources
+  - Name: Running real-time workloads
+    File: virt-configuring-cluster-realtime-workloads
   - Name: VM disks
     Dir: virtual_disks
     Topics:

modules/virt-configuring-cluster-real-time.adoc

Lines changed: 140 additions & 0 deletions
@@ -0,0 +1,140 @@
// Module included in the following assemblies:
//
// * virt/virtual_machines/advanced_vm_management/virt-configuring-cluster-realtime-workloads.adoc

:_mod-docs-content-type: PROCEDURE
[id="virt-configuring-cluster-real-time_{context}"]
= Configuring a cluster for real-time workloads

You can configure an {product-title} cluster to run real-time workloads.

.Prerequisites
* You have access to the cluster as a user with `cluster-admin` permissions.
* You have installed the OpenShift CLI (`oc`).
* You have installed the Node Tuning Operator.

.Procedure

. Label a subset of the compute nodes with a custom role, for example, `worker-realtime`:
+
[source,terminal]
----
$ oc label node <node_name> node-role.kubernetes.io/worker-realtime=""
----
+
[NOTE]
====
You must use the default `master` role for {sno} and compact clusters.
====
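+
Optionally, you can confirm that the label is in place before you continue. The following command is one way to list the labeled nodes; it assumes the `worker-realtime` role name used in the previous command:
+
[source,terminal]
----
$ oc get nodes -l node-role.kubernetes.io/worker-realtime
----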

. Create a new `MachineConfigPool` manifest that contains the `worker-realtime` label in the `spec.machineConfigSelector` object:
+
.Example `MachineConfigPool` manifest
[source,yaml]
----
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-realtime
  labels:
    machineconfiguration.openshift.io/role: worker-realtime
spec:
  machineConfigSelector:
    matchExpressions:
      - key: machineconfiguration.openshift.io/role
        operator: In
        values:
          - worker
          - worker-realtime
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-realtime: ""
----
+
[NOTE]
====
You do not need to create a new `MachineConfigPool` manifest for {sno} and compact clusters.
====

. If you created a new `MachineConfigPool` manifest in step 2, apply it to the cluster by using the following command:
+
[source,terminal]
----
$ oc apply -f <real_time_mcp>.yaml
----

. Create a `PerformanceProfile` manifest that applies to the labeled nodes and the machine config pool that you created in the previous steps:
+
.Example `PerformanceProfile` manifest
[source,yaml]
----
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: profile-1
spec:
  cpu:
    isolated: 4-39,44-79
    reserved: 0-3,40-43
  globallyDisableIrqLoadBalancing: true
  hugepages:
    defaultHugepagesSize: 1G
    pages:
      - count: 8
        size: 1G
  realTimeKernel:
    enabled: true
  workloadHints:
    highPowerConsumption: true
    realTime: true
  nodeSelector:
    node-role.kubernetes.io/worker-realtime: ""
  numa:
    topologyPolicy: single-numa-node
----

. Apply the `PerformanceProfile` manifest:
+
[source,terminal]
----
$ oc apply -f <real_time_pp>.yaml
----
+
[NOTE]
====
The compute nodes automatically reboot twice after you apply the `MachineConfigPool` and `PerformanceProfile` manifests. This process might take a long time.
====
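+
Because the nodes reboot, the rollout can take a while to complete. One way to track progress, assuming the `worker-realtime` machine config pool created in the earlier step, is to watch the pool until its `UPDATED` column reports `True`:
+
[source,terminal]
----
$ oc get mcp worker-realtime --watch
----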

. Retrieve the name of the generated `RuntimeClass` resource from the `status.runtimeClass` field of the `PerformanceProfile` object:
+
[source,terminal]
----
$ oc get performanceprofiles.performance.openshift.io profile-1 -o=jsonpath='{.status.runtimeClass}{"\n"}'
----
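+
The generated runtime class name is typically derived from the profile name, so for the `profile-1` example the output might look similar to the following:
+
.Example output
[source,terminal]
----
performance-profile-1
----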

. Set the previously obtained `RuntimeClass` name as the default container runtime class for the `virt-launcher` pods by editing the `HyperConverged` custom resource (CR):
+
[source,terminal,subs="attributes+"]
----
$ oc patch hyperconverged kubevirt-hyperconverged -n {CNVNamespace} \
  --type='json' -p='[{"op": "add", "path": "/spec/defaultRuntimeClass", "value":"<runtimeclass_name>"}]'
----
+
[NOTE]
====
Editing the `HyperConverged` CR changes a global setting that affects all VMs that are created after the change is applied.
====
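+
If you want to confirm the change, one way is to read the setting back from the `HyperConverged` CR:
+
[source,terminal,subs="attributes+"]
----
$ oc get hyperconverged kubevirt-hyperconverged -n {CNVNamespace} -o jsonpath='{.spec.defaultRuntimeClass}{"\n"}'
----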

. If your real-time-enabled compute nodes use simultaneous multithreading (SMT), enable the `alignCPUs` feature gate by editing the `HyperConverged` CR:
+
[source,terminal,subs="attributes+"]
----
$ oc patch hyperconverged kubevirt-hyperconverged -n {CNVNamespace} \
  --type='json' -p='[{"op": "replace", "path": "/spec/featureGates/alignCPUs", "value": true}]'
----
+
[NOTE]
====
Enabling `alignCPUs` allows {VirtProductName} to request up to two additional dedicated CPUs to bring the total CPU count to an even parity when using emulator thread isolation.
====
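
.Verification
* Optionally, confirm that the labeled nodes are now running the real-time kernel. One way, assuming the node name that you labeled earlier, is to check whether the reported kernel version contains an `rt` identifier:
+
[source,terminal]
----
$ oc get node <node_name> -o jsonpath='{.status.nodeInfo.kernelVersion}{"\n"}'
----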

modules/virt-configuring-vm-real-time.adoc

Lines changed: 104 additions & 0 deletions
@@ -0,0 +1,104 @@
// Module included in the following assemblies:
//
// * virt/virtual_machines/advanced_vm_management/virt-configuring-cluster-realtime-workloads.adoc

:_mod-docs-content-type: PROCEDURE
[id="virt-configuring-vm-real-time_{context}"]
= Configuring a virtual machine for real-time workloads

You can configure a virtual machine (VM) to run real-time workloads.

.Prerequisites
* Your cluster is configured to run real-time workloads.
* You have installed the `virtctl` tool.

.Procedure
. Create a `VirtualMachine` manifest to include information about CPU topology, CRI-O annotations, and huge pages:
+
.Example `VirtualMachine` manifest
[source,yaml]
----
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: realtime-vm
spec:
  template:
    metadata:
      annotations:
        cpu-load-balancing.crio.io: disable <1>
        cpu-quota.crio.io: disable <2>
        irq-load-balancing.crio.io: disable <3>
    spec:
      domain:
        cpu:
          dedicatedCpuPlacement: true
          isolateEmulatorThread: true
          model: host-passthrough
          numa:
            guestMappingPassthrough: {}
          realtime: {}
          sockets: 1 <4>
          cores: 4 <5>
          threads: 1
        devices:
          autoattachGraphicsDevice: false
          autoattachMemBalloon: false
          autoattachSerialConsole: true
        ioThreadsPolicy: auto
        memory:
          guest: 4Gi
          hugepages:
            pageSize: 1Gi <6>
      terminationGracePeriodSeconds: 0
# ...
----
<1> This annotation specifies that load balancing is disabled for CPUs that are used by the container.
<2> This annotation specifies that the CPU quota is disabled for CPUs that are used by the container.
<3> This annotation specifies that interrupt request (IRQ) load balancing is disabled for CPUs that are used by the container.
<4> The number of sockets inside the VM.
<5> The number of cores inside the VM. This must be a value greater than or equal to `1`.
<6> The size of the huge pages. The possible values for x86-64 architectures are `1Gi` and `2Mi`. In this example, the request is for 4 huge pages of size 1 Gi.

. Apply the `VirtualMachine` manifest:
+
[source,terminal]
----
$ oc apply -f <file_name>.yaml
----
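+
Optionally, you can start the VM and check that it reaches the `Running` phase. A minimal check, using the `virtctl` tool listed in the prerequisites and the `realtime-vm` name from the example manifest:
+
[source,terminal]
----
$ virtctl start realtime-vm
----
+
[source,terminal]
----
$ oc get vmi realtime-vm
----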

. Configure the guest operating system. The following example shows the configuration steps for a {op-system-base} 8 operating system:
.. Run the following command to connect to the VM console:
+
[source,terminal]
----
$ virtctl console <vm_name>
----

.. Configure huge pages by using the GRUB boot loader command-line interface. In the following example, eight 1 GB huge pages are specified.
+
[source,terminal]
----
$ grubby --update-kernel=ALL --args="default_hugepagesz=1GB hugepagesz=1G hugepages=8"
----

.. To achieve low-latency tuning by using the `cpu-partitioning` profile in the TuneD application, run the following commands:
+
[source,terminal]
----
$ dnf install -y tuned-profiles-cpu-partitioning
----
+
[source,terminal]
----
$ echo isolated_cores=2-9 > /etc/tuned/cpu-partitioning-variables.conf
----
+
The first two CPUs (0 and 1) are set aside for housekeeping tasks and the rest are isolated for the real-time application.
+
[source,terminal]
----
$ tuned-adm profile cpu-partitioning
----

. Restart the VM to apply the changes.
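+
After the VM restarts, you can optionally confirm the guest tuning from the VM console. For example, the following commands, run inside the guest, show the active TuneD profile and the huge page allocation:
+
[source,terminal]
----
$ tuned-adm active
----
+
[source,terminal]
----
$ grep -i hugepages /proc/meminfo
----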

virt/virtual_machines/advanced_vm_management/virt-configuring-cluster-realtime-workloads.adoc

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
:_mod-docs-content-type: ASSEMBLY
[id="virt-configuring-cluster-realtime-workloads"]
= Running real-time workloads
include::_attributes/common-attributes.adoc[]
:context: virt-configuring-cluster-realtime-workloads

toc::[]

You can configure an {product-title} cluster to run real-time virtual machine (VM) workloads that require low and predictable latency. {product-title} provides the Node Tuning Operator to implement automatic tuning for real-time and low-latency workloads.

include::modules/virt-configuring-cluster-real-time.adoc[leveloffset=+1]

include::modules/virt-configuring-vm-real-time.adoc[leveloffset=+1]

[role="_additional-resources"]
[id="additional-resources_configuring-cluster-real-time"]
== Additional resources

* xref:../../../scalability_and_performance/using-node-tuning-operator.adoc#using-node-tuning-operator[Using the Node Tuning Operator]
* xref:../../../scalability_and_performance/cnf-low-latency-tuning.adoc#node-tuning-operator-known-limitations-for-real-time_cnf-master[Known limitations for real-time]
* xref:../../../scalability_and_performance/cnf-low-latency-tuning.adoc#reducing-nic-queues-using-the-node-tuning-operator_cnf-master[Reducing NIC queues using the Node Tuning Operator]
