Commit 4dcfe81

Merge pull request #42126 from ousleyp/cnv-16295
CNV-16295: Prevent NVIDIA GPU operands from deploying on nodes
2 parents fb0fcf8 + a19273f commit 4dcfe81

2 files changed, 96 additions and 2 deletions
Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
+// Module included in the following assembly:
+//
+// * virt/virtual_machines/advanced_vm_management/virt-configuring-pci-passthrough.adoc
+//
+
+:_content-type: PROCEDURE
+[id="virt-preventing-nvidia-operands-from-deploying-on-nodes_{context}"]
+= Preventing NVIDIA GPU operands from deploying on nodes
+
+If you use the link:https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/openshift/contents.html[NVIDIA GPU Operator] in your cluster, you can apply the `nvidia.com/gpu.deploy.operands=false` label to nodes that you do not want to configure for GPU or vGPU operands. This label prevents the creation of the pods that configure GPU or vGPU operands and terminates the pods if they already exist.
+
+ifdef::openshift-enterprise[]
+:FeatureName: Using the NVIDIA GPU Operator with {VirtProductName}
+include::snippets/technology-preview.adoc[]
+endif::[]
+
+.Prerequisites
+
+* The OpenShift CLI (`oc`) is installed.
+
+.Procedure
+
+* Label the node by running the following command:
++
+[source,terminal]
+----
+$ oc label node <node_name> nvidia.com/gpu.deploy.operands=false <1>
+----
+<1> Replace `<node_name>` with the name of a node where you do not want to install the NVIDIA GPU operands.
+
+.Verification
+
+. Verify that the label was added to the node by running the following command:
++
+[source,terminal]
+----
+$ oc describe node <node_name>
+----
+
+. Optional: If GPU operands were previously deployed on the node, verify their removal.
+
+.. Check the status of the pods in the `nvidia-gpu-operator` namespace by running the following command:
++
+[source,terminal]
+----
+$ oc get pods -n nvidia-gpu-operator
+----
++
+.Example output
+
+[source,terminal]
+----
+NAME                             READY   STATUS        RESTARTS   AGE
+gpu-operator-59469b8c5c-hw9wj    1/1     Running       0          8d
+nvidia-sandbox-validator-7hx98   1/1     Running       0          8d
+nvidia-sandbox-validator-hdb7p   1/1     Running       0          8d
+nvidia-sandbox-validator-kxwj7   1/1     Terminating   0          9d
+nvidia-vfio-manager-7w9fs        1/1     Running       0          8d
+nvidia-vfio-manager-866pz        1/1     Running       0          8d
+nvidia-vfio-manager-zqtck        1/1     Terminating   0          9d
+----
+
+.. Monitor the pod status until the pods with `Terminating` status are removed:
++
+[source,terminal]
+----
+$ oc get pods -n nvidia-gpu-operator
+----
++
+.Example output
+
+[source,terminal]
+----
+NAME                             READY   STATUS    RESTARTS   AGE
+gpu-operator-59469b8c5c-hw9wj    1/1     Running   0          8d
+nvidia-sandbox-validator-7hx98   1/1     Running   0          8d
+nvidia-sandbox-validator-hdb7p   1/1     Running   0          8d
+nvidia-vfio-manager-7w9fs        1/1     Running   0          8d
+nvidia-vfio-manager-866pz        1/1     Running   0          8d
+----
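
The procedure in the new module labels one node at a time and checks pod status by eye. For a cluster with several passthrough nodes, the same steps could be scripted roughly as follows. This is a minimal sketch and not part of the commit: the function names `label_nodes`, `list_excluded_nodes`, and `count_terminating` are hypothetical, the `OC` variable is an assumption that lets you substitute another client binary, and the use of `--overwrite` is an assumption that replacing an existing value of the label is acceptable.

```shell
#!/bin/sh
# Label taken from the module above; everything else here is illustrative.
LABEL="nvidia.com/gpu.deploy.operands=false"
OC="${OC:-oc}"   # assumption: client binary, defaults to oc

# Apply the label to every node name passed as an argument.
# --overwrite replaces an existing value of the same label key.
label_nodes() {
  for node in "$@"; do
    "$OC" label node "$node" "$LABEL" --overwrite || return 1
  done
}

# List the nodes that currently carry the label (label selector query).
list_excluded_nodes() {
  "$OC" get nodes -l "$LABEL"
}

# Read `oc get pods` output on stdin and print how many rows
# show Terminating in the STATUS (third) column.
count_terminating() {
  awk '$3 == "Terminating" { n++ } END { print n + 0 }'
}

# Possible usage (node names are placeholders):
#   label_nodes worker-1 worker-2
#   list_excluded_nodes
#   while [ "$(oc get pods -n nvidia-gpu-operator | count_terminating)" -gt 0 ]; do
#     sleep 5
#   done
```

Keeping the pod check as a filter over standard input means that it can be tried against saved output, such as the example output in the module, without access to a cluster.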

virt/virtual_machines/advanced_vm_management/virt-configuring-pci-passthrough.adoc

Lines changed: 16 additions & 2 deletions
@@ -13,14 +13,28 @@ toc::[]
 //When this feature is available in the web console, please
 //add the new content to this assembly.
 
-The Peripheral Component Interconnect (PCI) passthrough feature enables you to access and manage hardware devices from a virtual machine. When PCI passthrough is configured, the PCI devices function as if they were physically attached to the guest operating system.
+The Peripheral Component Interconnect (PCI) passthrough feature enables you to access and manage hardware devices from a virtual machine (VM). When PCI passthrough is configured, the PCI devices function as if they were physically attached to the guest operating system.
 
 Cluster administrators can expose and manage host devices that are permitted to be used in the cluster by using the `oc` command-line interface (CLI).
 
-include::modules/virt-about-pci-passthrough.adoc[leveloffset=+1]
+[id="virt-preparing-nodes-for-gpu-passthrough"]
+== Preparing nodes for GPU passthrough
+
+You can prevent GPU operands from deploying on worker nodes that you designated for GPU passthrough.
+
+include::modules/virt-preventing-nvidia-gpu-operands-from-deploying-on-nodes.adoc[leveloffset=+2]
+
+[id="virt-preparing-host-devices-for-pci-passthrough"]
+== Preparing host devices for PCI passthrough
+
+include::modules/virt-about-pci-passthrough.adoc[leveloffset=+2]
+
 include::modules/virt-adding-kernel-arguments-enable-iommu.adoc[leveloffset=+2]
+
 include::modules/virt-binding-devices-vfio-driver.adoc[leveloffset=+2]
+
 include::modules/virt-exposing-pci-device-in-cluster-cli.adoc[leveloffset=+2]
+
 include::modules/virt-removing-pci-device-from-cluster-cli.adoc[leveloffset=+2]
 
 [id="virt-configuring-vms-for-pci-passthrough"]
[id="virt-configuring-vms-for-pci-passthrough"]
