| 1 | +// Module included in the following assemblies: |
| 2 | +// |
| 3 | +// * virt/virtual_machines/advanced_vm_management/virt-configuring-mediated-devices.adoc |
| 4 | + |
| 5 | +:_content-type: CONCEPT |
| 6 | +[id="virt-about-using-virtual-gpus_{context}"] |
| 7 | += About using virtual GPUs with {VirtProductName} |
| 8 | + |
| 9 | +Some graphics processing unit (GPU) cards support the creation of virtual GPUs (vGPUs). {VirtProductName} can automatically create vGPUs and other mediated devices if an administrator provides configuration details in the `HyperConverged` custom resource (CR). This automation is especially useful for large clusters. |
| 10 | + |
| 11 | +[NOTE] |
| 12 | +==== |
| 13 | +Refer to your hardware vendor's documentation for functionality and support details. |
| 14 | +==== |
| 15 | + |
| 16 | +Mediated device:: A physical device that is divided into one or more virtual devices. A vGPU is a type of mediated device (mdev); the performance of the physical GPU is divided among the virtual devices. You can assign mediated devices to one or more virtual machines (VMs), but the number of guests that can share a device depends on your GPU. Some GPUs do not support multiple guests. |
| 17 | + |
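| | +After a vGPU type is exposed to the cluster, a VM can request it by resource name. The following is a minimal sketch that assumes the KubeVirt `VirtualMachine` API and a vGPU type exposed under the resource name `nvidia.com/GRID_T4-2Q`. The VM name `example-vm` and the device name `gpu1` are hypothetical. |
| | + |
| | +[source,yaml] |
| | +---- |
| | +apiVersion: kubevirt.io/v1 |
| | +kind: VirtualMachine |
| | +metadata: |
| | +  name: example-vm # hypothetical VM name |
| | +spec: |
| | +  template: |
| | +    spec: |
| | +      domain: |
| | +        devices: |
| | +          gpus: |
| | +          # deviceName refers to a resourceName exposed through the |
| | +          # permittedHostDevices stanza (assumed value shown here) |
| | +          - deviceName: nvidia.com/GRID_T4-2Q |
| | +            name: gpu1 # hypothetical name for this device in the VM spec |
| | +... |
| | +---- |
| | + |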
| 18 | +[id="configuration-overview_{context}"] |
| 19 | +== Configuration overview |
| 20 | + |
| 21 | +When configuring mediated devices, an administrator must: |
| 22 | + |
| 23 | +* Create the mediated devices. |
| 24 | +* Expose the mediated devices to the cluster. |
| 25 | + |
| 26 | +The `HyperConverged` CR includes APIs that accomplish both tasks: |
| 27 | + |
| 28 | +.Creating mediated devices |
| 29 | + |
| 30 | +[source,yaml] |
| 31 | +---- |
| 32 | +... |
| 33 | +spec: |
| 34 | +  mediatedDevicesConfiguration: |
| 35 | +    mediatedDevicesTypes: <.> |
| 36 | +    - <device_type> |
| 37 | +    nodeMediatedDeviceTypes: <.> |
| 38 | +    - mediatedDevicesTypes: <.> |
| 39 | +      - <device_type> |
| 40 | +      nodeSelector: <.> |
| 41 | +        <node_selector_key>: <node_selector_value> |
| 42 | +... |
| 43 | +---- |
| 44 | +<.> Required: Configures global settings for the cluster. |
| 45 | +<.> Optional: Overrides the global configuration for a specific node or group of nodes. Must be used with the global `mediatedDevicesTypes` configuration. |
| 46 | +<.> Required if you use `nodeMediatedDeviceTypes`. Overrides the global `mediatedDevicesTypes` configuration for select nodes. |
| 47 | +<.> Required if you use `nodeMediatedDeviceTypes`. Must include a `key:value` pair. |
| 48 | + |
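| | +The following sketch shows how the global and per-node settings combine. The mdev type names are placeholders, and the node name `worker-1` used with the standard `kubernetes.io/hostname` node label is hypothetical; check which types your cards support before you use them. |
| | + |
| | +[source,yaml] |
| | +---- |
| | +... |
| | +spec: |
| | +  mediatedDevicesConfiguration: |
| | +    # Global setting: applies to every node that is not matched below |
| | +    mediatedDevicesTypes: |
| | +    - nvidia-231 |
| | +    # Per-node override: matching nodes use this list instead |
| | +    nodeMediatedDeviceTypes: |
| | +    - mediatedDevicesTypes: |
| | +      - nvidia-223 |
| | +      nodeSelector: |
| | +        kubernetes.io/hostname: worker-1 # hypothetical node name |
| | +... |
| | +---- |
| | + |
| | +With this configuration, the node whose `kubernetes.io/hostname` label is `worker-1` uses the `nvidia-223` list, and all other nodes use the global `nvidia-231` list. |
| | + |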
| 49 | +.Exposing mediated devices to the cluster |
| 50 | + |
| 51 | +[source,yaml] |
| 52 | +---- |
| 53 | +... |
| 54 | +  permittedHostDevices: |
| 55 | +    mediatedDevices: |
| 56 | +    - mdevNameSelector: GRID T4-2Q <.> |
| 57 | +      resourceName: nvidia.com/GRID_T4-2Q |
| 58 | +... |
| 59 | +---- |
| 60 | +<.> Exposes the mediated devices on the host whose type name matches this value. |
| 61 | ++ |
| 62 | +[NOTE] |
| 63 | +==== |
| 64 | +You can see the mediated device types that your device supports by viewing the contents of `/sys/bus/pci/devices/<domain>:<bus>:<slot>.<function>/mdev_supported_types/<type>/name`, substituting the correct values for your system. |
| 65 | + |
| 66 | +For example, the name file for the `nvidia-231` type contains the selector string `GRID T4-2Q`. Using `GRID T4-2Q` as the `mdevNameSelector` value allows nodes to use the `nvidia-231` type. |
| 67 | +==== |
| 68 | + |
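| | +The two stanzas work together. Using the `nvidia-231` example from the note above, the following sketch creates `nvidia-231` devices and exposes them to the cluster under the resource name `nvidia.com/GRID_T4-2Q`. It assumes nodes with cards that support this type; substitute the values for your hardware. |
| | + |
| | +[source,yaml] |
| | +---- |
| | +... |
| | +spec: |
| | +  mediatedDevicesConfiguration: |
| | +    mediatedDevicesTypes: |
| | +    - nvidia-231 # mdev type to create on the host |
| | +  permittedHostDevices: |
| | +    mediatedDevices: |
| | +    - mdevNameSelector: GRID T4-2Q # contents of the nvidia-231 name file |
| | +      resourceName: nvidia.com/GRID_T4-2Q |
| | +... |
| | +---- |
| | + |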
| 69 | +[id="how-vgpus-are-assigned-to-nodes_{context}"] |
| 70 | +== How vGPUs are assigned to nodes |
| 71 | + |
| 72 | +For each physical device, {VirtProductName} configures: |
| 73 | + |
| 74 | +* A single mdev type. |
| 75 | +* The maximum number of instances of the selected mdev type. |
| 76 | + |
| 77 | +The cluster architecture affects how devices are created and assigned to nodes. |
| 78 | + |
| 79 | +Large cluster with multiple cards per node:: On nodes with multiple cards that can support similar vGPU types, the relevant device types are created in a round-robin manner. |
| 80 | +For example: |
| 81 | ++ |
| 82 | +[source,yaml] |
| 83 | +---- |
| 84 | +... |
| 85 | +mediatedDevicesConfiguration: |
| 86 | +  mediatedDevicesTypes: |
| 87 | +  - nvidia-222 |
| 88 | +  - nvidia-228 |
| 89 | +  - nvidia-105 |
| 90 | +  - nvidia-108 |
| 91 | +... |
| 92 | +---- |
| 93 | ++ |
| 94 | +In this scenario, each node has two cards, both of which support the following vGPU types: |
| 95 | ++ |
| 96 | +[source,text] |
| 97 | +---- |
| 98 | +nvidia-105 |
| 99 | +... |
| 100 | +nvidia-108 |
| 101 | +nvidia-217 |
| 102 | +nvidia-299 |
| 103 | +... |
| 104 | +---- |
| 105 | ++ |
| 106 | +On each node, {VirtProductName} creates: |
| 107 | ++ |
| 108 | +* 16 vGPUs of type `nvidia-105` on the first card. |
| 109 | +* 2 vGPUs of type `nvidia-108` on the second card. |
| 110 | + |
| 111 | +Node with a single card that supports more than one requested vGPU type:: {VirtProductName} uses the supported type that comes first on the `mediatedDevicesTypes` list. |
| 112 | ++ |
| 113 | +For example, a node's card supports `nvidia-223` and `nvidia-224`. The following `mediatedDevicesTypes` list is configured: |
| 114 | ++ |
| 115 | +[source,yaml] |
| 116 | +---- |
| 117 | +... |
| 118 | +mediatedDevicesConfiguration: |
| 119 | +  mediatedDevicesTypes: |
| 120 | +  - nvidia-22 |
| 121 | +  - nvidia-223 |
| 122 | +  - nvidia-224 |
| 123 | +... |
| 124 | +---- |
| 125 | ++ |
| 126 | +In this example, the card does not support `nvidia-22`, so {VirtProductName} uses `nvidia-223`, the first supported type on the list. |
| 127 | + |
| 128 | +[id="about-changing-removing-mediated-devices_{context}"] |
| 129 | +== About changing and removing mediated devices |
| 130 | + |
| 131 | +{VirtProductName} updates the cluster's mediated device configuration if: |
| 132 | + |
| 133 | +* You edit the `HyperConverged` CR and change the contents of the `mediatedDevicesTypes` stanza. |
| 134 | + |
| 135 | +* You change the node labels that match the `nodeMediatedDeviceTypes` node selector. |
| 136 | + |
| 137 | +* You remove the device information from the `spec.mediatedDevicesConfiguration` and `spec.permittedHostDevices` stanzas of the `HyperConverged` CR. |
| 138 | ++ |
| 139 | +[NOTE] |
| 140 | +==== |
| 141 | +If you remove the device information from the `spec.permittedHostDevices` stanza without also removing it from the `spec.mediatedDevicesConfiguration` stanza, you cannot create a new mediated device type on the same node. To properly remove mediated devices, remove the device information from both stanzas. |
| 142 | +==== |
| 143 | + |
| 144 | +Depending on the specific changes, these actions cause {VirtProductName} to reconfigure mediated devices or remove them from the cluster nodes. |
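| | + |
| | +For example, assume that the `HyperConverged` CR contains the following entries for the `nvidia-231` type, whose name file contains `GRID T4-2Q`. To remove this mediated device type properly, delete the marked entries from both stanzas, not only from `spec.permittedHostDevices`. |
| | + |
| | +[source,yaml] |
| | +---- |
| | +... |
| | +spec: |
| | +  mediatedDevicesConfiguration: |
| | +    mediatedDevicesTypes: |
| | +    - nvidia-231 # delete this entry |
| | +  permittedHostDevices: |
| | +    mediatedDevices: |
| | +    - mdevNameSelector: GRID T4-2Q # and delete this entry, including resourceName |
| | +      resourceName: nvidia.com/GRID_T4-2Q |
| | +... |
| | +---- |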