
Commit 3cef33c

Merge pull request #67322 from StephenJamesSmith/TELCODOCS-1092
TELCODOCS-1092: GPU sharing methods
2 parents 3cf9bce + 93be3bb commit 3cef33c

8 files changed, +112 -17 lines changed

architecture/nvidia-gpu-architecture-overview.adoc

Lines changed: 23 additions & 0 deletions
@@ -51,10 +51,33 @@ include::modules/nvidia-gpu-red-hat-device-edge.adoc[leveloffset=+2]
 .Additional resources
 * link:https://cloud.redhat.com/blog/how-to-accelerate-workloads-with-nvidia-gpus-on-red-hat-device-edge[How to accelerate workloads with NVIDIA GPUs on Red Hat Device Edge]
 
+// TELCODOCS-1092 GPU sharing methods
+include::modules/nvidia-gpu-sharing-methods.adoc[leveloffset=+1]
+.Additional resources
+* link:https://developer.nvidia.com/blog/improving-gpu-utilization-in-kubernetes/[Improving GPU Utilization]
+
+include::modules/nvidia-gpu-cuda-streams.adoc[leveloffset=+2]
+.Additional resources
+* link:https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#asynchronous-concurrent-execution[Asynchronous Concurrent Execution]
+
+include::modules/nvidia-gpu-time-slicing.adoc[leveloffset=+2]
+
+include::modules/nvidia-gpu-cuda-mps.adoc[leveloffset=+2]
+.Additional resources
+* link:https://docs.nvidia.com/deploy/mps/index.html[CUDA MPS]
+
+include::modules/nvidia-gpu-mig-gpu.adoc[leveloffset=+2]
+.Additional resources
+* link:https://docs.nvidia.com/datacenter/tesla/mig-user-guide/[NVIDIA Multi-Instance GPU User Guide]
+
+include::modules/nvidia-gpu-virtualization-with-gpu.adoc[leveloffset=+2]
+.Additional resources
+* link:https://www.nvidia.com/en-us/data-center/virtual-solutions/[Virtual GPUs]
 
 include::modules/nvidia-gpu-features.adoc[leveloffset=+1]
 [role="_additional-resources"]
 .Additional resources
+
 * link:https://docs.nvidia.com/ngc/ngc-deploy-on-premises/nvidia-certified-systems/index.html[NVIDIA-Certified Systems]
 * link:https://docs.nvidia.com/ai-enterprise/index.html#deployment-guides[NVIDIA AI Enterprise]
 * link:https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/overview.html#[NVIDIA Container Toolkit]

modules/nvidia-gpu-cuda-mps.adoc

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+// Module included in the following assemblies:
+//
+// * architecture/nvidia-gpu-architecture-overview.adoc
+
+:_content-type: CONCEPT
+[id="nvidia-gpu-cuda-mps_{context}"]
+= CUDA Multi-Process Service
+
+CUDA Multi-Process Service (MPS) allows a single GPU to be shared by multiple CUDA processes. The processes run in parallel on the GPU, which eliminates idle GPU compute capacity when no single process can saturate the GPU. MPS also enables concurrent execution, or overlapping, of kernel operations and memory copying from different processes to
+enhance utilization.

modules/nvidia-gpu-cuda-streams.adoc

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+// Module included in the following assemblies:
+//
+// * architecture/nvidia-gpu-architecture-overview.adoc
+
+:_content-type: CONCEPT
+[id="nvidia-gpu-cuda-streams_{context}"]
+= CUDA streams
+
+Compute Unified Device Architecture (CUDA) is a parallel computing platform and programming model developed by NVIDIA for general computing on GPUs.
+
+A stream is a sequence of operations that executes in issue-order on the GPU. CUDA commands are typically executed sequentially in a default stream, and a task does not start until a preceding task has completed.
+
+Asynchronous processing of operations across different streams allows for parallel execution of tasks. A task issued in one stream might run before, during, or after a task issued in another stream. This allows the GPU to run multiple tasks simultaneously in no prescribed order, leading to improved performance.
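For context, a minimal CUDA sketch (not part of this commit) of the behavior the module describes: copies and kernels issued into two separate streams can overlap on the device, while work within each stream still runs in issue order. The `scale` kernel, buffer sizes, and launch configuration are illustrative assumptions, and error handling is omitted.

[source,cuda]
----
#include <cuda_runtime.h>
#include <cstdio>

// Trivial element-wise kernel used only to give each stream some work.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int N = 1 << 20;
    const size_t bytes = N * sizeof(float);

    // Pinned host memory is required for copies to be truly asynchronous.
    float *h_a, *h_b, *d_a, *d_b;
    cudaMallocHost(&h_a, bytes);
    cudaMallocHost(&h_b, bytes);
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    for (int i = 0; i < N; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // Operations issued into different streams may execute concurrently on the GPU.
    cudaMemcpyAsync(d_a, h_a, bytes, cudaMemcpyHostToDevice, s1);
    scale<<<(N + 255) / 256, 256, 0, s1>>>(d_a, N, 3.0f);
    cudaMemcpyAsync(h_a, d_a, bytes, cudaMemcpyDeviceToHost, s1);

    cudaMemcpyAsync(d_b, h_b, bytes, cudaMemcpyHostToDevice, s2);
    scale<<<(N + 255) / 256, 256, 0, s2>>>(d_b, N, 0.5f);
    cudaMemcpyAsync(h_b, d_b, bytes, cudaMemcpyDeviceToHost, s2);

    // Within each stream, operations still complete in issue order.
    cudaStreamSynchronize(s1);
    cudaStreamSynchronize(s2);
    printf("h_a[0]=%.1f h_b[0]=%.1f\n", h_a[0], h_b[0]);  // expected: 3.0 and 1.0

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFreeHost(h_a);
    cudaFreeHost(h_b);
    return 0;
}
----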

modules/nvidia-gpu-features.adoc

Lines changed: 0 additions & 17 deletions
@@ -24,23 +24,6 @@ NVIDIA AI Enterprise includes support for Red Hat {product-title}. The following
 
 * {product-title} on VMware vSphere with NVIDIA vGPU.
 
-
-Multi-Instance GPU (MIG) Support in {product-title}::
-MIG is useful whenever you have an application that does not require the full power of an entire GPU. The MIG feature of the new NVIDIA Ampere architecture enables you to split your hardware resources into multiple GPU instances, each of which is available to the operating system as an independent CUDA-enabled GPU. The NVIDIA GPU Operator version 1.7.0 and higher provides MIG support for the A100 and A30 Ampere cards. These GPU instances are designed to support multiple independent CUDA applications (up to 7) so that they operate completely isolated from each other with dedicated hardware resources.
-+
-The GPU's compute units, in addition to their memory, can be split into multiple MIG instances. Each of these instances represents a standalone GPU device from a system perspective and can be connected to any application, container or virtual machine running on the node.
-+
-From the perspective of the software that uses the GPU, each of these MIG instances looks like its own individual GPU.
-
-Time-slicing NVIDIA GPUs in OpenShift::
-GPU time-slicing enables workloads scheduled on overloaded GPUs to be interleaved.
-+
-This mechanism for enabling time-slicing of GPUs in Kubernetes enables a system administrator to define a set of replicas for a GPU, each of which can be independently distributed to a pod to run workloads on. Unlike multi-instance GPU (MIG), there is no memory or fault isolation between replicas, but for some workloads this is better than not sharing at all. Internally, GPU time-slicing is used to multiplex workloads from replicas of the same underlying GPU.
-+
-You can apply a cluster-wide default configuration for time slicing. You can also apply node-specific configurations. For example, you can apply a time-slicing configuration only to nodes with Tesla T4 GPUs and not modify nodes with other GPU models.
-+
-You can combine these two approaches by applying a cluster-wide default configuration and then label nodes to give those nodes receive a node-specific configuration.
-
 GPU Feature Discovery::
 NVIDIA GPU Feature Discovery for Kubernetes is a software component that enables you to automatically generate labels for the GPUs available on a node. GPU Feature Discovery uses node feature discovery (NFD) to perform this labeling.
 +

modules/nvidia-gpu-mig-gpu.adoc

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+// Module included in the following assemblies:
+//
+// * architecture/nvidia-gpu-architecture-overview.adoc
+
+:_content-type: CONCEPT
+[id="nvidia-gpu-mig-gpu_{context}"]
+= Multi-instance GPU
+
+Using Multi-instance GPU (MIG), you can split GPU compute units and memory into multiple MIG instances. Each of these instances represents a standalone GPU device from a system perspective and can be connected to any application, container, or virtual machine running on the node. The software that uses the GPU treats each of these MIG instances as an individual GPU.
+
+MIG is useful when you have an application that does not require the full power of an entire GPU. The MIG feature of the NVIDIA Ampere architecture enables you to split your hardware resources into multiple GPU instances, each of which is available to the operating system as an independent CUDA-enabled GPU.
+
+NVIDIA GPU Operator version 1.7.0 and higher provides MIG support for the A100 and A30 Ampere cards. These GPU instances are designed to support up to seven independent CUDA applications so that they operate in complete isolation with dedicated hardware resources.
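As a hedged illustration (not part of this commit) of the point that each MIG instance appears to software as its own GPU: a container that is granted a MIG instance sees it enumerated as an ordinary CUDA device. The output format below is an assumption.

[source,cuda]
----
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    // Enumerate whatever devices are visible to this container. When a pod is
    // granted a MIG instance, that instance shows up here as a normal CUDA
    // device with its own dedicated memory and compute slices.
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        fprintf(stderr, "no CUDA device visible\n");
        return 1;
    }
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        printf("device %d: %s, %.1f GiB global memory\n",
               d, prop.name, prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
    }
    return 0;
}
----

In practice, CUDA enumerates only one MIG instance per process, so workloads typically request a single MIG device each.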
modules/nvidia-gpu-sharing-methods.adoc

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+// Module included in the following assemblies:
+//
+// * architecture/nvidia-gpu-architecture-overview.adoc
+
+:_content-type: CONCEPT
+[id="nvidia-gpu-sharing-methods_{context}"]
+= GPU sharing methods
+
+Red{nbsp}Hat and NVIDIA have developed GPU concurrency and sharing mechanisms to simplify GPU-accelerated computing on an enterprise-level {product-title} cluster.
+
+Applications typically have different compute requirements that can leave GPUs underutilized. Providing the right amount of compute resources for each workload is critical to reduce deployment cost and maximize GPU utilization.
+
+Concurrency mechanisms for improving GPU utilization range from programming model APIs to system software and hardware partitioning, including virtualization. The following list shows the GPU concurrency mechanisms:
+
+* Compute Unified Device Architecture (CUDA) streams
+* Time-slicing
+* CUDA Multi-Process Service (MPS)
+* Multi-instance GPU (MIG)
+* Virtualization with vGPU
+
+Consider the following GPU sharing suggestions when using the GPU concurrency mechanisms for different {product-title} scenarios:
+
+Bare metal:: vGPU is not available. Consider using MIG-enabled cards.
+VMs:: vGPU is the best choice.
+Older NVIDIA cards with no MIG on bare metal:: Consider using time-slicing.
+VMs with multiple GPUs where you want passthrough and vGPU:: Consider using separate VMs.
+Bare metal with {VirtProductName} and multiple GPUs:: Consider using pass-through for hosted VMs and time-slicing for containers.

modules/nvidia-gpu-time-slicing.adoc

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+// Module included in the following assemblies:
+//
+// * architecture/nvidia-gpu-architecture-overview.adoc
+
+:_content-type: CONCEPT
+[id="nvidia-gpu-time-slicing_{context}"]
+= Time-slicing
+
+GPU time-slicing interleaves workloads scheduled on overloaded GPUs when you are running multiple CUDA applications.
+
+You can enable time-slicing of GPUs on Kubernetes by defining a set of replicas for a GPU, each of which can be independently distributed to a pod to run workloads on. Unlike multi-instance GPU (MIG), there is no memory or fault isolation between replicas, but for some workloads this is better than not sharing at all. Internally, GPU time-slicing is used to multiplex workloads from replicas of the same underlying GPU.
+
+You can apply a cluster-wide default configuration for time-slicing. You can also apply node-specific configurations. For example, you can apply a time-slicing configuration only to nodes with Tesla T4 GPUs and not modify nodes with other GPU models.
+
+You can combine these two approaches by applying a cluster-wide default configuration and then labeling nodes to give those nodes a node-specific configuration.
modules/nvidia-gpu-virtualization-with-gpu.adoc

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+// Module included in the following assemblies:
+//
+// * architecture/nvidia-gpu-architecture-overview.adoc
+
+:_content-type: CONCEPT
+[id="nvidia-gpu-virtualization-with-gpu_{context}"]
+= Virtualization with vGPU
+
+Virtual machines (VMs) can directly access a single physical GPU using NVIDIA vGPU. You can create virtual GPUs that can be shared by VMs across the enterprise and accessed by other devices.
+
+This capability combines the power of GPU performance with the management and security benefits provided by vGPU. Additional benefits provided by vGPU include proactive management and monitoring for your VM environment, workload balancing for mixed VDI and compute workloads, and resource sharing across multiple VMs.
