|
| 1 | +--- |
| 2 | +title: GPU acceleration for AKS Edge Essentials |
| 3 | +description: Learn how to expose a GPU to the virtual machine's Linux module. |
| 4 | +author: sethmanheim |
| 5 | +ms.topic: how-to |
| 6 | +ms.date: 07/09/2024 |
| 7 | +ms.author: sethm |
| 8 | +ms.lastreviewed: 06/05/2024 |
| 9 | +ms.reviewer: guanghu |
| 10 | + |
| 11 | +# Intent: As an IT Pro, I want to learn how to expose a GPU to Linux |
| 12 | +# Keyword: GPU AKS Edge Essentials |
| 13 | +--- |
| 14 | + |
| 15 | +# Use GPU acceleration for AKS Edge Essentials |
| 16 | + |
| 17 | +GPUs are a popular choice for artificial intelligence computations, because they offer parallel processing capabilities and can often execute vision-based inferencing faster than CPUs. To better support artificial intelligence and machine learning applications, AKS Edge Essentials can expose a GPU to the virtual machine's Linux module. |
| 18 | + |
| 19 | +AKS Edge Essentials supports *GPU-Paravirtualization* (GPU-PV) as the GPU passthrough technology. In other words, the GPU is shared between the Linux virtual machine and the host. |
| 20 | + |
| 21 | +> [!IMPORTANT] |
| 22 | +> These features include components developed and owned by NVIDIA Corporation or its licensors. By using GPU acceleration features, you are accepting and agreeing to the terms of the [NVIDIA End-User License Agreement](https://www.nvidia.com/content/DriverDownload-March2009/licence.php?lang=us). |
| 23 | +
|
| 24 | +## Prerequisites |
| 25 | + |
| 26 | +The GPU acceleration of AKS Edge Essentials currently supports a specific set of GPU hardware. Additionally, use of this feature requires specific versions of Windows. |
| 27 | + |
| 28 | +The supported GPUs and required Windows versions are as follows: |
| 29 | + |
| 30 | +| Supported GPUs | GPU passthrough type | Supported Windows versions | |
| 31 | +|-----------------------------|--------------------------|-------------------------------------------------| |
| 32 | +| NVIDIA GeForce, Quadro, RTX | GPU-PV | Windows 10/11 (Pro, Enterprise, IoT Enterprise) | |
| 33 | + |
| 34 | +> [!IMPORTANT] |
| 35 | +> GPU-PV support might be limited to certain generations of processors or GPU architectures, as determined by the GPU vendor. For more information, see the [NVIDIA CUDA for WSL documentation](https://developer.nvidia.com/cuda/wsl). |
| 36 | +> |
| 37 | +> Windows 10 users must use the November 2021 update build 19044.1620, or later. After installation, you can verify your build version by running `winver` at the command prompt. |
| 38 | +> |
| 39 | +> GPU passthrough is not supported with nested virtualization, such as when you run AKS Edge Essentials in a Windows virtual machine. |
| 40 | +
|
| 41 | +## System setup and installation |
| 42 | + |
| 43 | +The following sections contain setup and installation information. |
| 44 | + |
| 45 | +- For **NVIDIA GeForce/Quadro/RTX GPUs**, download and install the [NVIDIA CUDA-enabled driver for Windows Subsystem for Linux (WSL)](https://developer.nvidia.com/cuda/wsl) to use with your existing CUDA ML workflows. Originally developed for WSL, the CUDA for WSL drivers is also used for AKS Edge Essentials. |
| 46 | +- Windows 10 users must also [install WSL](/windows/wsl/install) because some of the libraries are shared between WSL and AKS Edge Essentials. |
| 47 | +- Install or upgrade AKS Edge Essentials to the May 2024 release, or later. For more information, see [Update your AKS Edge Essentials clusters](aks-edge-howto-update.md). The GPU-PV is supported on both k8s and k3s Kubernetes distributions. |
| 48 | + |
| 49 | +## Enable GPU-PV in your AKS Edge Essentials deployment |
| 50 | + |
| 51 | +### Step 1: single machine configuration parameters |
| 52 | + |
| 53 | +You can generate the parameters you need to create a single machine cluster and add the necessary GPU-PV configuration parameters using the following commands. |
| 54 | + |
| 55 | +This script only focuses on the GPU-PV configuration; you should also make other necessary general updates according to your own AKS Edge Essentials deployment: |
| 56 | + |
| 57 | +```powershell |
| 58 | +$jsonObj = New-AksEdgeConfig -DeploymentType SingleMachineCluster |
| 59 | +$jsonObj.User.AcceptGpuWarning = $true |
| 60 | +$machine = $jsonObj.Machines[0] |
| 61 | +$machine.LinuxNode.GpuPassthrough.Name = "NVIDIA GeForce GTX 1070" |
| 62 | +$machine.LinuxNode.GpuPassthrough.Type = "ParaVirtualization" |
| 63 | +$machine.LinuxNode.GpuPassthrough.Count = 1 |
| 64 | +``` |
| 65 | + |
| 66 | +The key parameters to enable GPU-PV are: |
| 67 | + |
| 68 | +- `User.AcceptGpuWarning`: Set this parameter to `true` to automatically accept the confirmation message when you enable GPU-PV on AKS Edge Essentials. |
| 69 | +- `LinuxNode.GpuPassthrough.Name`: Describes the GPU model that's deployed in this machine, with proper drivers installed. |
| 70 | +- `LinuxNode.GpuPassthrough.Type`: Describes the GPU passthrough type. Only `ParaVirtualization` is currently supported. |
| 71 | +- `LinuxNode.GpuPassthrough.Count`: Describes how many GPUs are installed on this machine. |
| 72 | + |
| 73 | +### Step 2: create a single machine cluster |
| 74 | + |
| 75 | +1. You can now run the `New-AksEdgeDeployment` cmdlet to deploy a single-machine AKS Edge cluster with a single Linux control plane node. You can use the JSON object generated in [step 1](#step-1-single-machine-configuration-parameters) and pass it as a string: |
| 76 | + |
| 77 | + ```powershell |
| 78 | + New-AksEdgeDeployment -JsonConfigString (New-AksEdgeConfig | ConvertTo-Json -Depth 4) |
| 79 | + ``` |
| 80 | + |
| 81 | +1. After successful deployment, verify GPU-PV is enabled by running `nvidia-smi`: |
| 82 | + |
| 83 | + :::image type="content" source="media/aks-edge-gpu/gpu-pv-enabled.png" alt-text="Screenshot showing NVIDIA smi output." lightbox="media/aks-edge-gpu/gpu-pv-enabled.png"::: |
| 84 | + |
| 85 | +### Step 3: Deploy Nvidia runtimeclass |
| 86 | + |
| 87 | +1. Create a YAML file named **nvidia-runtimeclass.yaml** with the following content: |
| 88 | + |
| 89 | + ```yaml |
| 90 | + apiVersion: node.k8s.io/v1 |
| 91 | + kind: RuntimeClass |
| 92 | + metadata: |
| 93 | + name: nvidia |
| 94 | + handler: nvidia |
| 95 | + ``` |
| 96 | +
|
| 97 | +1. Deploy the `runtimeclass`: |
| 98 | + |
| 99 | + ```powershell |
| 100 | + kubectl apply –f nvidia-runtimeclass.yaml |
| 101 | + ``` |
| 102 | + |
| 103 | +### Step 4: Install Nvidia GPU device plugin |
| 104 | + |
| 105 | +1. Download **nvidia-deviceplugin.yaml** from [this GitHub location](https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.3/nvidia-device-plugin.yml). |
| 106 | +1. Update the container images location in the **nvidia-deviceplugin.yaml** file to the new value, as follows: |
| 107 | + |
| 108 | + ```yaml |
| 109 | + containers: |
| 110 | + - image: registry.gitlab.com/nvidia/kubernetes/device-plugin/staging/k8s-device-plugin:6a31a868 |
| 111 | + ``` |
| 112 | + |
| 113 | +1. Install the Nvidia GPU DevicePlugin: |
| 114 | + |
| 115 | + ```powershell |
| 116 | + kubectl apply –f nvidia-deviceplugin.yaml |
| 117 | + ``` |
| 118 | + |
| 119 | +1. Verify that the plugin is running and the NVIDIA GPU is detected by checking the logs of the **deviceplugin** pod using the `kubectl get pods -A` and `kubectl logs $podname -n kube-system` commands: |
| 120 | + |
| 121 | + :::image type="content" source="media/aks-edge-gpu/kubectl-logs.png" alt-text="Screenshot showing kubectl logs command output." lightbox="media/aks-edge-gpu/kubectl-logs.png"::: |
| 122 | + |
| 123 | +## Get started with a sample workload |
| 124 | + |
| 125 | +1. Prepare a workload YAML file named **gpu-workload.yaml**: |
| 126 | + |
| 127 | + ```yaml |
| 128 | + apiVersion: v1 |
| 129 | + kind: Pod |
| 130 | + metadata: |
| 131 | + name: gpu-pod |
| 132 | + spec: |
| 133 | + restartPolicy: Never |
| 134 | + containers: |
| 135 | + - name: cuda-container |
| 136 | + image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda10.2 |
| 137 | + resources: |
| 138 | + limits: |
| 139 | + nvidia.com/gpu: 1 # requesting 1 GPU |
| 140 | + tolerations: |
| 141 | + - key: nvidia.com/gpu |
| 142 | + operator: Exists |
| 143 | + effect: NoSchedule |
| 144 | + ``` |
| 145 | + |
| 146 | +1. Run the sample workload: |
| 147 | + |
| 148 | + ```powershell |
| 149 | + kubectl apply -f .\gpu-workload.yaml |
| 150 | + ``` |
| 151 | + |
| 152 | +1. Verify that the workload ran successfully: |
| 153 | + |
| 154 | + :::image type="content" source="media/aks-edge-gpu/workload.png" alt-text="Screenshot showing that the workload ran successfully." lightbox="media/aks-edge-gpu/workload.png"::: |
| 155 | + |
| 156 | +## Next steps |
| 157 | + |
| 158 | +[AKS Edge Essentials overview](aks-edge-overview.md) |
0 commit comments