articles/aks/gpu-cluster.md
Lines changed: 9 additions & 8 deletions
@@ -25,11 +25,12 @@ To view supported GPU-enabled VMs, see [GPU-optimized VM sizes in Azure][gpu-sku
* If you're using an Azure Linux GPU-enabled node pool, automatic security patches aren't applied, and the default behavior for the cluster is *Unmanaged*. For more information, see [auto-upgrade](./auto-upgrade-node-image.md).
* [NVadsA10](../virtual-machines/nva10v5-series.md) v5-series are *not* a recommended SKU for GPU VHD.
* AKS doesn't support Windows GPU-enabled node pools.
+
* Updating an existing node pool to add GPU isn't supported.
## Before you begin
* This article assumes you have an existing AKS cluster. If you don't have a cluster, create one using the [Azure CLI][aks-quickstart-cli], [Azure PowerShell][aks-quickstart-powershell], or the [Azure portal][aks-quickstart-portal].
-
* You need the Azure CLI version 1.0.0b2 or later installed and configured. Run `az --version` to find the version. If you need to install or upgrade, see [Install Azure CLI][install-azure-cli].
+
* You need the Azure CLI version 2.0.64 or later installed and configured. Run `az --version` to find the version. If you need to install or upgrade, see [Install Azure CLI][install-azure-cli].
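As a rough sketch of checking this requirement from a script, the helper below compares two dotted version strings with `sort -V`. The function name `version_ge` is hypothetical, and the sketch assumes a `sort` that supports version sorting (`-V`, as in GNU coreutils). The installed version is passed in by hand here; in practice you would copy it from the `az --version` output.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) when version $1 >= version $2.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
  local lowest
  # After a version sort, the first line is the lower of the two versions.
  lowest="$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)"
  [ "$lowest" = "$2" ]
}

required="2.0.64"
installed="${1:-2.0.64}"   # pass the version reported by `az --version`

if version_ge "$installed" "$required"; then
  echo "Azure CLI $installed meets the $required minimum"
else
  echo "Azure CLI $installed is older than $required; upgrade before continuing"
fi
```

Version sorting is used instead of plain string comparison so that, for example, `2.0.100` is treated as newer than `2.0.64`.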
## Get the credentials for your cluster
@@ -67,7 +68,7 @@ AKS has automatic GPU driver installation enabled by default. In some cases, suc
--cluster-name myAKSCluster \
--name gpunp \
--node-count 1 \
-
--skip-gpu-install\
+
--skip-gpu-driver-install \
--node-vm-size Standard_NC6s_v3 \
--node-taints sku=gpu:NoSchedule \
--enable-cluster-autoscaler \
@@ -113,7 +114,7 @@ To use the default OS SKU, you create the node pool without specifying an OS SKU
* `--max-count`: Configures the cluster autoscaler to maintain a maximum of three nodes in the node pool.
> [!NOTE]
-
> Taints and VM sizes can only be set for node pools during node pool creation, but you can update autoscaler settings at any time.
+
> Taints and VM sizes can only be set for node pools during node pool creation, but you can update autoscaler settings at any time.
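The taint passed to `--node-taints` follows the Kubernetes `key=value:effect` form. As an illustrative sketch (the `parse_taint` helper is hypothetical and not part of the Azure CLI or `kubectl`), the string can be split with shell parameter expansion to see which key, value, and effect a pod's toleration would need to match:

```bash
#!/usr/bin/env bash
# Hypothetical helper: split a taint of the form key=value:effect
# into its three parts using parameter expansion only.
parse_taint() {
  local taint="$1"
  local key="${taint%%=*}"     # text before the first '='
  local rest="${taint#*=}"     # text after the first '='
  local value="${rest%%:*}"    # text before the first ':'
  local effect="${rest#*:}"    # text after the first ':'
  printf 'key=%s value=%s effect=%s\n' "$key" "$value" "$effect"
}

parse_taint "sku=gpu:NoSchedule"
# prints: key=sku value=gpu effect=NoSchedule
```

A pod intended for this node pool would then carry a toleration with the same key, value, and effect.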
##### [Azure Linux node pool](#tab/add-azure-linux-gpu-node-pool)
@@ -144,17 +145,17 @@ To use Azure Linux, you specify the OS SKU by setting `os-sku` to `AzureLinux` d
* `--max-count`: Configures the cluster autoscaler to maintain a maximum of three nodes in the node pool.
> [!NOTE]
-
> Taints and VM sizes can only be set for node pools during node pool creation, but you can update autoscaler settings at any time.
+
> Taints and VM sizes can only be set for node pools during node pool creation, but you can update autoscaler settings at any time. Certain SKUs, including A100 and H100 VM SKUs, aren't available for Azure Linux. For more information, see [GPU-optimized VM sizes in Azure][gpu-skus].
---
-
2. Create a namespace using the [`kubectl create namespace`][kubectl-create] command.
+
1. Create a namespace using the [`kubectl create namespace`][kubectl-create] command.
```bash
kubectl create namespace gpu-resources
```
-
3. Create a file named *nvidia-device-plugin-ds.yaml* and paste the following YAML manifest provided as part of the [NVIDIA device plugin for Kubernetes project][nvidia-github]:
+
2. Create a file named *nvidia-device-plugin-ds.yaml* and paste the following YAML manifest provided as part of the [NVIDIA device plugin for Kubernetes project][nvidia-github]:
```yaml
apiVersion: apps/v1
@@ -206,13 +207,13 @@ To use Azure Linux, you specify the OS SKU by setting `os-sku` to `AzureLinux` d
path: /var/lib/kubelet/device-plugins
```
-
4. Create the DaemonSet and confirm the NVIDIA device plugin is created successfully using the [`kubectl apply`][kubectl-apply] command.
+
3. Create the DaemonSet and confirm the NVIDIA device plugin is created successfully using the [`kubectl apply`][kubectl-apply] command.
```bash
kubectl apply -f nvidia-device-plugin-ds.yaml
```
-
5. Now that you successfully installed the NVIDIA device plugin, you can check that your [GPUs are schedulable](#confirm-that-gpus-are-schedulable) and [run a GPU workload](#run-a-gpu-enabled-workload).
+
4. Now that you successfully installed the NVIDIA device plugin, you can check that your [GPUs are schedulable](#confirm-that-gpus-are-schedulable) and [run a GPU workload](#run-a-gpu-enabled-workload).