Commit bf654a7

Merge pull request #124951 from ehvs/hevs/gpu-sku
added steps for SKU
2 parents 7b518d1 + 678def2 commit bf654a7

1 file changed (+31, -1 lines)

articles/openshift/howto-gpu-workloads.md

Lines changed: 31 additions & 1 deletion
@@ -6,7 +6,7 @@ ms.author: johnmarc
 ms.service: azure-redhat-openshift
 keywords: aro, gpu, openshift, red hat
 ms.topic: how-to
-ms.date: 12/15/2023
+ms.date: 11/29/2024
 ms.custom: template-how-to
 ---

@@ -188,6 +188,36 @@ ARO uses Kubernetes MachineSet to create machine sets. The procedure below expla
1. Verify the other data in the YAML file.

#### Ensure the correct SKU is set

The `image.sku` and `image.version` values in the machine set determine whether the VM runs as a Hyper-V generation 1 or generation 2 virtual machine, so both must point to an image whose generation the chosen VM size supports. For more information, see [Should I create a generation 1 or 2 virtual machine in Hyper-V?](/windows-server/virtualization/hyper-v/plan/should-i-create-a-generation-1-or-2-virtual-machine-in-hyper-v).

For example:

If you use `Standard_NC4as_T4_v3`, both generation 1 and generation 2 VMs are supported, as listed under [Feature support](/azure/virtual-machines/sizes/gpu-accelerated/ncast4v3-series?tabs=sizebasic#feature-support). In this case, no changes are required.

If you use `Standard_NC24ads_A100_v4`, only **generation 2 VMs** are [supported](/azure/virtual-machines/sizes/gpu-accelerated/nca100v4-series?tabs=sizebasic#feature-support). In this case, set `image.sku` to the `v2` variant of the image that corresponds to the cluster's original `image.sku`. In this example, the value is `v410-v2`.

To find the `v2` SKU and its matching version, list the available ARO images:

```bash
az vm image list --architecture x64 -o table --all --offer aro4 --publisher azureopenshift
```

Filtered output:

```
SKU      VERSION
-------  ---------------
v410-v2  410.84.20220125
aro_410  410.84.20220125
```
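The two cases above, together with a listing like the one just shown, can be sketched as a small shell function. This is a minimal sketch under stated assumptions, not an `az` or `oc` feature: the name `select_gpu_image` is hypothetical, only the two VM sizes discussed in this section are encoded, and it assumes the generation 2 image shares the base image's version string and has a SKU ending in `-v2`, which holds for the example output but is not guaranteed for every release.

```bash
# Hypothetical helper (not part of az or oc): given a GPU VM size and the
# cluster's original image.sku, print the image.sku and image.version to use.
# Reads the two-column "SKU VERSION" table on stdin: a header line, a dashed
# line, then one image per line, as in the filtered output above.
select_gpu_image() {
  local vm_size="$1"   # for example, Standard_NC24ads_A100_v4
  local base_sku="$2"  # for example, aro_410
  local want_v2

  # Only the two sizes described in this section are encoded here.
  case "$vm_size" in
    Standard_NC4as_T4_v3) want_v2=no ;;      # generation 1 and 2 supported
    Standard_NC24ads_A100_v4) want_v2=yes ;; # generation 2 only
    *)
      echo "unknown VM size '$vm_size': check its Feature support table" >&2
      return 2
      ;;
  esac

  awk -v base="$base_sku" -v want_v2="$want_v2" '
    NR <= 2 { next }                 # skip the header and the dashed line
    { sku[NR] = $1; ver[NR] = $2 }   # remember every listed image
    $1 == base { base_ver = $2 }     # version of the cluster base image
    END {
      if (base_ver == "") exit 1     # base SKU not found in the listing
      if (want_v2 == "no") { print base, base_ver; exit 0 }
      # Assumption: the generation 2 image has the same version string
      # and a SKU ending in "-v2", as in the example output.
      for (i in sku)
        if (sku[i] ~ /-v2$/ && ver[i] == base_ver) {
          print sku[i], ver[i]
          exit 0
        }
      exit 1                         # no matching generation 2 image
    }'
}
```

Fed the filtered output above, `select_gpu_image Standard_NC24ads_A100_v4 aro_410` prints `v410-v2 410.84.20220125`, and `select_gpu_image Standard_NC4as_T4_v3 aro_410` prints `aro_410 410.84.20220125`. To feed it live data, the `az vm image list` command could be narrowed with the global `--query` argument, for example `--query "[].{SKU:sku, VERSION:version}" -o table`; the exact table headers are an assumption, which is why the function skips the first two lines instead of matching header text. If the listing contains several versions of the base SKU, the last one read is used.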

If the cluster was created with the base SKU image `aro_410` and that value is kept in the machine set, machine creation fails with the following error:

```
failure sending request for machine myworkernode: cannot create vm: compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="BadRequest" Message="The selected VM size 'Standard_NC24ads_A100_v4' cannot boot Hypervisor Generation '1'.
```
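When a GPU machine fails to provision, its error text can be checked for this generation mismatch before changing the SKU. The sketch below is hypothetical: `is_gen1_boot_error` is an illustrative name, and matching on the literal `cannot boot Hypervisor Generation '1'` text from the error above is an assumption about the message format, which may change between Azure API versions.

```bash
# Hypothetical check: does a machine error message report that the VM size
# cannot boot a generation 1 image (the aro_410 case described above)?
# The matched text is copied from the error shown above and may change.
is_gen1_boot_error() {
  # -F: fixed-string match; -q: no output, exit status only.
  printf '%s\n' "$1" | grep -qF "cannot boot Hypervisor Generation '1'"
}
```

On OpenShift, the failure text of a Machine is exposed in its `status.errorMessage` field, so for the machine named in the error above a check could look like `is_gen1_boot_error "$(oc get machine myworkernode -n openshift-machine-api -o jsonpath='{.status.errorMessage}')"`. A match suggests switching `image.sku` to the `v2` SKU.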

#### Create GPU machine set

Use the following steps to create the new GPU machine. Provisioning a new GPU machine can take 10 to 15 minutes. If this step fails, sign in to the [Azure portal](https://portal.azure.com) and check for availability issues: go to **Virtual Machines** and search for the worker name you created previously to see the status of the VMs.
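Because provisioning can take 10 to 15 minutes, a polling loop with a timeout saves checking the status by hand. This is a generic sketch under stated assumptions: `wait_until` and `gpu_machineset_ready` are hypothetical helpers, and `my-gpu-machineset` is a placeholder for the machine set name you created.

```bash
# Hypothetical helper: run a command every INTERVAL seconds until it succeeds
# or LIMIT seconds have elapsed. Returns 0 on success and 1 on timeout.
wait_until() {
  local limit="$1" interval="$2" elapsed=0
  shift 2
  while [ "$elapsed" -lt "$limit" ]; do
    if "$@"; then
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 1
}

# Example check (assumed names): succeeds once the placeholder machine set
# "my-gpu-machineset" reports at least one ready replica.
gpu_machineset_ready() {
  local ready
  ready=$(oc get machineset my-gpu-machineset -n openshift-machine-api \
    -o jsonpath='{.status.readyReplicas}')
  [ "${ready:-0}" -ge 1 ]
}
```

For example, `wait_until 900 30 gpu_machineset_ready` checks every 30 seconds for up to 15 minutes. If it times out, fall back to the Azure portal check described above.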
