Commit c72343a

Merge pull request #17717 from sethmanheim/hub-gpus
Hub: add GPU notes
2 parents 75e694a + a569623 commit c72343a

2 files changed, 33 additions and 25 deletions

azure-stack/operator/manage-gpu-capacity.md

Lines changed: 9 additions & 3 deletions
@@ -4,7 +4,7 @@ description: Learn how to add GPUs to an existing Azure Stack Hub system.
 author: sethmanheim
 ms.author: sethm
 ms.topic: how-to
-ms.date: 05/17/2021
+ms.date: 04/21/2025
 ms.custom: template-how-to
 ---

@@ -23,6 +23,8 @@ The following flow shows the general process to add memory to each scale unit no

 :::image type="content" source="media/manage-gpu-capacity/add-memory-process.png" alt-text="Add GPU capacity flow":::

+Each GPU VM can only use GPUs from a single node, and GPU VMs are not automatically load balanced. For example, you have 4 nodes and 2 GPUs on each node, and you create 4 VMs with 1 GPU for each VM. Each VM can exist on a different node. If that happens, any single node only has 1 available GPU left. From the portal, you can see that there are 4 GPUs available. However, if you try to create a VM with 2 GPUs, it fails with insufficient GPU capacity, because no single node has 2 GPUs available. The solution is to create the VMs with 2 GPUs first.
+
 ## Upgrade GPUs or add to an existing node

 The following section provides a high-level overview of the process to add a GPU.
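
Review note on the hunk above: the placement constraint in the added paragraph can be illustrated with a small simulation. This is a minimal sketch in Python under a simplified model, where each node is just a count of free GPUs. The helper names `can_place` and `place_on` are hypothetical and are not part of any Azure Stack Hub API. The sketch only reproduces the arithmetic of the 4-node example.

```python
# Hypothetical model: each list entry is the number of free GPUs on one node.

def can_place(free_gpus, needed):
    """A GPU VM fits only if a single node has enough free GPUs."""
    return any(free >= needed for free in free_gpus)


def place_on(free_gpus, node, needed):
    """Allocate GPUs on one specific node; raise if that node lacks capacity."""
    if free_gpus[node] < needed:
        raise ValueError(f"node {node} has only {free_gpus[node]} free GPU(s)")
    free_gpus[node] -= needed


# Scenario from the paragraph: 4 nodes with 2 GPUs each.
spread = [2, 2, 2, 2]

# Four 1-GPU VMs that happen to land on four different nodes.
for node in range(4):
    place_on(spread, node, 1)

total_free = sum(spread)              # GPUs still free across the scale unit
fits_two_gpu_vm = can_place(spread, 2)  # whether any single node has 2 free

# Alternative ordering: create the 2-GPU VM first, then the 1-GPU VMs.
ordered = [2, 2, 2, 2]
place_on(ordered, 0, 2)
for node in (1, 2, 3, 1):
    place_on(ordered, node, 1)

print(f"spread: free per node={spread}, total={total_free}, 2-GPU VM fits={fits_two_gpu_vm}")
print(f"ordered: free per node={ordered}")
```

In the spread case the aggregate count is 4 free GPUs, yet no single node has 2 free, which matches the "insufficient GPU capacity" failure the paragraph describes. In the ordered case all five VMs are placed.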
@@ -36,9 +38,9 @@ The following section provides a high-level overview of the process to add a GPU

 ## Change GPU partition size

-Azure Stack Hub supports GPU partitioning for the AMD MI25. With GPU partitioning, you can increase the density of virtual machines using a virtual GPU instance. You can change the partition size to meet specific workload requirements. By default, Azure Stack Hub uses the largest partition size (1/8) to provide the highest possible density with a 2 GB frame buffer. This is useful for workloads that require accelerated graphics applications and virtual desktops.
+Azure Stack Hub supports GPU partitioning for the AMD MI25. With GPU partitioning, you can increase the density of virtual machines using a virtual GPU instance. You can change the partition size to meet specific workload requirements. By default, Azure Stack Hub uses the largest partition size (1/8) to provide the highest possible density with a 2 GB frame buffer. This partitioning is useful for workloads that require accelerated graphics applications and virtual desktops.

-To change the partition size, do the following:
+To change the partition size, perform the following steps:

 1. Deallocate all VMs that are currently using a GPU.
 1. Ensure that the [PowerShell Az module](powershell-install-az-module.md) for Azure Stack Hub is installed.
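
Review note on the hunk above: the relationship between the partition size value and the per-VM frame buffer can be sketched as simple arithmetic. This is an illustrative Python sketch, not an Azure Stack Hub API. It assumes that the frame buffer divides evenly among partitions and that the total is 16 GB, a figure inferred from the stated default of 8 partitions with 2 GB each. The function name `partition_info` is hypothetical.

```python
from fractions import Fraction

# Assumption: inferred from the default of 8 partitions x 2 GB frame buffer each.
TOTAL_FRAME_BUFFER_GB = 16

# Partition size values; each value N means one VM gets 1/N of a physical GPU.
VALID_PARTITION_SIZES = (1, 2, 4, 8)


def partition_info(partition_size):
    """Return (share of one physical GPU, frame buffer in GB) for a partition size."""
    if partition_size not in VALID_PARTITION_SIZES:
        raise ValueError(f"unsupported partition size: {partition_size}")
    share = Fraction(1, partition_size)
    return share, float(TOTAL_FRAME_BUFFER_GB * share)


for size in sorted(VALID_PARTITION_SIZES, reverse=True):
    share, frame_buffer_gb = partition_info(size)
    print(f"partition size {size}: {share} of a GPU, "
          f"{frame_buffer_gb:g} GB frame buffer, up to {size} VM(s) per GPU")
```

Under these assumptions, a larger partition size value means more VMs per physical GPU and a smaller frame buffer for each VM.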
@@ -50,6 +52,7 @@ To change the partition size, do the following:
 ```powershell
 Get-AzsScaleUnit # Returns a list of information about scale units in your stamp
 ```
+
 Update the following `$partitionSize` and `$scaleUnitName` variables using the "**name**" value returned in the previous step, then run the following to update the scale unit partition size:

 ```powershell
@@ -67,6 +70,9 @@ To change the partition size, do the following:
 | 2 | 1/2 of a physical GPU. |
 | 1 | Entire physical GPU. |

+> [!NOTE]
+> Resizing GPU VMs is not supported.
+
 ## Next steps

 - [Manage storage accounts in Azure Stack Hub](azure-stack-manage-storage-accounts.md).

azure-stack/user/gpu-vms-about.md

Lines changed: 24 additions & 22 deletions
@@ -5,8 +5,8 @@ author: sethmanheim
 ms.author: sethm
 ms.service: azure-stack
 ms.topic: reference
-ms.date: 10/24/2024
-ms.reviewer: unknown
+ms.date: 04/21/2025
+ms.reviewer: rtibi
 ms.lastreviewed: 4/28/2021

 # Intent: As a a developer on Azure Stack Hub, I want to use a machine with a Graphics Processing Unit (GPU) in order to deliver an processing intensive visualization application.
@@ -40,9 +40,9 @@ NCv3-series VMs are powered by NVIDIA Tesla V100 GPUs. Customers can take advant

 ## NVv4

-The NVv4-series virtual machines are powered by AMD Radeon Instinct MI25 GPUs. With the NVv4-series, Azure Stack Hub introduces virtual machines with partial GPUs. This size can be used for GPU accelerated graphics applications and virtual desktops. NVv4 virtual machines currently support only the Windows guest operating system.
+The NVv4-series virtual machines are powered by AMD Radeon Instinct MI25 GPUs. With the NVv4-series, Azure Stack Hub introduces virtual machines with partial GPUs. This size can be used for GPU accelerated graphics applications and virtual desktops. NVv4 virtual machines currently support only the Windows guest operating system.

-| Size | vCPU | Memory: GiB | Temp storage (SSD) GiB | GPU | GPU memory: GiB | Max data disks | Max NICs |
+| Size | vCPU | Memory: GiB | Temp storage (SSD) GiB | GPU | GPU memory: GiB | Max data disks | Max NICs |
 | --- | --- | --- | --- | --- | --- | --- | --- |
 | Standard_NV4as_v4 |4 |14 |88 | 1/8 | 2 | 4 | 2 |
 | Standard_NV8as_v4 |8 |28 |176 | 1/4 | 4 | 8 | 4 |
@@ -80,7 +80,10 @@ The NC_A100 series VMs are powered by NVIDIA Ampere A100 GPUs, the successor of
 - Number of GPUs per server supported (1, 2, 3, 4). Preferred are: 1, 2, and 4.
 - All GPUs must be of the exact same SKU throughout the scale unit.
 - All GPU quantities per server must be the same throughout the scale unit.
-- GPU partition size (for AMD Mi25) needs to be the same throughout all GPU VMs on the scale unit.
+- GPU partition size (for AMD Mi25) needs to be the same for all GPU VMs on the scale unit.
+
+> [!NOTE]
+> Resizing GPU VMs is not supported.

 ## Capacity planning

@@ -94,14 +97,14 @@ Azure Stack Hub now supports adding GPUs to any existing system. To add a GPU, r

 GPU VMs undergo downtime during operations such as patch and update (PnU) and hardware replacement (FRU) of Azure Stack Hub. The following table covers the state of the VM as observed during these activities and the manual action you can do to make these VMs available after the operation.

-| Operation | PnU - Full Update, OEM update | FRU |
-| --- | --- | --- |
-| VM state | Unavailable during update. Can be made available with manual operation. VM is automatically online post update. | Unavailable during FRU. Can be made available with manual operation. VM needs to be brought back up after FRU|
+| Operation | PnU - Full Update, OEM update | FRU |
+| --- | --- | --- |
+| VM state | Unavailable during update. Can be made available with manual operation. VM is automatically online post update. | Unavailable during FRU. Can be made available with manual operation. VM needs to be brought back up after FRU|
 | Manual operation | If the VM needs to be made available during the update, if there are available GPU partitions, the VM can be restarted from the portal by clicking the **Restart** button. VM automatically comes back up post update. | VM is not available during FRU. If there are available GPUs, VM may be stop-deallocated and restarted during FRU. Post FRU completion, the VM must be `stop-deallocated` using the **Stop** button, then restarted using the **Start** button.|

 ## Guest driver installation

-The following PowerShell cmdlets can be used for driver installation:
+You can use the [Set-AzVMExtension](/powershell/module/az.compute/set-azvmextension) PowerShell cmdlet for driver installation:

 ```powershell
 $VmName = <VM Name In Portal>
@@ -112,18 +115,18 @@ $driverPublisher = "Microsoft.HpcCompute"
 $driverType = <Specify Driver Type> #GPU Driver Types: "NvidiaGpuDriverWindows"; "NvidiaGpuDriverLinux"; "AmdGpuDriverWindows"
 $driverVersion = <Specify Driver Version> #Nvidia Driver Version:"1.3"; AMD Driver Version:"1.0"

-Set-AzureRmVMExtension -Location $Location `
--Publisher $driverPublisher `
--ExtensionType $driverType `
--TypeHandlerVersion $driverVersion `
--VMName $VmName `
--ResourceGroupName $ResourceGroupName `
--Name $driverName `
--Settings $Settings ` # If no settings are set, omit this parameter
--Verbose
+Set-AzVMExtension -Location $Location `
+-Publisher $driverPublisher `
+-ExtensionType $driverType `
+-TypeHandlerVersion $driverVersion `
+-VMName $VmName `
+-ResourceGroupName $ResourceGroupName `
+-Name $driverName `
+-Settings $Settings ` # If no settings are set, omit this parameter
+-Verbose
 ```

-Depending on the OS, type and connectivity of your Azure Stack Hub GPU VM, you must replace these values with the settings below.
+Depending on the OS, type, and connectivity of your Azure Stack Hub GPU VM, you must replace these values with the following settings.

 ### AMD MI25

@@ -157,7 +160,7 @@ NVIDIA drivers must be installed inside the virtual machine for CUDA or GRID wor

 #### Use case: graphics/visualization GRID

-This scenario requires the use of GRID drivers. GRID drivers can be downloaded through the NVIDIA Application Hub provided you have the required licenses. The GRID drivers also require a GRID license server with appropriate GRID licenses before using the GRID drivers on the VM.
+This scenario requires the use of GRID drivers. GRID drivers can be downloaded through the NVIDIA Application Hub provided you have the required licenses. The GRID drivers also require a GRID license server with appropriate GRID licenses before using the GRID drivers on the VM.

 ```powershell
 $Settings = @{
@@ -172,8 +175,7 @@ CUDA drivers don't need a license server and don't need modified settings.

 ### Use case: compute/CUDA - Disconnected

-Links to NVIDIA CUDA drivers can be obtained using the link:
-https://raw.githubusercontent.com/Azure/azhpc-extensions/master/NvidiaGPU/resources.json
+You can get [links to NVIDIA CUDA drivers here](https://raw.githubusercontent.com/Azure/azhpc-extensions/master/NvidiaGPU/resources.json).

 **Windows:**
