
Commit 893a3cf

Jill Grant authored
Merge pull request #234138 from divargas-msft/patch-7
[Doc-a-thon] Updating configure.md
2 parents 7127bdd + bc295fc commit 893a3cf

1 file changed

+50
-25
lines changed

articles/virtual-machines/configure.md

Lines changed: 50 additions & 25 deletions
Original file line number | Diff line number | Diff line change
@@ -4,7 +4,7 @@ description: Learn about configuring and optimizing the InfiniBand enabled H-ser
44
ms.service: virtual-machines
55
ms.subservice: hpc
66
ms.topic: article
7-
ms.date: 03/28/2023
7+
ms.date: 04/11/2023
88
ms.reviewer: cynthn, mattmcinnes
99
ms.author: mamccrea
1010
author: mamccrea
@@ -17,14 +17,17 @@ author: mamccrea
1717
This article shares some guidance on configuring and optimizing the InfiniBand-enabled [HB-series](sizes-hpc.md) and [N-series](sizes-gpu.md) VMs for HPC.
1818

1919
## VM images
20+
2021
On InfiniBand (IB) enabled VMs, the appropriate drivers are required to enable RDMA.
22+
2123
- The [CentOS-HPC VM images](#centos-hpc-vm-images) in the Marketplace come preconfigured with the appropriate IB drivers.
22-
- The CentOS-HPC version 7.9 VM image additionally comes preconfigured with the NVIDIA GPU drivers.
24+
- The CentOS-HPC version 7.9 VM image additionally comes preconfigured with the NVIDIA GPU drivers.
2325
- The [Ubuntu-HPC VM images](#ubuntu-hpc-vm-images) in the Marketplace come preconfigured with the appropriate IB drivers and GPU drivers.
2426

2527
These VM images are based on the base CentOS and Ubuntu marketplace VM images. Scripts used in the creation of these VM images from their base CentOS Marketplace image are on the [azhpc-images repo](https://github.com/Azure/azhpc-images/tree/master/centos).
2628

2729
On GPU enabled [N-series](sizes-gpu.md) VMs, the appropriate GPU drivers are additionally required. They can be made available by any of the following methods:
30+
2831
- Use the [Ubuntu-HPC VM images](#ubuntu-hpc-vm-images) and [CentOS-HPC VM image](#centos-hpc-vm-images) version 7.9 that come preconfigured with the NVIDIA GPU drivers and GPU compute software stack (CUDA, NCCL).
2932
- Add the GPU drivers through the [VM extensions](./extensions/hpccompute-gpu-linux.md).
3033
- Install the GPU drivers [manually](./linux/n-series-driver-setup.md).
@@ -36,66 +39,84 @@ It's also recommended to create [custom VM images](./linux/tutorial-custom-image
3639
### VM sizes supported by the HPC VM images
3740

3841
#### InfiniBand OFED support
42+
3943
The latest Azure HPC marketplace images come with Mellanox OFED 5.1 and above, which doesn't support ConnectX-3 Pro InfiniBand cards; those cards require the MOFED 4.9 LTS version. These VM images support only ConnectX-5 and newer InfiniBand cards. The VM size support matrix for the InfiniBand OFED in these HPC VM images is as follows:
44+
4045
- [HB-series](sizes-hpc.md): HB, HC, HBv2, HBv3, HBv4
4146
- [N-series](sizes-gpu.md): NDv2, NDv4
4247

4348
#### GPU driver support
49+
4450
Currently only the [Ubuntu-HPC VM images](#ubuntu-hpc-vm-images) and [CentOS-HPC VM images](#centos-hpc-vm-images) version 7.9 come preconfigured with the NVIDIA GPU drivers and GPU compute software stack (CUDA, NCCL).
4551

4652
The VM size support matrix for the GPU drivers in supported HPC VM images is as follows:
53+
4754
- [N-series](sizes-gpu.md): NDv2, NDv4 VM sizes are supported with the NVIDIA GPU drivers and GPU compute software stack (CUDA, NCCL).
4855
- The other 'NC' and 'ND' VM sizes in the [N-series](sizes-gpu.md) are supported with the NVIDIA GPU drivers.
4956

5057
All of the VM sizes in the N-series support [Gen 2 VMs](generation-2.md), though some older ones also support Gen 1 VMs. Gen 2 support is also indicated with a "01" at the end of the image URN or version.
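
The "01" suffix convention can be checked mechanically. The following is a minimal sketch that is not part of this commit: `is_gen2_urn` is a hypothetical helper name, and the check assumes the suffix convention described above holds for the image being tested.

```bash
# Hypothetical helper: succeed if the version field of an image URN
# (publisher:offer:sku:version) ends in "01", per the convention above.
is_gen2_urn() {
  local urn="$1"
  local version="${urn##*:}"   # keep only the text after the last colon
  case "$version" in
    *01) return 0 ;;
    *)   return 1 ;;
  esac
}
```

For example, `is_gen2_urn "OpenLogic:CentOS-HPC:7_6gen2:7.6.2020062901"` succeeds, while the same call with `OpenLogic:CentOS-HPC:7.6:7.6.2020062900` fails.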
5158

52-
### CentOS-HPC VM images
59+
### SR-IOV enabled VMs
60+
61+
#### CentOS-HPC VM images
5362

54-
#### SR-IOV enabled VMs
5563
For SR-IOV enabled [RDMA capable VMs](sizes-hpc.md#rdma-capable-instances), [Ubuntu-HPC VM images](#ubuntu-hpc-vm-images) and CentOS-HPC VM images version 7.6 and later are suitable. These VM images come preconfigured with the Mellanox OFED drivers for RDMA and commonly used MPI libraries and scientific computing packages. Refer to the [VM size support matrix](#vm-sizes-supported-by-the-hpc-vm-images).
64+
5665
- The available or latest versions of the VM images can be listed with the following information using [CLI](/cli/azure/vm/image#az-vm-image-list) or [Marketplace](https://azuremarketplace.microsoft.com/marketplace/apps/openlogic.centos-hpc?tab=Overview).
57-
```bash
66+
67+
```output
5868
"publisher": "OpenLogic",
5969
"offer": "CentOS-HPC",
6070
```
71+
6172
- Scripts used in the creation of the [Ubuntu-HPC VM images](#ubuntu-hpc-vm-images) and CentOS-HPC version 7.6 and later VM images from a base CentOS Marketplace image are on the [azhpc-images repo](https://github.com/Azure/azhpc-images/tree/master/centos).
6273
- Additionally, details on what's included in the [Ubuntu-HPC VM images](#ubuntu-hpc-vm-images) and CentOS-HPC version 7.6 and later VM images, and how to deploy them are in a [TechCommunity article](https://techcommunity.microsoft.com/t5/azure-compute/azure-hpc-vm-images/ba-p/977094).
6374

64-
> [!NOTE]
75+
> [!NOTE]
6576
> Among the CentOS-HPC VM images, currently only the version 7.9 VM image additionally comes preconfigured with the NVIDIA GPU drivers and GPU compute software stack (CUDA, NCCL).
6677
67-
> [!NOTE]
78+
> [!NOTE]
6879
> SR-IOV enabled N-series VM sizes with FDR InfiniBand (e.g. NCv3 and older) will be able to use the following CentOS-HPC VM image or older versions from the Marketplace:
80+
6981
>- OpenLogic:CentOS-HPC:7.6:7.6.2020062900
7082
>- OpenLogic:CentOS-HPC:7_6gen2:7.6.2020062901
7183
>- OpenLogic:CentOS-HPC:7.7:7.7.2020062600
7284
>- OpenLogic:CentOS-HPC:7_7-gen2:7.7.2020062601
7385
>- OpenLogic:CentOS-HPC:8_1:8.1.2020062400
7486
>- OpenLogic:CentOS-HPC:8_1-gen2:8.1.2020062401
7587
76-
### Ubuntu-HPC VM images
88+
#### Ubuntu-HPC VM images
89+
7790
For SR-IOV enabled [RDMA capable VMs](sizes-hpc.md#rdma-capable-instances), Ubuntu-HPC VM images versions 18.04 and 20.04 are suitable. These VM images come preconfigured with the Mellanox OFED drivers for RDMA, NVIDIA GPU drivers, GPU compute software stack (CUDA, NCCL), and commonly used MPI libraries and scientific computing packages. Refer to the [VM size support matrix](#vm-sizes-supported-by-the-hpc-vm-images).
91+
7892
- The available or latest versions of the VM images can be listed with the following information using [CLI](/cli/azure/vm/image#az-vm-image-list) or [Marketplace](https://azuremarketplace.microsoft.com/marketplace/apps/microsoft-dsvm.ubuntu-hpc?tab=overview).
79-
```bash
93+
94+
```output
8095
"publisher": "Microsoft-DSVM",
8196
"offer": "Ubuntu-HPC",
8297
```
98+
8399
- Scripts used in the creation of the Ubuntu-HPC VM images from a base Ubuntu Marketplace image are on the [azhpc-images repo](https://github.com/Azure/azhpc-images/tree/master/ubuntu).
84100
- Additionally, details on what's included in the Ubuntu-HPC VM images, and how to deploy them are in a [TechCommunity article](https://techcommunity.microsoft.com/t5/azure-compute/azure-hpc-vm-images/ba-p/977094).
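
The publisher and offer values shown in the `output` blocks above can be passed to `az vm image list` to enumerate the available versions. This is a sketch that is not part of the commit: `list_hpc_images` is a hypothetical wrapper name, and it assumes the Azure CLI is installed and signed in.

```bash
# Hypothetical wrapper around `az vm image list`; requires the Azure CLI
# and a signed-in session.
list_hpc_images() {
  local publisher="$1" offer="$2"
  # --all queries the live Marketplace catalog instead of the CLI's offline copy.
  az vm image list --publisher "$publisher" --offer "$offer" --all --output table
}
```

Usage would look like `list_hpc_images OpenLogic CentOS-HPC` or `list_hpc_images Microsoft-DSVM Ubuntu-HPC`.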
85101

86102
### RHEL/CentOS VM images
103+
87104
The base RHEL or CentOS-based non-HPC VM images on the Marketplace can be configured for use on the SR-IOV enabled [RDMA capable VMs](sizes-hpc.md#rdma-capable-instances). Learn more about [enabling InfiniBand](./extensions/enable-infiniband.md) and [setting up MPI](setup-mpi.md) on the VMs.
105+
88106
- Scripts used in the creation of the CentOS-HPC version 7.6 and later VM images from a base CentOS Marketplace image from the [azhpc-images repo](https://github.com/Azure/azhpc-images/tree/master/centos) can also be used.
89-
107+
90108
### Ubuntu VM images
109+
91110
The base Ubuntu Server 16.04 LTS, 18.04 LTS, and 20.04 LTS VM images in the Marketplace are supported for both SR-IOV and non-SR-IOV [RDMA capable VMs](sizes-hpc.md#rdma-capable-instances). Learn more about [enabling InfiniBand](./extensions/enable-infiniband.md) and [setting up MPI](setup-mpi.md) on the VMs.
111+
92112
- Instructions for enabling InfiniBand on the Ubuntu VM images are in a [TechCommunity article](https://techcommunity.microsoft.com/t5/azure-compute/configuring-infiniband-for-ubuntu-hpc-and-gpu-vms/ba-p/1221351).
93113
- Scripts used in the creation of the Ubuntu 18.04 and 20.04 LTS based HPC VM images from a base Ubuntu Marketplace image are on the [azhpc-images repo](https://github.com/Azure/azhpc-images/tree/master/ubuntu).
94114

95115
> [!NOTE]
96116
> Mellanox OFED 5.1 and above don't support ConnectX-3 Pro InfiniBand cards on SR-IOV enabled N-series VM sizes with FDR InfiniBand (e.g. NCv3). Please use LTS Mellanox OFED version 4.9-0.1.7.0 or older on the N-series VMs with ConnectX-3 Pro cards. For more information, see [Linux InfiniBand Drivers](https://www.mellanox.com/products/infiniband-drivers/linux/mlnx_ofed).
97117
98118
### SUSE Linux Enterprise Server VM images
119+
99120
SLES 12 SP3 for HPC, SLES 12 SP3 for HPC (Premium), SLES 12 SP1 for HPC, SLES 12 SP1 for HPC (Premium), SLES 12 SP4 and SLES 15 VM images in the Marketplace are supported. These VM images come preloaded with the Network Direct drivers for RDMA (on the non-SR-IOV VM sizes) and Intel MPI version 5.1. Learn more about [setting up MPI](setup-mpi.md) on the VMs.
100121

101122
## Optimize VMs
@@ -110,51 +131,55 @@ If necessary for functionality or performance, [Linux Integration Services (LIS)
110131
wget https://aka.ms/lis
111132
tar xzf lis
112133
pushd LISISO
113-
./upgrade.sh
134+
sudo ./upgrade.sh
114135
```
115136

116137
### Reclaim memory
117138

118139
Improve performance by automatically reclaiming memory to avoid remote memory access.
119140

120141
```bash
121-
echo 1 >/proc/sys/vm/zone_reclaim_mode
142+
sudo echo 1 >/proc/sys/vm/zone_reclaim_mode
122143
```
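
Note that in `sudo echo 1 >/proc/sys/vm/zone_reclaim_mode` the redirection is performed by the calling shell, not by `sudo`, so the write fails with a permission error unless the shell is already running as root. A common workaround, shown here as a sketch and not as part of this commit, pipes the value into `sudo tee`. The function name and the `SUDO` override variable are hypothetical and exist only for illustration.

```bash
# Hypothetical helper: write 1 to zone_reclaim_mode with root privileges.
# The optional first argument overrides the target path, and SUDO can be set
# to an empty string to skip sudo (both exist only to allow dry runs).
set_zone_reclaim_mode() {
  local target="${1:-/proc/sys/vm/zone_reclaim_mode}"
  # tee runs under sudo, so the privileged process opens the file itself.
  echo 1 | ${SUDO-sudo} tee "$target" >/dev/null
}
```

Calling `set_zone_reclaim_mode` with no arguments targets the real `/proc` entry.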
123144

124145
Keep reclaim memory mode persistent after VM reboots:
125146

126147
```bash
127-
echo "vm.zone_reclaim_mode = 1" >> /etc/sysctl.conf sysctl -p
148+
sudo echo "vm.zone_reclaim_mode = 1" >> /etc/sysctl.conf sysctl -p
128149
```
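
As written, `sysctl -p` sits on the same line as the `echo`, so it is passed to `echo` as extra arguments and would be appended to the file instead of being run, and the `>>` redirection again happens outside `sudo`. One possible alternative, offered as a sketch and not as the commit's content, appends the setting only when it is missing and leaves the reload as a separate step. `persist_sysctl` and the `SUDO` override are hypothetical names.

```bash
# Hypothetical helper: append "key = value" to a sysctl config file once.
# Arguments: key, value, and an optional file path (default /etc/sysctl.conf).
# SUDO can be set to an empty string to skip sudo for dry runs.
persist_sysctl() {
  local key="$1" value="$2" conf="${3:-/etc/sysctl.conf}"
  local pattern
  # Escape dots so grep matches the key literally.
  pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
  # Skip the append if an uncommented entry for this key already exists.
  if grep -Eq "^[[:space:]]*${pattern}[[:space:]]*=" "$conf" 2>/dev/null; then
    return 0
  fi
  printf '%s = %s\n' "$key" "$value" | ${SUDO-sudo} tee -a "$conf" >/dev/null
}
```

For example, `persist_sysctl vm.zone_reclaim_mode 1` followed by `sudo sysctl -p` on its own line writes the setting and then loads it from the file.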
129150

130151
### Disable firewall and SELinux
131152

132153
```bash
133-
systemctl stop iptables.service
134-
systemctl disable iptables.service
135-
systemctl mask firewalld
136-
systemctl stop firewalld.service
137-
systemctl disable firewalld.service
138-
iptables -nL
139-
sed -i -e's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
154+
sudo systemctl stop iptables.service
155+
sudo systemctl disable iptables.service
156+
sudo systemctl mask firewalld
157+
sudo systemctl stop firewalld.service
158+
sudo systemctl disable firewalld.service
159+
sudo iptables -nL
160+
sudo sed -i -e's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
140161
```
141162

142163
### Disable cpupower
143164

144165
```bash
145-
service cpupower status
146-
if enabled, disable it:
147-
service cpupower stop
166+
sudo service cpupower status
167+
```
168+
169+
If enabled, disable it:
170+
171+
```bash
172+
sudo service cpupower stop
148173
sudo systemctl disable cpupower
149174
```
150175

151176
### Configure WALinuxAgent
152177

153178
```bash
154-
sed -i -e 's/# OS.EnableRDMA=y/OS.EnableRDMA=y/g' /etc/waagent.conf
179+
sudo sed -i -e 's/# OS.EnableRDMA=y/OS.EnableRDMA=y/g' /etc/waagent.conf
155180
```
156-
Optionally, the WALinuxAgent may be disabled before running a job then enabled post-job for maximum VM resource availability to the HPC workload.
157181

182+
Optionally, the WALinuxAgent may be disabled before running a job then enabled post-job for maximum VM resource availability to the HPC workload.
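
The `sed` substitution in this section only takes effect if `/etc/waagent.conf` contains the exact commented line `# OS.EnableRDMA=y`. The sketch below, which is not part of the commit, runs the same substitution and then checks the result. `enable_rdma_in_waagent` and the `SUDO` override are hypothetical names, and GNU `sed` is assumed for `-i`.

```bash
# Hypothetical helper: uncomment OS.EnableRDMA=y and confirm it is active.
# The optional argument overrides the config path; SUDO can be set to an
# empty string to skip sudo for dry runs.
enable_rdma_in_waagent() {
  local conf="${1:-/etc/waagent.conf}"
  ${SUDO-sudo} sed -i -e 's/# OS.EnableRDMA=y/OS.EnableRDMA=y/g' "$conf"
  # Return success only if an uncommented setting is now present.
  grep -q '^OS\.EnableRDMA=y' "$conf"
}
```

A nonzero exit status indicates that the file didn't contain the expected line, in which case the setting needs to be added by hand.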
158183

159184
## Next steps
160185
