Commit 0f6fe25

Merge pull request #92238 from laurenhughes/patch-14
Updates for GPU VMs
2 parents 98af94d + 3eaf490 commit 0f6fe25

1 file changed

+29
-47
lines changed

articles/virtual-machines/workloads/hpc/enable-infiniband.md

Lines changed: 29 additions & 47 deletions
Original file line number | Diff line number | Diff line change
@@ -11,72 +11,54 @@ tags: azure-resource-manager
1111
ms.service: virtual-machines
1212
ms.workload: infrastructure-services
1313
ms.topic: article
14-
ms.date: 05/15/2019
14+
ms.date: 10/17/2019
1515
ms.author: amverma
1616
---
1717

1818
# Enable InfiniBand with SR-IOV
1919

20-
The simplest and recommended way to get started with IaaS VMs for HPC is to use the CentOS-HPC 7.6 VM OS image. If using your custom VM image, the easiest way to configure it with InfiniBand (IB) is to add the InfiniBandDriverLinux or InfiniBandDriverWindows VM extension to your deployment.
21-
Learn how to use these VM extensions with [Linux](https://docs.microsoft.com/azure/virtual-machines/linux/sizes-hpc#rdma-capable-instances) and [Windows](https://docs.microsoft.com/azure/virtual-machines/windows/sizes-hpc#rdma-capable-instances)
20+
The Azure NC, ND, and H-series of VMs are all backed by a dedicated InfiniBand network. All RDMA-enabled sizes are capable of leveraging that network using Intel MPI. Some VM series have expanded support for all MPI implementations and RDMA verbs through SR-IOV. RDMA capable VMs include [GPU optimized](https://docs.microsoft.com/azure/virtual-machines/linux/sizes-gpu) and [High-performance compute (HPC)](https://docs.microsoft.com/azure/virtual-machines/linux/sizes-hpc) VMs.
2221

23-
To manually configure InfiniBand on SR-IOV enabled VMs (currently HB and HC series), follow the steps below. These steps are for RHEL/CentOS only. For Ubuntu (16.04 and 18.04), and SLES (12 SP4 and 15), the inbox drivers work well.
22+
## Choose your installation path
2423

25-
## Manually install OFED
24+
To get started, the simplest option is to use a platform image pre-configured for InfiniBand, where available:
2625

27-
Install the latest MLNX_OFED drivers for ConnectX-5 from [Mellanox](https://www.mellanox.com/page/products_dyn?product_family=26).
26+
- **HPC IaaS VMs** – To get started with IaaS VMs for HPC, the simplest solution is to use the [CentOS-HPC 7.6 VM OS image](https://techcommunity.microsoft.com/t5/Azure-Compute/CentOS-HPC-VM-Image-for-SR-IOV-enabled-Azure-HPC-VMs/ba-p/665557), which is already configured with InfiniBand, so you don't have to configure it manually. For compatible Windows versions, see [Windows RDMA-capable instances](https://docs.microsoft.com/azure/virtual-machines/windows/sizes-hpc#rdma-capable-instances).
2827

29-
For RHEL/CentOS (example below for 7.6):
28+
- **GPU IaaS VMs** – No platform images are currently pre-configured for GPU optimized VMs, except for the [CentOS-HPC 7.6 VM OS image](https://techcommunity.microsoft.com/t5/Azure-Compute/CentOS-HPC-VM-Image-for-SR-IOV-enabled-Azure-HPC-VMs/ba-p/665557). To configure a custom image with InfiniBand, see [Manually install Mellanox OFED](#manually-install-mellanox-ofed).
3029

31-
```bash
32-
sudo yum install -y kernel-devel python-devel
33-
sudo yum install -y redhat-rpm-config rpm-build gcc-gfortran gcc-c++
34-
sudo yum install -y gtk2 atk cairo tcl tk createrepo
35-
wget --retry-connrefused --tries=3 --waitretry=5 http://content.mellanox.com/ofed/MLNX_OFED-4.5-1.0.1.0/MLNX_OFED_LINUX-4.5-1.0.1.0-rhel7.6-x86_64.tgz
36-
tar zxvf MLNX_OFED_LINUX-4.5-1.0.1.0-rhel7.6-x86_64.tgz
37-
sudo ./MLNX_OFED_LINUX-4.5-1.0.1.0-rhel7.6-x86_64/mlnxofedinstall --add-kernel-support
38-
```
39-
40-
For Windows, download and install the WinOF-2 drivers for ConnectX-5 from [Mellanox](https://www.mellanox.com/page/products_dyn?product_family=32&menu_section=34)
30+
If you're using a custom VM image or a [GPU optimized](https://docs.microsoft.com/azure/virtual-machines/linux/sizes-gpu) VM, you should configure it with InfiniBand by adding the InfiniBandDriverLinux or InfiniBandDriverWindows VM extension to your deployment. Learn how to use these VM extensions with [Linux](https://docs.microsoft.com/azure/virtual-machines/linux/sizes-hpc#rdma-capable-instances) and [Windows](https://docs.microsoft.com/azure/virtual-machines/windows/sizes-hpc#rdma-capable-instances).
4131

42-
## Enable IPoIB
32+
## Manually install Mellanox OFED
4333

44-
```bash
45-
sudo sed -i 's/LOAD_EIPOIB=no/LOAD_EIPOIB=yes/g' /etc/infiniband/openib.conf
46-
sudo /etc/init.d/openibd restart
47-
if [ $? -eq 1 ]
48-
then
49-
sudo modprobe -rv ib_isert rpcrdma ib_srpt
50-
sudo /etc/init.d/openibd restart
51-
fi
52-
```
34+
To manually configure InfiniBand with SR-IOV, use the following steps. The example in these steps shows syntax for RHEL/CentOS, but the steps are general and can be used for any compatible operating system such as Ubuntu (16.04, 18.04, 19.04) and SLES (12 SP4 and 15). The inbox drivers work as well, but the Mellanox OpenFabrics drivers provide more features.
5335

54-
## Assign an IP address
36+
For more information on the supported distributions for the Mellanox driver, see the latest [Mellanox OpenFabrics drivers](https://www.mellanox.com/page/products_dyn?product_family=26). For more information on the Mellanox OpenFabrics driver, see the [Mellanox user guide](https://docs.mellanox.com/category/mlnxofedib).
5537

56-
Assign an IP address to the ib0 interface, using either:
38+
See the following example for how to configure InfiniBand on Linux:
5739

58-
- Manually assign IP Address to the ib0 Interface (as root).
40+
```bash
41+
# Modify the variable to desired Mellanox OFED version
42+
MOFED_VERSION=#4.7-1.0.0.1
43+
# Modify the variable to desired OS
44+
MOFED_OS=#rhel7.6
45+
pushd /tmp
46+
curl -fSsL https://www.mellanox.com/downloads/ofed/MLNX_OFED-${MOFED_VERSION}/MLNX_OFED_LINUX-${MOFED_VERSION}-${MOFED_OS}-x86_64.tgz | tar -zxpf -
47+
cd MLNX_OFED_LINUX-*
48+
sudo ./mlnxofedinstall
49+
popd
50+
```
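
In the example above, `MOFED_VERSION` and `MOFED_OS` are followed by `#`, so both expand to empty strings until they are edited. As a minimal sketch (not part of the original change), a small guard can build the download URL and refuse empty values before anything is fetched. The function name `mofed_url` is hypothetical; the URL pattern and the sample values `4.7-1.0.0.1` and `rhel7.6` are taken from the example above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the MLNX_OFED tarball URL for a given
# driver version and OS string, failing if either value is empty.
mofed_url() {
    local version="$1" os="$2"
    if [ -z "$version" ] || [ -z "$os" ]; then
        echo "usage: mofed_url <version> <os>" >&2
        return 1
    fi
    # Same URL layout as the curl command in the example above.
    printf 'https://www.mellanox.com/downloads/ofed/MLNX_OFED-%s/MLNX_OFED_LINUX-%s-%s-x86_64.tgz\n' \
        "$version" "$version" "$os"
}
```

For instance, `mofed_url 4.7-1.0.0.1 rhel7.6` prints the URL that the `curl` command would otherwise assemble inline, and calling it with an empty argument returns a non-zero status instead of producing a malformed URL.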
5951

60-
```bash
61-
ifconfig ib0 $(sed '/rdmaIPv4Address=/!d;s/.*rdmaIPv4Address="\([0-9.]*\)".*/\1/' /var/lib/waagent/SharedConfig.xml)/16
62-
```
52+
For Windows, download and install the [Mellanox OFED for Windows drivers](https://www.mellanox.com/page/products_dyn?product_family=32&menu_section=34).
6353

64-
OR
54+
## Enable IP over InfiniBand
6555

66-
- Use WALinuxAgent to assign IP address and make it persist.
56+
Use the following commands to enable IP over InfiniBand.
6757

68-
```bash
69-
yum install -y epel-release
70-
yum install -y python-pip
71-
python -m pip install --upgrade pip setuptools wheel
72-
wget "https://github.com/Azure/WALinuxAgent/archive/release-2.2.36.zip"
73-
unzip release-2.2.36.zip
74-
cd WALinuxAgent*
75-
python setup.py install --register-service --force
76-
sed -i -e 's/# OS.EnableRDMA=y/OS.EnableRDMA=y/g' /etc/waagent.conf
77-
sed -i -e 's/# AutoUpdate.Enabled=y/AutoUpdate.Enabled=y/g' /etc/waagent.conf
78-
systemctl restart waagent
79-
```
58+
```bash
59+
sudo sed -i -e 's/# OS.EnableRDMA=y/OS.EnableRDMA=y/g' /etc/waagent.conf
60+
sudo systemctl restart waagent
61+
```
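
The `sed` command above exits successfully even when it matches nothing, for example if the line in `/etc/waagent.conf` is formatted differently. A minimal sketch of a wrapper that applies the same substitution and then confirms the setting is active might look like this. The function name `enable_rdma` and its optional path argument are assumptions made for illustration, and `sed -i` is assumed to be GNU sed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: uncomment OS.EnableRDMA in a waagent config file
# and verify that the setting is present afterwards.
enable_rdma() {
    # Defaults to the path used in the example above.
    local conf="${1:-/etc/waagent.conf}"
    # Same substitution as the example above.
    sed -i -e 's/# OS.EnableRDMA=y/OS.EnableRDMA=y/g' "$conf"
    # Succeeds only if an uncommented setting now starts a line.
    grep -q '^OS\.EnableRDMA=y' "$conf"
}
```

Run against the real file, this would be `sudo bash -c` with the function defined, or the two commands run directly with `sudo`; the agent restart from the example above is still needed afterwards.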
8062

8163
## Next steps
8264
