Commit 32eeb98

Merge pull request #76454 from vermagit/patch-13
Plenty of changes around SR-IOV and IB
2 parents 0de408e + 25cccc0 commit 32eeb98

1 file changed

+39
-27
lines changed

articles/virtual-machines/linux/sizes-hpc.md

Lines changed: 39 additions & 27 deletions
@@ -30,53 +30,68 @@ ms.author: jonbeck
3030

3131
### MPI
3232

33-
Only Intel MPI 5.x versions are supported.
33+
The SR-IOV enabled VM sizes on Azure allow almost any flavor of MPI to be used.
34+
On non-SR-IOV enabled VMs, only Intel MPI 5.x versions are supported. Later versions (2017, 2018) of the Intel MPI runtime library may or may not be compatible with the Azure Linux RDMA drivers.
3435

35-
> [!NOTE]
36-
> Later versions (2017, 2018) of the Intel MPI runtime library may or may not be compatible with the Azure Linux RDMA drivers.
3736

38-
### Distributions
37+
### Supported OS images
3938

40-
Deploy a compute-intensive VM from one of the images in the Azure Marketplace that supports RDMA connectivity:
39+
The Azure Marketplace has many Linux distributions that support RDMA connectivity:
4140

42-
* **Ubuntu** - Ubuntu Server 16.04 LTS. Configure RDMA drivers on the VM and register with Intel to download Intel MPI:
41+
* **CentOS-based HPC** - For non-SR-IOV enabled VMs, CentOS-based HPC version 6.5 or later, up to 7.5, is suitable. For H-series VMs, versions 7.1 to 7.5 are recommended. RDMA drivers and Intel MPI 5.1 are installed on the VM.
42+
For SR-IOV VMs, CentOS-HPC 7.6 comes optimized and pre-loaded with the RDMA drivers and various MPI packages installed.
43+
For other RHEL/CentOS VM images, add the InfiniBandDriverLinux extension to enable InfiniBand. This Linux VM extension installs Mellanox OFED drivers (on SR-IOV VMs) for RDMA connectivity. The following PowerShell cmdlet installs the latest version (version 1.0) of the InfiniBandDriverLinux extension on an existing RDMA-capable VM named *myVM*, deployed in the resource group *myResourceGroup* in the *West US* region:
4344

44-
[!INCLUDE [virtual-machines-common-ubuntu-rdma](../../../includes/virtual-machines-common-ubuntu-rdma.md)]
45+
```powershell
46+
Set-AzVMExtension -ResourceGroupName "myResourceGroup" -Location "westus" -VMName "myVM" -ExtensionName "InfiniBandDriverLinux" -Publisher "Microsoft.HpcCompute" -Type "InfiniBandDriverLinux" -TypeHandlerVersion "1.0"
47+
```
48+
Alternatively, VM extensions can be included in Azure Resource Manager templates for easy deployment with the following JSON element:
49+
```json
50+
"properties":{
51+
"publisher": "Microsoft.HpcCompute",
52+
"type": "InfiniBandDriverLinux",
53+
"typeHandlerVersion": "1.0"
54+
}
55+
```
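The JSON element above is a fragment, and JSON does not permit a trailing comma after the last member of an object. As a minimal sketch, the same element can be generated with a serializer rather than written by hand. The helper name `infiniband_extension_properties` is hypothetical; only the three fields shown in the text are used, and a complete Resource Manager extension resource needs further fields that are not covered here.

```python
import json


def infiniband_extension_properties(version: str = "1.0") -> dict:
    """Return the extension properties from the text as a dict.

    Hypothetical helper for illustration; the field values are the
    ones given in the JSON element above.
    """
    return {
        "publisher": "Microsoft.HpcCompute",
        "type": "InfiniBandDriverLinux",
        "typeHandlerVersion": version,
    }


# Wrap the properties the same way the fragment in the text does.
fragment = {"properties": infiniband_extension_properties()}

# json.dumps always emits syntactically valid JSON (no trailing commas).
print(json.dumps(fragment, indent=2))
```

The printed object can be pasted into the extension resource of a template, alongside whatever other fields that resource requires.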
56+
57+
> [!NOTE]
58+
> On the CentOS-based HPC images, kernel updates are disabled in the **yum** configuration file. This is because Linux RDMA drivers are distributed as an RPM package, and driver updates might not work if the kernel is updated.
59+
>
60+
4561

46-
* **SUSE Linux Enterprise Server** - SLES 12 SP3 for HPC, SLES 12 SP3 for HPC (Premium), SLES 12 SP1 for HPC, SLES 12 SP1 for HPC (Premium). RDMA drivers are installed and Intel MPI packages are distributed on the VM. Install MPI by running the following command:
62+
* **SUSE Linux Enterprise Server** - SLES 12 SP3 for HPC, SLES 12 SP3 for HPC (Premium), SLES 12 SP1 for HPC, SLES 12 SP1 for HPC (Premium), SLES 12 SP4 and SLES 15. RDMA drivers are installed and Intel MPI packages are distributed on the VM. Install MPI by running the following command:
4763

4864
```bash
4965
sudo rpm -v -i --nodeps /opt/intelMPI/intel_mpi_packages/*.rpm
5066
```
51-
52-
* **CentOS-based HPC** - CentOS-based 6.5 HPC or a later version (for H-series, version 7.1 or later is recommended). RDMA drivers and Intel MPI 5.1 are installed on the VM.
53-
54-
> [!NOTE]
55-
> On the CentOS-based HPC images, kernel updates are disabled in the **yum** configuration file. This is because the Linux RDMA drivers are distributed as an RPM package, and driver updates might not work if the kernel is updated.
56-
>
57-
67+
68+
* **Ubuntu** - Ubuntu Server 16.04 LTS and 18.04 LTS. Configure RDMA drivers on the VM and register with Intel to download Intel MPI:
69+
70+
[!INCLUDE [virtual-machines-common-ubuntu-rdma](../../../includes/virtual-machines-common-ubuntu-rdma.md)]
71+
72+
For more details on enabling InfiniBand and setting up MPI, see [Enable InfiniBand](https://docs.microsoft.com/azure/virtual-machines/workloads/hpc/enable-infiniband-with-sriov).
73+
74+
5875
### Cluster configuration options
5976

6077
Azure provides several options to create clusters of Linux HPC VMs that can communicate using the RDMA network, including:
6178

6279
* **Virtual machines** - Deploy the RDMA-capable HPC VMs in the same availability set (when you use the Azure Resource Manager deployment model). If you use the classic deployment model, deploy the VMs in the same cloud service.
6380

64-
* **Virtual machine scale sets** - In a VM scale set, ensure that you limit the deployment to a single placement group. For example, in a Resource Manager template, set the `singlePlacementGroup` property to `true`.
81+
* **Virtual machine scale sets** - In a virtual machine scale set, ensure that you limit the deployment to a single placement group. For example, in a Resource Manager template, set the `singlePlacementGroup` property to `true`.
6582

6683
* **Azure CycleCloud** - Create an HPC cluster in [Azure CycleCloud](/azure/cyclecloud/) to run MPI jobs on Linux nodes.
6784

6885
* **Azure Batch** - Create an [Azure Batch](/azure/batch/) pool to run MPI workloads on Linux compute nodes. For more information, see [Use RDMA-capable or GPU-enabled instances in Batch pools](../../batch/batch-pool-compute-intensive-sizes.md). Also see the [Batch Shipyard](https://github.com/Azure/batch-shipyard) project, for running container-based workloads on Batch.
6986

7087
* **Microsoft HPC Pack** - [HPC Pack](https://docs.microsoft.com/powershell/high-performance-computing/overview) supports several Linux distributions to run on compute nodes deployed in RDMA-capable Azure VMs, managed by a Windows Server head node. For an example deployment, see [Create HPC Pack Linux RDMA Cluster in Azure](https://docs.microsoft.com/powershell/high-performance-computing/hpcpack-linux-openfoam).
7188

72-
Depending on your choice of cluster management tool, additional system configuration may be needed to run MPI jobs. For example, on a cluster of VMs, you may need to establish trust among the cluster nodes by generating SSH keys or by establishing passwordless SSH trust.
73-
74-
### Network topology considerations
75-
* On RDMA-enabled Linux VMs in Azure, Eth1 is reserved for RDMA network traffic. Do not change any Eth1 settings or any information in the configuration file referring to this network. Eth0 is reserved for regular Azure network traffic.
76-
77-
* The RDMA network in Azure reserves the address space 172.16.0.0/16.
78-
7989

90+
### Network considerations
91+
* On non-SR-IOV, RDMA-enabled Linux VMs in Azure, eth1 is reserved for RDMA network traffic. Do not change any eth1 settings or any information in the configuration file referring to this network.
92+
* On SR-IOV enabled VMs (HB and HC-series), ib0 is reserved for RDMA network traffic.
93+
* The RDMA network in Azure reserves the address space 172.16.0.0/16. To run MPI applications on instances deployed in an Azure virtual network, make sure that the virtual network address space does not overlap the RDMA network.
94+
* Depending on your choice of cluster management tool, additional system configuration may be needed to run MPI jobs. For example, on a cluster of VMs, you may need to establish trust among the cluster nodes by generating SSH keys or by establishing passwordless SSH logins.
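The address-space requirement above can be checked before deployment. The following is a minimal sketch using Python's standard `ipaddress` module; the function name `overlaps_rdma` and the sample prefixes are illustrative assumptions, not part of any Azure tooling.

```python
import ipaddress

# Address space reserved by the RDMA network, as stated in the text.
RDMA_NETWORK = ipaddress.ip_network("172.16.0.0/16")


def overlaps_rdma(address_space: str) -> bool:
    """Return True if the given CIDR prefix overlaps the RDMA network."""
    return ipaddress.ip_network(address_space).overlaps(RDMA_NETWORK)


# Hypothetical candidate virtual network address spaces.
for prefix in ("10.0.0.0/16", "172.16.4.0/24", "172.16.0.0/12"):
    status = "overlaps" if overlaps_rdma(prefix) else "is clear of"
    print(f"{prefix} {status} {RDMA_NETWORK}")
```

A prefix overlaps when it is contained in the reserved range or contains it, so a broad space such as `172.16.0.0/12` is flagged along with narrower subnets inside `172.16.0.0/16`.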
8095

8196

8297
## Other sizes
@@ -89,8 +104,5 @@ Depending on your choice of cluster management tool, additional system configura
89104

90105
## Next steps
91106

107+
- Learn more about how to set up, optimize, and scale [HPC workloads](https://docs.microsoft.com/azure/virtual-machines/workloads/hpc) on Azure.
92108
- Learn more about how [Azure compute units (ACU)](acu.md) can help you compare compute performance across Azure SKUs.
93-
94-
95-
96-
