
Commit 09f264f

Merge pull request #73876 from laurenhughes/lahugh-hpc
Update TOC structure
2 parents f4fb984 + 62d76c0 commit 09f264f

17 files changed, +915 −2 lines

articles/virtual-machines/TOC.yml

Lines changed: 2 additions & 0 deletions
@@ -12,6 +12,8 @@
       href: workloads/oracle/oracle-considerations.md
   - name: SAP
     href: workloads/sap/get-started.md
+  - name: High performance computing
+    href: workloads/hpc/configure.md
   - name: Mainframe rehosting
     href: workloads/mainframe-rehosting/overview.md
   - name: Classic deployments
Lines changed: 86 additions & 0 deletions
@@ -0,0 +1,86 @@
---
title: Scaling HPC applications - Azure Virtual Machines | Microsoft Docs
description: Learn how to scale HPC applications on Azure VMs.
services: virtual-machines
documentationcenter: ''
author: vermagit
manager: jeconnoc
editor: ''
tags: azure-resource-manager

ms.service: virtual-machines
ms.workload: infrastructure-services
ms.topic: article
ms.date: 05/15/2019
ms.author: amverma
---
# Scaling HPC applications

Optimal scale-up and scale-out performance of HPC applications on Azure requires performance tuning and optimization experiments for the specific workload. This section and the VM series-specific pages offer general guidance for scaling your applications.
## Compiling applications

Although not strictly necessary, compiling applications with the appropriate optimization flags provides the best scale-up performance on HB-series and HC-series VMs.
### AMD Optimizing C/C++ Compiler

The AMD Optimizing C/C++ Compiler (AOCC) offers a high level of advanced optimizations, multi-threading, and processor support, including global optimization, vectorization, inter-procedural analyses, loop transformations, and code generation. AOCC compiler binaries are suitable for Linux systems with GNU C Library (glibc) version 2.17 and above. The compiler suite consists of a C/C++ compiler (Clang), a Fortran compiler (FLANG), and a Fortran front end to Clang (DragonEgg).
### Clang

Clang is a C, C++, and Objective-C compiler that handles preprocessing, parsing, optimization, code generation, assembly, and linking. Clang supports the `-march=znver1` flag to enable the best code generation and tuning for AMD's Zen-based x86 architecture.
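
As a minimal sketch of what this looks like in practice (the file name and the `-O3`/`-fopenmp` flags are illustrative assumptions, not from this article):

```bash
# Compile with Zen-tuned code generation; mycode.c is a hypothetical source file
clang -O3 -march=znver1 -fopenmp mycode.c -o mycode
```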
### FLANG

The FLANG compiler is a recent addition to the AOCC suite (added April 2018) and is currently in pre-release for developers to download and test. Based on Fortran 2008, AMD extends the GitHub version of FLANG (https://github.com/flangcompiler/flang). The FLANG compiler supports all Clang compiler options and a number of additional FLANG-specific options.
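
As a hedged sketch only (it assumes the pre-release `flang` driver is on the PATH; the file name and flags are illustrative, not from this article):

```bash
# FLANG accepts Clang options, so the Zen tuning flag carries over
flang -O3 -march=znver1 mycode.f90 -o mycode
```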
### DragonEgg

DragonEgg is a GCC plugin that replaces GCC's optimizers and code generators with those from the LLVM project. The DragonEgg that comes with AOCC works with gcc-4.8.x, has been tested for x86-32/x86-64 targets, and has been successfully used on various Linux platforms.

GFortran is the actual front end for Fortran programs, responsible for preprocessing, parsing, and semantic analysis, and for generating the GCC GIMPLE intermediate representation (IR). DragonEgg is a GNU plugin that plugs into the GFortran compilation flow and implements the GNU plugin API. With the plugin architecture, DragonEgg becomes the compiler driver, driving the different phases of compilation. After following the download and installation instructions, DragonEgg can be invoked as follows:

```bash
gfortran [gFortran flags] \
  -fplugin=/path/AOCC-1.2-Compiler/AOCC-1.2-FortranPlugin/dragonegg.so \
  [plugin optimization flags] -c xyz.f90
clang -O3 -lgfortran -o xyz xyz.o
./xyz
```

### PGI Compiler

PGI Community Edition version 17 is confirmed to work with AMD EPYC. A PGI-compiled version of STREAM delivers the full memory bandwidth of the platform. The newer Community Edition 18.10 (November 2018) should likewise work well. Below is a sample command line to compile optimally with the PGI compiler:

```bash
pgcc $(OPTIMIZATIONS_PGI) $(STACK) -DSTREAM_ARRAY_SIZE=800000000 stream.c -o stream.pgi
```

### Intel Compiler

Intel Compiler version 18 is confirmed to work with AMD EPYC. Below is a sample command line to compile optimally with the Intel compiler:

```bash
icc -o stream.intel stream.c -DSTATIC -DSTREAM_ARRAY_SIZE=800000000 -mcmodel=large -shared-intel -Ofast -qopenmp
```

### GCC Compiler

For HPC, AMD recommends GCC compiler 7.3 or newer. Older versions, such as the 4.8.5 included with RHEL/CentOS 7.4, are not recommended. GCC 7.3 and newer deliver significantly higher performance on HPL, HPCG, and DGEMM tests.

```bash
gcc $(OPTIMIZATIONS) $(OMP) $(STACK) $(STREAM_PARAMETERS) stream.c -o stream.gcc
```

## Scaling applications

The following suggestions apply for optimal application scaling efficiency, performance, and consistency:

* Pin processes to cores 0-59 using a sequential pinning approach (as opposed to an auto-balance approach).
* Binding by NUMA/core/HwThread is better than default binding.
* For hybrid parallel applications (OpenMP+MPI), use 4 threads and 1 MPI rank per CCX (see the sketch after this list).
* For pure MPI applications, experiment with 1-4 MPI ranks per CCX for optimal performance.
* Some applications with extreme sensitivity to memory bandwidth may benefit from using a reduced number of cores per CCX. For these applications, using 3 or 2 cores per CCX may reduce memory bandwidth contention and yield higher real-world performance or more consistent scalability. In particular, MPI Allreduce may benefit from this.
* For significantly larger scale runs, it is recommended to use UD or hybrid RC+UD transports. Many MPI libraries/runtimes do this internally (such as UCX or MVAPICH2). Check your transport configurations for large-scale runs.
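
As a sketch of the hybrid layout above (assuming Open MPI 4.x option syntax, that each CCX corresponds to an L3 cache domain, and a 60-core HB-series VM; the binary name is illustrative):

```bash
# 15 CCXs x 1 rank each, 4 OpenMP threads per rank = 60 cores
export OMP_NUM_THREADS=4
mpirun -np 15 --map-by ppr:1:l3cache --bind-to l3cache \
  -x OMP_NUM_THREADS ./my_hpc_app
```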

## Next steps

Learn more about [HPC](https://docs.microsoft.com/azure/architecture/topics/high-performance-computing/) on Azure.
Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@
---
title: High Performance Computing - Azure Virtual Machines | Microsoft Docs
description: Learn about High Performance Computing on Azure.
services: virtual-machines
documentationcenter: ''
author: vermagit
manager: jeconnoc
editor: ''
tags: azure-resource-manager

ms.service: virtual-machines
ms.workload: infrastructure-services
ms.topic: article
ms.date: 05/07/2019
ms.author: amverma
---
# Optimization for Linux

This article shows a few key techniques for optimizing your OS image. Learn more about [enabling InfiniBand](enable-infiniband.md).

## Update LIS

If deploying using a custom image (for example, an older OS such as CentOS/RHEL 7.4 or 7.5), update the Linux Integration Services (LIS) on the VM.

```bash
wget https://aka.ms/lis
tar xzf lis
pushd LISISO
./upgrade.sh
```
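
To check which LIS version is active after the upgrade (a suggested verification, not part of the original steps), inspect one of the Hyper-V kernel modules:

```bash
# The version field of hv_vmbus reflects the installed LIS release
modinfo hv_vmbus | grep -i version
```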

## Reclaim memory

Improve efficiency by automatically reclaiming memory to avoid remote memory access.

```bash
echo 1 > /proc/sys/vm/zone_reclaim_mode
```

To make this setting persist across VM reboots:

```bash
echo "vm.zone_reclaim_mode = 1" >> /etc/sysctl.conf
sysctl -p
```
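
To confirm the setting is active:

```bash
sysctl vm.zone_reclaim_mode    # expected output: vm.zone_reclaim_mode = 1
```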

## Disable firewall and SELinux

Disable the firewall and SELinux.

```bash
systemctl stop iptables.service
systemctl disable iptables.service
systemctl mask firewalld
systemctl stop firewalld.service
systemctl disable firewalld.service
iptables -nL
sed -i -e 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
```
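
Note that the `sed` edit to `/etc/selinux/config` only takes effect after a reboot. To check the currently enforced mode:

```bash
getenforce    # prints Enforcing, Permissive, or Disabled
```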

## Disable cpupower

Disable cpupower.

```bash
service cpupower status
# If enabled, disable it:
service cpupower stop
sudo systemctl disable cpupower
```

## Next steps

* Learn more about [enabling InfiniBand](enable-infiniband.md) and optimizing OS images.
* Learn more about [HPC](https://docs.microsoft.com/azure/architecture/topics/high-performance-computing/) on Azure.
Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
---
title: Enable InfiniBand with SR-IOV - Azure Virtual Machines | Microsoft Docs
description: Learn how to enable InfiniBand with SR-IOV.
services: virtual-machines
documentationcenter: ''
author: vermagit
manager: jeconnoc
editor: ''
tags: azure-resource-manager

ms.service: virtual-machines
ms.workload: infrastructure-services
ms.topic: article
ms.date: 05/15/2019
ms.author: amverma
---
# Enable InfiniBand with SR-IOV

The simplest and recommended way to configure your custom VM image with InfiniBand (IB) is to add the InfiniBandDriverLinux or InfiniBandDriverWindows VM extension to your deployment. Learn how to use these VM extensions with [Linux](https://docs.microsoft.com/azure/virtual-machines/linux/sizes-hpc#rdma-capable-instances) and [Windows](https://docs.microsoft.com/azure/virtual-machines/windows/sizes-hpc#rdma-capable-instances).

To manually configure InfiniBand on SR-IOV enabled VMs (currently the HB and HC series), follow the steps below. These steps are for RHEL/CentOS only; for Ubuntu (16.04 and 18.04) and SLES (12 SP4 and 15), the inbox drivers work well.

## Manually install OFED

Install the latest MLNX_OFED drivers for ConnectX-5 from [Mellanox](http://www.mellanox.com/page/products_dyn?product_family=26).

For RHEL/CentOS (example below for 7.6):

```bash
sudo yum install -y kernel-devel python-devel
sudo yum install -y redhat-rpm-config rpm-build gcc-gfortran gcc-c++
sudo yum install -y gtk2 atk cairo tcl tk createrepo
wget --retry-connrefused --tries=3 --waitretry=5 http://content.mellanox.com/ofed/MLNX_OFED-4.5-1.0.1.0/MLNX_OFED_LINUX-4.5-1.0.1.0-rhel7.6-x86_64.tgz
tar zxvf MLNX_OFED_LINUX-4.5-1.0.1.0-rhel7.6-x86_64.tgz
sudo ./MLNX_OFED_LINUX-4.5-1.0.1.0-rhel7.6-x86_64/mlnxofedinstall --add-kernel-support
```
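
Once the installer completes, a quick sanity check (a suggested addition, not part of the original steps) is to confirm the OFED version and the IB port state:

```bash
ofed_info -s    # prints the installed MLNX_OFED version
ibstat          # port state should eventually report Active
```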

For Windows, download and install the WinOF-2 drivers for ConnectX-5 from [Mellanox](http://www.mellanox.com/page/products_dyn?product_family=32&menu_section=34).

## Assign an IP address

Assign an IP address to the ib0 interface, using either of the following options:

- Manually assign an IP address to the ib0 interface (as root):

```bash
ifconfig ib0 $(sed '/rdmaIPv4Address=/!d;s/.*rdmaIPv4Address="\([0-9.]*\)".*/\1/' /var/lib/waagent/SharedConfig.xml)/16
```
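
To confirm the address was applied (a suggested check, not in the original steps):

```bash
ip addr show ib0    # should list the RDMA IPv4 address with a /16 prefix
```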

OR

- Use WALinuxAgent to assign the IP address and make it persist:

```bash
yum install -y epel-release
yum install -y python-pip
python -m pip install --upgrade pip setuptools wheel
wget "https://github.com/Azure/WALinuxAgent/archive/release-2.2.36.zip"
unzip release-2.2.36.zip
cd WALinuxAgent*
python setup.py install --register-service --force
sed -i -e 's/# OS.EnableRDMA=y/OS.EnableRDMA=y/g' /etc/waagent.conf
sed -i -e 's/# AutoUpdate.Enabled=y/AutoUpdate.Enabled=y/g' /etc/waagent.conf
systemctl restart waagent
```

## Next steps

Learn more about [HPC](https://docs.microsoft.com/azure/architecture/topics/high-performance-computing/) on Azure.
Lines changed: 88 additions & 0 deletions
@@ -0,0 +1,88 @@
---
title: Known issues with HB-series and HC-series VMs - Azure Virtual Machines | Microsoft Docs
description: Learn about known issues with HB-series VM sizes in Azure.
services: virtual-machines
documentationcenter: ''
author: vermagit
manager: jeconnoc
editor: ''
tags: azure-resource-manager

ms.service: virtual-machines
ms.workload: infrastructure-services
ms.topic: article
ms.date: 05/07/2019
ms.author: amverma
---
# Known issues with HB-series and HC-series VMs

This article describes the most common issues, and their solutions, when using HB-series and HC-series VMs.

## DRAM on HB-series

HB-series VMs can only expose 228 GB of RAM to guest VMs at this time. This is due to a known limitation of the Azure hypervisor, which prevents pages from being assigned to the local DRAM of AMD CCXs (NUMA domains) reserved for the guest VM.

## Accelerated Networking

Azure Accelerated Networking is not enabled at this time, but will be as the Preview period progresses. We will notify customers when this feature is supported.

## UD Transport

At launch, HB-series VMs do not support Dynamically Connected Transport (DCT); support for DCT will be implemented over time. Reliable Connection (RC) and Unreliable Datagram (UD) transports are supported.

## Azure Batch

While HB-series VMs are in preview, use a Batch account in User Subscription mode, not in Service mode.

## GSS Proxy

GSS Proxy has a known bug in CentOS/RHEL 7.5 that can manifest as a significant performance and responsiveness penalty when used with NFS. This can be mitigated with:

```console
sed -i 's/GSS_USE_PROXY="yes"/GSS_USE_PROXY="no"/g' /etc/sysconfig/nfs
```
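
To confirm the change (a suggested check; the original does not specify whether a service restart or reboot is also required):

```console
grep GSS_USE_PROXY /etc/sysconfig/nfs    # should now show GSS_USE_PROXY="no"
```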

## Cache Cleaning

On HPC systems, it is often useful to clean up the memory after a job has finished, before the next user is assigned the same node. After running applications in Linux, you may find that your available memory reduces while your buffer memory increases, despite not running any applications.

![Screenshot of command prompt](./media/known-issues/cache-cleaning-1.png)

Using `numactl -H` shows which NUMA node(s) the memory is buffered with (possibly all). In Linux, users can clean the caches in three ways to return buffered or cached memory to 'free'. You need to be root or have sudo permissions.

```console
echo 1 > /proc/sys/vm/drop_caches    # frees page cache
echo 2 > /proc/sys/vm/drop_caches    # frees slab objects (e.g., dentries, inodes)
echo 3 > /proc/sys/vm/drop_caches    # frees page cache and slab objects
```

![Screenshot of command prompt](./media/known-issues/cache-cleaning-2.png)
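
A common pattern, offered here as an assumption rather than a step from this article, is to flush dirty pages to disk first so that dropping caches reclaims as much memory as possible:

```console
sync && echo 3 > /proc/sys/vm/drop_caches    # write dirty pages out, then drop caches
```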

## Kernel warnings

You may see the following kernel warning messages when booting an HB-series VM under Linux:

```console
[  0.004000] WARNING: CPU: 4 PID: 0 at arch/x86/kernel/smpboot.c:376 topology_sane.isra.3+0x80/0x90
[  0.004000] sched: CPU #4's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
[  0.004000] Modules linked in:
[  0.004000] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 3.10.0-957.el7.x86_64 #1
[  0.004000] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007 05/18/2018
[  0.004000] Call Trace:
[  0.004000] [<ffffffffb8361dc1>] dump_stack+0x19/0x1b
[  0.004000] [<ffffffffb7c97648>] __warn+0xd8/0x100
[  0.004000] [<ffffffffb7c976cf>] warn_slowpath_fmt+0x5f/0x80
[  0.004000] [<ffffffffb7c02b34>] ? calibrate_delay+0x3e4/0x8b0
[  0.004000] [<ffffffffb7c574c0>] topology_sane.isra.3+0x80/0x90
[  0.004000] [<ffffffffb7c57782>] set_cpu_sibling_map+0x172/0x5b0
[  0.004000] [<ffffffffb7c57ce1>] start_secondary+0x121/0x270
[  0.004000] [<ffffffffb7c000d5>] start_cpu+0x5/0x14
[  0.004000] ---[ end trace 73fc0e0825d4ca1f ]---
```

You can ignore this warning. It is due to a known limitation of the Azure hypervisor that will be addressed over time.

## Next steps

Learn more about [high-performance computing](https://docs.microsoft.com/azure/architecture/topics/high-performance-computing/) in Azure.
