MicrosoftDocs
diff --git a/articles/virtual-machines/TOC.yml b/articles/virtual-machines/TOC.yml
Lines changed: 2 additions & 0 deletions
diff --git a/articles/virtual-machines/workloads/hpc/compiling-scaling-applications.md b/articles/virtual-machines/workloads/hpc/compiling-scaling-applications.md
Lines changed: 86 additions & 0 deletions
diff --git a/articles/virtual-machines/workloads/hpc/configure.md b/articles/virtual-machines/workloads/hpc/configure.md
Lines changed: 76 additions & 0 deletions
diff --git a/articles/virtual-machines/workloads/hpc/enable-infiniband.md b/articles/virtual-machines/workloads/hpc/enable-infiniband.md
Lines changed: 73 additions & 0 deletions
diff --git a/articles/virtual-machines/workloads/hpc/hb-hc-known-issues.md b/articles/virtual-machines/workloads/hpc/hb-hc-known-issues.md
Lines changed: 88 additions & 0 deletions
@@ -12,6 +12,8 @@
     href: workloads/oracle/oracle-considerations.md
   - name: SAP
     href: workloads/sap/get-started.md
+  - name: High performance computing
+    href: workloads/hpc/configure.md
   - name: Mainframe rehosting
     href: workloads/mainframe-rehosting/overview.md
 - name: Classic deployments
 
@@ -0,0 +1,86 @@
+---
+title: Scaling HPC applications - Azure Virtual Machines | Microsoft Docs
+description: Learn how to scale HPC applications on Azure VMs. 
+services: virtual-machines
+documentationcenter: ''
+author: vermagit
+manager: jeconnoc
+editor: ''
+tags: azure-resource-manager
+
+ms.service: virtual-machines
+ms.workload: infrastructure-services
+ms.topic: article
+ms.date: 05/15/2019
+ms.author: amverma
+---
+
+# Scaling HPC applications
+
+Optimal scale-up and scale-out performance of HPC applications on Azure requires performance tuning and optimization experiments for the specific workload. This section and the VM series-specific pages offer general guidance for scaling your applications.
+
+## Compiling applications
+
+Though not necessary, compiling applications with appropriate optimization flags provides the best scale-up performance on HB and HC-series VMs.
+
+### AMD Optimizing C/C++ Compiler
+
+The AMD Optimizing C/C++ Compiler (AOCC) offers a high level of advanced optimizations, multi-threading, and processor support that includes global optimization, vectorization, inter-procedural analyses, loop transformations, and code generation. AOCC compiler binaries are suitable for Linux systems with GNU C Library (glibc) version 2.17 or later. The compiler suite consists of a C/C++ compiler (Clang), a Fortran compiler (FLANG), and a Fortran front end to Clang (DragonEgg).
+
+### Clang
+
+Clang is a C, C++, and Objective-C compiler handling preprocessing, parsing, optimization, code generation, assembly, and linking.
+Clang supports the `-march=znver1` flag to enable the best code generation and tuning for AMD’s Zen-based x86 architecture.
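One way to confirm that an installed compiler accepts the flag is to run only the preprocessor with it. The sketch below is illustrative: `supports_march` is a hypothetical helper, not part of AOCC, and it assumes the compiler exits with a non-zero status when it does not recognize the `-march` value.

```bash
# Hypothetical helper (not part of AOCC): succeed if compiler $1 accepts -march=$2.
# Assumption: the compiler exits non-zero for an unrecognized -march value.
supports_march() {
    # -E stops after preprocessing; /dev/null serves as an empty C source.
    "$1" -march="$2" -x c -E /dev/null >/dev/null 2>&1
}

cc="${CC:-clang}"
if command -v "$cc" >/dev/null 2>&1; then
    if supports_march "$cc" znver1; then
        echo "$cc accepts -march=znver1"
    else
        echo "$cc does not accept -march=znver1"
    fi
else
    echo "$cc not found on PATH"
fi
```

Set `CC` to check a different compiler binary, for example the Clang that ships with AOCC.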
+
+### FLANG
+
+The FLANG compiler is a recent addition to the AOCC suite (added April 2018) and is currently in pre-release for developers to download and test. Based on Fortran 2008, AMD extends the GitHub version of FLANG (https://github.com/flangcompiler/flang). The FLANG compiler supports all Clang compiler options and a number of additional FLANG-specific compiler options.
+
+### DragonEgg
+
+DragonEgg is a GCC plugin that replaces GCC’s optimizers and code generators with those from the LLVM project. The DragonEgg plugin that comes with AOCC works with gcc-4.8.x, has been tested on x86-32/x86-64 targets, and has been used successfully on various Linux platforms.
+
+GFortran is the actual front end for Fortran programs. It is responsible for preprocessing, parsing, and semantic analysis, and it generates the GCC GIMPLE intermediate representation (IR). DragonEgg is a GNU plugin that plugs into the GFortran compilation flow and implements the GNU plugin API. With the plugin architecture, DragonEgg becomes the compiler driver, driving the different phases of compilation. After following the download and installation instructions, DragonEgg can be invoked using:
+
+```bash
+gfortran [gFortran flags] \
+   -fplugin=/path/AOCC-1.2-Compiler/AOCC-1.2-FortranPlugin/dragonegg.so [plugin optimization flags] \
+   -c xyz.f90
+clang -O3 -lgfortran -o xyz xyz.o
+./xyz
+```
+   
+### PGI Compiler
+PGI Community Edition ver. 17 is confirmed to work with AMD EPYC. A PGI-compiled version of STREAM delivers the full memory bandwidth of the platform. The newer Community Edition 18.10 (Nov 2018) should likewise work well. Below is a sample command line for compiling optimally with the PGI Compiler:
+
+```bash
+pgcc $(OPTIMIZATIONS_PGI) $(STACK) -DSTREAM_ARRAY_SIZE=800000000 stream.c -o stream.pgi
+```
+
+### Intel Compiler
+Intel Compiler ver. 18 is confirmed to work with AMD EPYC. Below is a sample command line for compiling optimally with the Intel Compiler:
+
+```bash
+icc -o stream.intel stream.c -DSTATIC -DSTREAM_ARRAY_SIZE=800000000 -mcmodel=large -shared-intel -Ofast -qopenmp
+```
+
+### GCC Compiler 
+For HPC, AMD recommends GCC 7.3 or newer. Older versions, such as 4.8.5 included with RHEL/CentOS 7.4, are not recommended. GCC 7.3 and newer deliver significantly higher performance on HPL, HPCG, and DGEMM tests.
+
+```bash
+gcc $(OPTIMIZATIONS) $(OMP) $(STACK) $(STREAM_PARAMETERS) stream.c -o stream.gcc
+```
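Because the GCC bundled with the distribution may predate the recommended release, it can help to check the installed version before building. The following is a minimal sketch: `version_ge` is a hypothetical helper, and it assumes a `sort` that supports `-V` (version sort), as GNU coreutils does.

```bash
# Hypothetical helper: succeed if dotted version $1 is greater than or equal to $2.
# Assumption: sort supports -V (version sort), as in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v gcc >/dev/null 2>&1; then
    # Common idiom for the full version string; on GCC 7 and later,
    # -dumpversion alone may print only the major number.
    gcc_version="$(gcc -dumpfullversion -dumpversion)"
    if version_ge "$gcc_version" 7.3; then
        echo "GCC $gcc_version meets the 7.3 recommendation"
    else
        echo "GCC $gcc_version is older than the recommended 7.3"
    fi
fi
```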
+
+## Scaling applications 
+
+The following suggestions apply for optimal application scaling efficiency, performance, and consistency:
+
+* Pin processes to cores 0-59 using a sequential pinning approach (as opposed to an auto-balance approach). 
+* Binding by NUMA/Core/HwThread is better than default binding.
+* For hybrid parallel applications (OpenMP+MPI), use 4 threads and 1 MPI rank per CCX.
+* For pure MPI applications, experiment with 1-4 MPI ranks per CCX for optimal performance.
+* Some applications with extreme sensitivity to memory bandwidth may benefit from using a reduced number of cores per CCX. For these applications, using 3 or 2 cores per CCX may reduce memory bandwidth contention and yield higher real-world performance or more consistent scalability. In particular, MPI Allreduce may benefit from this.
+* For significantly larger scale runs, use UD or hybrid RC+UD transports. Many MPI libraries and runtime libraries (such as UCX or MVAPICH2) do this internally. Check your transport configurations for large-scale runs.
+
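The reduced cores-per-CCX suggestion above can be sketched as a core list generator. This is an illustration only: `ccx_core_list` is a hypothetical helper, and it assumes that the 60 cores are numbered 0-59 with four consecutive core IDs per CCX, which the guidance implies but does not state. Verify the actual layout on your VM (for example with `numactl -H`) before relying on it.

```bash
# Hypothetical helper: print a comma-separated list of core IDs that uses
# only the first $1 cores of every CCX.
# Assumptions (not stated in the article): cores are numbered 0..total-1 and
# each CCX owns ccx_size consecutive IDs. Defaults: 60 cores, 4 per CCX.
ccx_core_list() {
    per_ccx="$1"; total="${2:-60}"; ccx_size="${3:-4}"
    core=0; list=""
    while [ "$core" -lt "$total" ]; do
        # Position of this core inside its CCX: 0 .. ccx_size-1.
        if [ $((core % ccx_size)) -lt "$per_ccx" ]; then
            list="${list:+$list,}$core"
        fi
        core=$((core + 1))
    done
    printf '%s\n' "$list"
}

# Three cores per CCX across the default 60-core layout.
ccx_core_list 3
```

The resulting list can be handed to whatever pinning mechanism you use. For a single process, `taskset -c "$(ccx_core_list 2)" ./app` restricts it to two cores per CCX; MPI launchers have their own binding options, which are not covered here.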
+## Next steps
+
+Learn more about [HPC](https://docs.microsoft.com/azure/architecture/topics/high-performance-computing/) on Azure.
@@ -0,0 +1,76 @@
+---
+title: High Performance Computing - Azure Virtual Machines | Microsoft Docs
+description: Learn about High Performance Computing on Azure.
+services: virtual-machines
+documentationcenter: ''
+author: vermagit
+manager: jeconnoc
+editor: ''
+tags: azure-resource-manager
+
+ms.service: virtual-machines
+ms.workload: infrastructure-services
+ms.topic: article
+ms.date: 05/07/2019
+ms.author: amverma
+---
+
+# Optimization for Linux
+
+This article shows a few key techniques to optimize your OS image. Learn more about [enabling InfiniBand](enable-infiniband.md) and optimizing the OS images.
+
+## Update LIS
+
+If deploying using a custom image (for example, an older OS such as CentOS/RHEL 7.4 or 7.5), update Linux Integration Services (LIS) on the VM.
+
+```bash
+wget https://aka.ms/lis
+tar xzf lis
+pushd LISISO
+./upgrade.sh
+```
+
+## Reclaim memory
+
+Improve efficiency by automatically reclaiming memory to avoid remote memory access.
+
+```bash
+echo 1 >/proc/sys/vm/zone_reclaim_mode
+```
+
+To make this persist after VM reboots:
+
+```bash
+echo "vm.zone_reclaim_mode = 1" >> /etc/sysctl.conf
+sysctl -p
+```
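Appending with `>>` adds a duplicate line each time the command is repeated, for example when an image build script runs more than once. The following is a minimal sketch of an idempotent alternative; `persist_sysctl` is a hypothetical helper, not an existing tool.

```bash
# Hypothetical helper: make sure "key = value" appears exactly once in a
# sysctl-style file. $3 defaults to /etc/sysctl.conf; pass another path to
# try it on a copy first.
# Note: dots in the key are treated as regex wildcards, which is harmless here.
persist_sysctl() {
    key="$1"; value="$2"; conf="${3:-/etc/sysctl.conf}"
    if grep -q "^[[:space:]]*${key}[[:space:]]*=" "$conf" 2>/dev/null; then
        # Key already present: rewrite that line instead of appending again.
        tmp="$(mktemp)" || return 1
        sed "s|^[[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "$conf" > "$tmp" &&
            cat "$tmp" > "$conf"
        status=$?
        rm -f "$tmp"
        return "$status"
    else
        printf '%s = %s\n' "$key" "$value" >> "$conf"
    fi
}

# Example (as root): persist_sysctl vm.zone_reclaim_mode 1 && sysctl -p
```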
+
+## Disable firewall and SELinux
+
+Stop and disable the firewall services, then set SELinux to disabled in its configuration file.
+
+```bash
+systemctl stop iptables.service
+systemctl disable iptables.service
+systemctl mask firewalld
+systemctl stop firewalld.service
+systemctl disable firewalld.service
+iptables -nL
+sed -i -e 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
+```
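The `sed` command above rewrites only `SELINUX=enforcing`, so an image configured as `permissive` is left unchanged. The configuration file is read at boot, so the new mode applies after the next restart. The sketch below sets the mode whatever the current value is and reads it back; both helper names are hypothetical.

```bash
# Hypothetical helpers; the config path defaults to the one used above.

# Print the mode configured in an SELinux config file.
selinux_config_mode() {
    sed -n 's/^SELINUX=//p' "${1:-/etc/selinux/config}"
}

# Set SELINUX=<mode> regardless of the current value. SELINUXTYPE= is left
# alone because the pattern requires "=" immediately after "SELINUX".
set_selinux_config_mode() {
    mode="$1"; conf="${2:-/etc/selinux/config}"
    tmp="$(mktemp)" || return 1
    sed "s/^SELINUX=.*/SELINUX=${mode}/" "$conf" > "$tmp" && cat "$tmp" > "$conf"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Example (as root): set_selinux_config_mode disabled && selinux_config_mode
```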
+
+## Disable cpupower
+
+Check whether the cpupower service is running and, if it is, stop and disable it.
+
+```bash
+service cpupower status
+# If enabled, disable it:
+service cpupower stop
+sudo systemctl disable cpupower
+```
+
+## Next steps
+
+* Learn more about [enabling InfiniBand](enable-infiniband.md) and optimizing OS images.
+
+* Learn more about [HPC](https://docs.microsoft.com/azure/architecture/topics/high-performance-computing/) on Azure.
@@ -0,0 +1,73 @@
+---
+title: Enable InfiniBand with SR-IOV - Azure Virtual Machines | Microsoft Docs
+description: Learn how to enable InfiniBand with SR-IOV. 
+services: virtual-machines
+documentationcenter: ''
+author: vermagit
+manager: jeconnoc
+editor: ''
+tags: azure-resource-manager
+
+ms.service: virtual-machines
+ms.workload: infrastructure-services
+ms.topic: article
+ms.date: 05/15/2019
+ms.author: amverma
+---
+
+
+# Enable InfiniBand with SR-IOV
+
+
+The simplest and recommended way to configure your custom VM image with InfiniBand (IB) is to add the InfiniBandDriverLinux or InfiniBandDriverWindows VM extension to your deployment.
+Learn how to use these VM extensions with [Linux](https://docs.microsoft.com/azure/virtual-machines/linux/sizes-hpc#rdma-capable-instances) and [Windows](https://docs.microsoft.com/azure/virtual-machines/windows/sizes-hpc#rdma-capable-instances).
+
+To manually configure InfiniBand on SR-IOV enabled VMs (currently HB and HC series), follow the steps below. These steps are for RHEL/CentOS only. For Ubuntu (16.04 and 18.04) and SLES (12 SP4 and 15), the inbox drivers work well.
+
+
+## Manually install OFED
+
+Install the latest MLNX_OFED drivers for ConnectX-5 from [Mellanox](http://www.mellanox.com/page/products_dyn?product_family=26).
+
+For RHEL/CentOS (example below for 7.6):
+```bash
+sudo yum install -y kernel-devel python-devel
+sudo yum install -y redhat-rpm-config rpm-build gcc-gfortran gcc-c++
+sudo yum install -y gtk2 atk cairo tcl tk createrepo
+wget --retry-connrefused --tries=3 --waitretry=5 http://content.mellanox.com/ofed/MLNX_OFED-4.5-1.0.1.0/MLNX_OFED_LINUX-4.5-1.0.1.0-rhel7.6-x86_64.tgz
+tar zxvf MLNX_OFED_LINUX-4.5-1.0.1.0-rhel7.6-x86_64.tgz
+sudo ./MLNX_OFED_LINUX-4.5-1.0.1.0-rhel7.6-x86_64/mlnxofedinstall --add-kernel-support
+```
+
+For Windows, download and install the WinOF-2 drivers for ConnectX-5 from [Mellanox](http://www.mellanox.com/page/products_dyn?product_family=32&menu_section=34).
+
+## Assign an IP address
+
+Assign an IP address to the ib0 interface, using either of the following methods:
+
+- Manually assign an IP address to the ib0 interface (as root).
+
+	```bash
+	ifconfig ib0 $(sed '/rdmaIPv4Address=/!d;s/.*rdmaIPv4Address="\([0-9.]*\)".*/\1/' /var/lib/waagent/SharedConfig.xml)/16
+	```
+
+OR
+
+- Use WALinuxAgent to assign the IP address and make it persist.
+
+	```bash
+	yum install -y epel-release
+	yum install -y python-pip
+	python -m pip install --upgrade pip setuptools wheel
+	wget "https://github.com/Azure/WALinuxAgent/archive/release-2.2.36.zip"
+	unzip release-2.2.36.zip
+	cd WALinuxAgent*
+	python setup.py install --register-service --force
+	sed -i -e 's/# OS.EnableRDMA=y/OS.EnableRDMA=y/g' /etc/waagent.conf
+	sed -i -e 's/# AutoUpdate.Enabled=y/AutoUpdate.Enabled=y/g' /etc/waagent.conf
+	systemctl restart waagent
+	```
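The manual option above relies on a `sed` one-liner to pull the address out of `SharedConfig.xml`. The sketch below isolates that extraction so it can be tried on a sample file first. `rdma_ipv4` is a hypothetical helper, and the XML fragment in the demo is made up for illustration; the real file layout may differ.

```bash
# Hypothetical helper: print the rdmaIPv4Address value found in a
# SharedConfig.xml-style file. With -n and the p flag, sed prints only lines
# where the substitution succeeded, which has the same effect as the
# '/pattern/!d;s/.../' form used in the manual step.
rdma_ipv4() {
    sed -n 's/.*rdmaIPv4Address="\([0-9.]*\)".*/\1/p' \
        "${1:-/var/lib/waagent/SharedConfig.xml}"
}

# Self-contained demo on a made-up fragment (element name and address are
# placeholders, not taken from a real VM).
sample="$(mktemp)"
printf '<Instance id="vm_0" rdmaIPv4Address="172.16.1.5" />\n' > "$sample"
echo "ib0 address would be: $(rdma_ipv4 "$sample")/16"
rm -f "$sample"
```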
+
+## Next steps
+
+Learn more about [HPC](https://docs.microsoft.com/azure/architecture/topics/high-performance-computing/) on Azure.
@@ -0,0 +1,88 @@
+---
+title: Known issues with HB-series and HC-series VMs - Azure Virtual Machines | Microsoft Docs
+description: Learn about known issues with HB-series and HC-series VM sizes in Azure.
+services: virtual-machines
+documentationcenter: ''
+author: vermagit
+manager: jeconnoc
+editor: ''
+tags: azure-resource-manager
+
+ms.service: virtual-machines
+ms.workload: infrastructure-services
+ms.topic: article
+ms.date: 05/07/2019
+ms.author: amverma
+---
+
+# Known issues with HB-series and HC-series VMs
+
+This article provides the most common issues and solutions when using HB-series and HC-series VMs.
+
+## DRAM on HB-series
+
+HB-series VMs can only expose 228 GB of RAM to guest VMs at this time. This is due to a known limitation of the Azure hypervisor to prevent pages from being assigned to the local DRAM of AMD CCXs (NUMA domains) reserved for the guest VM.
+
+## Accelerated Networking
+
+Azure Accelerated Networking is not enabled at this time, but will be as we progress through the Preview period. We will notify customers when this feature is supported.
+
+## UD Transport
+
+At launch, HB-series does not support Dynamically Connected Transport (DCT). Support for DCT will be implemented over time. Reliable Connection (RC) and Unreliable Datagram (UD) transports are supported.
+
+## Azure Batch
+
+While HB-series VMs are in preview, use a Batch account in User Subscription mode, not in Service mode.
+
+## GSS Proxy
+
+GSS Proxy has a known bug in CentOS/RHEL 7.5 that can manifest as a significant performance and responsiveness penalty when used with NFS. This can be mitigated with:
+
+```console
+sed -i 's/GSS_USE_PROXY="yes"/GSS_USE_PROXY="no"/g' /etc/sysconfig/nfs
+```
+
+## Cache Cleaning
+
+On HPC systems, it is often useful to clean up the memory after a job has finished before the next user is assigned the same node. After running applications in Linux you may find that your available memory reduces while your buffer memory increases, despite not running any applications.
+
+![Screenshot of command prompt](./media/known-issues/cache-cleaning-1.png)
+
+Using `numactl -H` will show which NUMA node(s) the memory is buffered with (possibly all). In Linux, users can clean the caches in three ways to return buffered or cached memory to ‘free’. You need to be root or have sudo permissions.
+
+```console
+echo 1 > /proc/sys/vm/drop_caches   # frees page cache
+echo 2 > /proc/sys/vm/drop_caches   # frees slab objects, e.g. dentries, inodes
+echo 3 > /proc/sys/vm/drop_caches   # frees page cache and slab objects
+```
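Writing to `drop_caches` frees only clean cache entries, so running `sync` first lets more memory be reclaimed. The following is a minimal sketch for a post-job cleanup step; `drop_caches` here is a hypothetical shell function, and the optional second argument exists only so the function can be tried against an ordinary file.

```bash
# Hypothetical helper: flush dirty pages, then write a drop_caches mode.
# $2 defaults to the real kernel interface; pass another path for a dry run.
drop_caches() {
    mode="$1"; target="${2:-/proc/sys/vm/drop_caches}"
    case "$mode" in
        1|2|3) ;;
        *) echo "usage: drop_caches 1|2|3 [target]" >&2; return 1 ;;
    esac
    sync                      # write dirty pages back so more cache can be dropped
    echo "$mode" > "$target"
}

# Example (as root): free -m; drop_caches 3; free -m
```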
+
+![Screenshot of command prompt](./media/known-issues/cache-cleaning-2.png)
+
+## Kernel warnings
+
+You may see the following kernel warning messages when booting an HB-series VM under Linux.
+
+```console
+[  0.004000] WARNING: CPU: 4 PID: 0 at arch/x86/kernel/smpboot.c:376 topology_sane.isra.3+0x80/0x90
+[  0.004000] sched: CPU #4's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
+[  0.004000] Modules linked in:
+[  0.004000] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 3.10.0-957.el7.x86_64 #1
+[  0.004000] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007 05/18/2018
+[  0.004000] Call Trace:
+[  0.004000] [<ffffffffb8361dc1>] dump_stack+0x19/0x1b
+[  0.004000] [<ffffffffb7c97648>] __warn+0xd8/0x100
+[  0.004000] [<ffffffffb7c976cf>] warn_slowpath_fmt+0x5f/0x80
+[  0.004000] [<ffffffffb7c02b34>] ? calibrate_delay+0x3e4/0x8b0
+[  0.004000] [<ffffffffb7c574c0>] topology_sane.isra.3+0x80/0x90
+[  0.004000] [<ffffffffb7c57782>] set_cpu_sibling_map+0x172/0x5b0
+[  0.004000] [<ffffffffb7c57ce1>] start_secondary+0x121/0x270
+[  0.004000] [<ffffffffb7c000d5>] start_cpu+0x5/0x14
+[  0.004000] ---[ end trace 73fc0e0825d4ca1f ]---
+```
+
+You can ignore this warning. This is due to a known limitation of the Azure hypervisor that will be addressed over time.
+
+## Next steps
+
+Learn more about [high-performance computing](https://docs.microsoft.com/azure/architecture/topics/high-performance-computing/) in Azure.