
Commit 5e818f6

[Doc-a-thon] Updating compiling-scaling-applications.md
Modifying the bash command to remove the $ sign from the command, fixing bash code blocks with wrong format (printing bash as part of the command)
1 parent 0c8f3bf commit 5e818f6

1 file changed

+20
-15
lines changed

articles/virtual-machines/compiling-scaling-applications.md

Lines changed: 20 additions & 15 deletions
@@ -4,7 +4,7 @@ description: Learn how to scale HPC applications on Azure VMs.
44
ms.service: virtual-machines
55
ms.subservice: hpc
66
ms.topic: article
7-
ms.date: 03/28/2023
7+
ms.date: 04/11/2023
88
ms.reviewer: cynthn, mattmcinnes
99
ms.author: mamccrea
1010
author: mamccrea
@@ -17,23 +17,25 @@ author: mamccrea
1717
Optimal scale-up and scale-out performance of HPC applications on Azure requires performance tuning and optimization experiments for the specific workload. This section and the VM series-specific pages offer general guidance for scaling your applications.
1818

1919
## Application setup
20+
2021
The [azurehpc repo](https://github.com/Azure/azurehpc) contains many examples of:
22+
2123
- Setting up and running [applications](https://github.com/Azure/azurehpc/tree/master/apps) optimally.
2224
- Configuration of [file systems, and clusters](https://github.com/Azure/azurehpc/tree/master/examples).
2325
- [Tutorials](https://github.com/Azure/azurehpc/tree/master/tutorials) on how to get started easily with some common application workflows.
2426

25-
## Optimally scaling MPI
27+
## Optimally scaling MPI
2628

2729
The following suggestions apply for optimal application scaling efficiency, performance, and consistency:
2830

29-
- For smaller scale jobs (< 256K connections) use:
30-
```bash UCX_TLS=rc,sm ```
31-
- For larger scale jobs (> 256K connections) use:
32-
```bash UCX_TLS=dc,sm ```
33-
- To calculate the number of connections for your MPI job, use:
34-
```bash Max Connections = (processes per node) x (number of nodes per job) x (number of nodes per job) ```
35-
31+
- For smaller scale jobs (< 256K connections) use: `UCX_TLS=rc,sm`
32+
33+
- For larger scale jobs (> 256K connections) use: `UCX_TLS=dc,sm`
34+
35+
- To calculate the number of connections for your MPI job, use: `Max Connections = (processes per node) x (number of nodes per job) x (number of nodes per job)`
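The connection formula and the two `UCX_TLS` settings above can be combined in a small helper. This is a minimal sketch, not part of the original article: the function names `max_connections` and `select_ucx_tls`, the example job geometry, and the reading of "256K" as 262,144 are all assumptions made for illustration. The article doesn't say which transport to use at exactly 256K connections, so the sketch arbitrarily keeps `rc,sm` in that case.

```bash
#!/usr/bin/env bash

# Hypothetical job geometry; replace with the values for your own job.
PROCESSES_PER_NODE=120
NODES=16

# Max Connections = (processes per node) x (number of nodes) x (number of nodes),
# following the formula given in the list above.
max_connections() {
    local ppn="$1" nodes="$2"
    echo $(( ppn * nodes * nodes ))
}

# Choose the UCX transport list: rc,sm for smaller jobs, dc,sm for larger ones.
# Assumption: "256K" is interpreted as 256 * 1024 = 262144 connections.
select_ucx_tls() {
    local connections="$1"
    local threshold=$(( 256 * 1024 ))
    if (( connections > threshold )); then
        echo "dc,sm"
    else
        echo "rc,sm"
    fi
}

connections="$(max_connections "$PROCESSES_PER_NODE" "$NODES")"
UCX_TLS="$(select_ucx_tls "$connections")"
export UCX_TLS
echo "connections=${connections} UCX_TLS=${UCX_TLS}"
```

Exporting `UCX_TLS` before launching the MPI job lets UCX-based MPI libraries pick it up from the environment.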
36+
3637
## Adaptive Routing
38+
3739
Adaptive Routing (AR) allows Azure Virtual Machines (VMs) running EDR and HDR InfiniBand to automatically detect and avoid network congestion by dynamically selecting optimal network paths. As a result, AR offers improved latency and bandwidth on the InfiniBand network, which in turn drives higher performance and scaling efficiency. For more information, see the [TechCommunity article](https://techcommunity.microsoft.com/t5/azure-compute/adaptive-routing-on-azure-hpc/ba-p/1205217).
3840

3941
## Process pinning
@@ -44,7 +46,7 @@ Adaptive Routing (AR) allows Azure Virtual Machines (VMs) running EDR and HDR In
4446
- For pure MPI applications, experiment with one to four MPI ranks per CCX for optimal performance on HB and HBv2 VM sizes.
4547
- Some applications with extreme sensitivity to memory bandwidth may benefit from using a reduced number of cores per CCX. For these applications, using three or two cores per CCX may reduce memory bandwidth contention and yield higher real-world performance or more consistent scalability. In particular, MPI 'Allreduce' may benefit from this approach.
4648
- For larger scale runs, it's recommended to use UD or hybrid RC+UD transports. Many MPI libraries/runtime libraries use these transports internally (such as UCX or MVAPICH2). Check your transport configurations for large-scale runs.
47-
49+
4850
## Compiling applications
4951
<br>
5052
<details>
@@ -63,7 +65,7 @@ Clang supports the `-march=znver1` flag to enable best code generation and tuni
6365

6466
### FLANG
6567

66-
The FLANG compiler is a recent addition to the AOCC suite (added April 2018) and is currently in prerelease for developers to download and test. Based on Fortran 2008, AMD extends the GitHub version of FLANG (https://github.com/flang-compiler/flang). The FLANG compiler supports all Clang compiler options and other number of FLANG-specific compiler options.
68+
The FLANG compiler is a recent addition to the AOCC suite (added April 2018) and is currently in prerelease for developers to download and test. Based on Fortran 2008, AMD extends the GitHub version of [FLANG](https://github.com/flang-compiler/flang). The FLANG compiler supports all Clang compiler options and a number of FLANG-specific compiler options.
6769

6870
### DragonEgg
6971

@@ -72,32 +74,36 @@ DragonEgg is a gcc plugin that replaces GCC’s optimizers and code generators f
7274
GFortran is the actual frontend for Fortran programs, responsible for preprocessing, parsing, and semantic analysis, and it generates the GCC GIMPLE intermediate representation (IR). DragonEgg is a GNU plugin that plugs into the GFortran compilation flow and implements the GNU plugin API. With the plugin architecture, DragonEgg becomes the compiler driver, driving the different phases of compilation. After following the download and installation instructions, DragonEgg can be invoked using:
7375

7476
```bash
75-
$ gfortran [gFortran flags]
77+
gfortran [gFortran flags]
7678
-fplugin=/path/AOCC-1.2-Compiler/AOCC-1.2-
7779
FortranPlugin/dragonegg.so [plugin optimization flags]
7880
-c xyz.f90
clang -O3 -lgfortran -o xyz xyz.o
./xyz
7981
```
82+
8083
### PGI Compiler
81-
PGI Community Edition 17 is confirmed to work with AMD EPYC. A PGI-compiled version of STREAM does deliver full memory bandwidth of the platform. The newer Community Edition 18.10 (Nov 2018) should likewise work well. Use this CLI command to compile with the Intel Compiler:
8284

85+
PGI Community Edition 17 is confirmed to work with AMD EPYC. A PGI-compiled version of STREAM delivers the full memory bandwidth of the platform. The newer Community Edition 18.10 (Nov 2018) should likewise work well. Use this CLI command to compile with the PGI Compiler:
8386

8487
```bash
8588
pgcc $(OPTIMIZATIONS_PGI) $(STACK) -DSTREAM_ARRAY_SIZE=800000000 stream.c -o stream.pgi
8689
```
8790

8891
### Intel Compiler
92+
8993
Intel Compiler 18 is confirmed to work with AMD EPYC. Use this CLI command to compile with the Intel Compiler.
9094

9195
```bash
9296
icc -o stream.intel stream.c -DSTATIC -DSTREAM_ARRAY_SIZE=800000000 -mcmodel=large -shared-intel -Ofast -qopenmp
9397
```
9498

95-
### GCC Compiler
99+
### GCC Compiler
100+
96101
For HPC workloads, AMD recommends GCC compiler 7.3 or newer. Older versions, such as 4.8.5 included with RHEL/CentOS 7.4, aren't recommended. GCC 7.3 and newer deliver higher performance on HPL, HPCG, and DGEMM tests.
97102

98103
```bash
99104
gcc $(OPTIMIZATIONS) $(OMP) $(STACK) $(STREAM_PARAMETERS) stream.c -o stream.gcc
100105
```
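Before building, it can be useful to confirm that the installed GCC meets the 7.3 minimum mentioned above. The following is a minimal sketch, not part of the original article: the helper name `version_ge` is hypothetical, it relies on `sort -V` from GNU coreutils, and it assumes that `gcc -dumpfullversion -dumpversion` prints the full version string on the installed compiler.

```bash
#!/usr/bin/env bash

# Minimum GCC version recommended in the text above.
MIN_GCC="7.3"

# Hypothetical helper: succeeds when version $1 is greater than or equal to $2.
# Uses GNU sort's version ordering (-V) to find the smaller of the two.
version_ge() {
    local lowest
    lowest="$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)"
    [ "$lowest" = "$2" ]
}

if command -v gcc >/dev/null 2>&1; then
    # Assumption: this flag combination yields the full version (for example, 9.4.0).
    found="$(gcc -dumpfullversion -dumpversion)"
    if version_ge "$found" "$MIN_GCC"; then
        echo "GCC ${found} meets the recommended minimum of ${MIN_GCC}"
    else
        echo "GCC ${found} is older than the recommended minimum of ${MIN_GCC}"
    fi
else
    echo "gcc not found on PATH"
fi
```

The script only reports the result; it doesn't stop the build, so you can decide how strictly to enforce the recommendation.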
106+
101107
</details>
102108

103109
## Next steps
@@ -106,4 +112,3 @@ gcc $(OPTIMIZATIONS) $(OMP) $(STACK) $(STREAM_PARAMETERS) stream.c -o stream.gcc
106112
- Review the [HBv3-series overview](hbv3-series-overview.md) and [HC-series overview](hc-series-overview.md).
107113
- Read about the latest announcements, HPC workload examples, and performance results at the [Azure Compute Tech Community Blogs](https://techcommunity.microsoft.com/t5/azure-compute/bg-p/AzureCompute).
108114
- Learn more about [HPC](/azure/architecture/topics/high-performance-computing/) on Azure.
109-
