
Commit f14903e

Learn Editor: Update compiling-scaling-applications.md
1 parent: 8f18129

1 file changed (+5, -5 lines)

articles/virtual-machines/compiling-scaling-applications.md

Lines changed: 5 additions & 5 deletions
@@ -18,17 +18,17 @@ Optimal scale-up and scale-out performance of HPC applications on Azure requires

## Application setup
The [azurehpc repo](https://github.com/Azure/azurehpc) contains many examples of:
- Setting up and running [applications](https://github.com/Azure/azurehpc/tree/master/apps) optimally.
- Configuring [file systems and clusters](https://github.com/Azure/azurehpc/tree/master/examples).
- [Tutorials](https://github.com/Azure/azurehpc/tree/master/tutorials) on how to get started with some common application workflows.

## Optimally scaling MPI

The following suggestions apply for optimal application scaling efficiency, performance, and consistency:

- For smaller scale jobs (< 256K connections), use:

  ```bash
  UCX_TLS=rc,sm
  ```
- For larger scale jobs (> 256K connections), use:

  ```bash
  UCX_TLS=dc,sm
  ```
- To calculate the maximum number of connections for your MPI job, use the following formula (a worked example follows this list):

  ```bash
  Max Connections = (processes per node) x (number of nodes per job) x (number of nodes per job)
  ```
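To make the arithmetic concrete: a hypothetical job with 120 processes per node on 64 nodes gives 120 x 64 x 64 = 491,520 connections, which is above 256K, so `dc,sm` is the better choice. A minimal launch sketch, assuming Open MPI (whose `-x` flag exports environment variables to all ranks); the hostfile and application name are placeholders:

```bash
# Hypothetical job size: 120 processes per node x 64 nodes = 7,680 ranks.
# Max Connections = 120 x 64 x 64 = 491,520 > 256K, so use the dc,sm transports.
export UCX_TLS=dc,sm

# -x forwards UCX_TLS to every rank (Open MPI syntax); ./my_hpc_app is a placeholder.
mpirun -np 7680 --hostfile hosts -x UCX_TLS ./my_hpc_app
```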
@@ -40,8 +40,8 @@ Adaptive Routing (AR) allows Azure Virtual Machines (VMs) running EDR and HDR In

- Pin processes to cores using a sequential pinning approach (as opposed to an autobalance approach).
- Binding by NUMA, core, or hardware thread gives better results than the default binding.
- For hybrid parallel applications (OpenMP+MPI), use four threads and one MPI rank per [CCX](/azure/virtual-machines/hb-series-overview) on HB and HBv2 VM sizes.
- For pure MPI applications, experiment with between one and four MPI ranks per CCX for optimal performance on HB and HBv2 VM sizes (see the mapping sketch after this list).
- Some applications with extreme sensitivity to memory bandwidth may benefit from using a reduced number of cores per CCX. For these applications, using three or two cores per CCX may reduce memory bandwidth contention and yield higher real-world performance or more consistent scalability. In particular, MPI 'Allreduce' may benefit from this approach.
- For larger scale runs, use UD or hybrid RC+UD transports. Many MPI libraries and runtimes (such as UCX or MVAPICH2) use these transports internally; check your transport configuration before a large-scale run (a minimal sketch of these settings follows).
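As one way to apply the pinning and hybrid-mapping suggestions above, here's a sketch assuming Open MPI 4.x on HBv2, where each CCX is a four-core L3 cache domain and there are 30 CCXs per node; the two-node job size and application name are assumptions:

```bash
# Hybrid OpenMP+MPI on HBv2: one MPI rank per CCX, four OpenMP threads per rank.
export OMP_NUM_THREADS=4

# Map one rank per L3 cache domain (= one CCX on HB-series), give each rank
# four processing elements, and bind sequentially to cores.
# 2 nodes x 30 CCXs/node = 60 ranks; counts and app name are placeholders.
mpirun -np 60 --map-by ppr:1:l3cache:pe=4 --bind-to core \
  -x OMP_NUM_THREADS ./my_hybrid_app
```

For the memory-bandwidth-sensitive case in the list above, lowering `pe=4` to `pe=3` or `pe=2` (with `OMP_NUM_THREADS` matched) keeps one rank per L3 domain while leaving cores idle in each CCX, which is one way to realize the reduced-cores-per-CCX suggestion.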

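For the large-scale transport suggestion, a minimal sketch of the environment settings; the UCX variable matches the earlier examples, and the MVAPICH2 parameter comes from its runtime-tuning options, so verify it against your installed version:

```bash
# UCX: unreliable datagram plus on-node shared memory for very large runs.
export UCX_TLS=ud,sm

# MVAPICH2 alternative: hybrid RC+UD mode (check your version's documentation).
export MV2_USE_UD_HYBRID=1
```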