Optimal scale-up and scale-out performance of HPC applications on Azure requires performance tuning and optimization experiments for the specific workload.
## Application setup
The [azurehpc repo](https://github.com/Azure/azurehpc) contains many examples of:
- Setting up and running [applications](https://github.com/Azure/azurehpc/tree/master/apps) optimally.
- Configuration of [file systems and clusters](https://github.com/Azure/azurehpc/tree/master/examples).
- [Tutorials](https://github.com/Azure/azurehpc/tree/master/tutorials) on how to get started easily with some common application workflows.

## Optimally scaling MPI
The following suggestions apply for optimal application scaling efficiency, performance, and consistency:
- For smaller scale jobs (< 256K connections), use:

  ```bash
  UCX_TLS=rc,sm
  ```

- For larger scale jobs (> 256K connections), use:

  ```bash
  UCX_TLS=dc,sm
  ```

- To calculate the number of connections for your MPI job, use the following formula; a worked example follows this list:

  ```bash
  Max Connections = (processes per node) x (number of nodes per job) x (number of nodes per job)
  ```
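
As a worked example (a sketch only: the node count, processes per node, hostfile, and `./my_app` binary are hypothetical, and the `-x` environment-forwarding flag is OpenMPI/HPC-X syntax that differs on other launchers), a job with 120 processes per node on 8 nodes gives 120 x 8 x 8 = 7,680 connections, well under 256K, so the `rc,sm` setting applies:

```bash
# Hypothetical job: 8 nodes x 120 processes per node (for example, HBv2).
# Max Connections = 120 x 8 x 8 = 7,680 (< 256K), so the RC transport applies.
mpirun -np 960 --hostfile ./hosts -x UCX_TLS=rc,sm ./my_app
```
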
- Adaptive Routing (AR) allows Azure Virtual Machines (VMs) running EDR and HDR InfiniBand to automatically detect and avoid network congestion when selecting MPI message paths.
- Pin processes to cores using a sequential pinning approach (as opposed to an autobalance approach); a pinning sketch follows this list.
- Binding by Numa/Core/HwThread is better than default binding.
- For hybrid parallel applications (OpenMP+MPI), use four threads and one MPI rank per [CCX](/azure/virtual-machines/hb-series-overview) on HB and HBv2 VM sizes.
- For pure MPI applications, experiment with one to four MPI ranks per CCX for optimal performance on HB and HBv2 VM sizes; see the per-CCX mapping sketch after this list.
- Some applications with extreme sensitivity to memory bandwidth may benefit from using a reduced number of cores per CCX. For these applications, using three or two cores per CCX may reduce memory bandwidth contention and yield higher real-world performance or more consistent scalability. In particular, MPI 'Allreduce' may benefit from this approach.
- For larger scale runs, it's recommended to use UD or hybrid RC+UD transports. Many MPI libraries and runtimes use these transports internally (such as UCX or MVAPICH2). Check your transport configurations for large-scale runs; a transport sketch follows this list.
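
A minimal sketch of sequential pinning, using OpenMPI/HPC-X-style flags (the rank count and the `./my_app` binary are hypothetical; other MPI libraries expose similar options under different names):

```bash
# Map ranks sequentially core-by-core and bind each rank to its core,
# instead of letting the launcher auto-balance ranks across sockets.
# --report-bindings prints the resulting layout so the pinning can be verified.
mpirun -np 120 --map-by core --bind-to core --report-bindings ./my_app
```
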
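And a sketch of the per-CCX mapping, again with OpenMPI-style options and a hypothetical binary, assuming each CCX corresponds to one L3 cache (as on HB and HBv2, where a CCX is four cores sharing an L3):

```bash
# Hybrid OpenMP+MPI on a single HBv2 node (30 CCXs): one rank per CCX
# (L3 cache), four cores per rank, four OpenMP threads per rank.
mpirun -np 30 --map-by ppr:1:l3cache:pe=4 -x OMP_NUM_THREADS=4 ./my_app

# Pure MPI: experiment with 1-4 ranks per CCX; for example, two ranks per CCX.
mpirun -np 60 --map-by ppr:2:l3cache:pe=2 ./my_app
```
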
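Finally, a transport sketch for larger runs (the rank counts and binary are hypothetical; `UCX_TLS=ud,sm` is a standard UCX transport selection, and `MV2_USE_UD_HYBRID` is MVAPICH2's documented runtime parameter for the hybrid RC+UD transport):

```bash
# UCX-based MPI (for example, HPC-X): switch large-scale runs to UD.
mpirun -np 7680 -x UCX_TLS=ud,sm ./my_app

# MVAPICH2: enable hybrid RC+UD by passing the runtime parameter to mpirun_rsh.
mpirun_rsh -np 7680 -hostfile ./hosts MV2_USE_UD_HYBRID=1 ./my_app
```
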