
Commit 610e3e2

Create optimizing-performance.md
1 parent 5a3d8b3 commit 610e3e2

1 file changed: 113 additions, 0 deletions

---
title: "Optimizing Performance for Azure HPC and AI Virtual Machines"
description: Learn how to understand and measure performance concepts and benchmarking methodologies for Azure HPC and AI virtual machines.
author: padmalathas
ms.author: padmalathas
ms.date: 03/25/2025
ms.topic: conceptual
ms.service: azure-virtual-machines
ms.subservice: hpc
---

# Optimizing Performance for Azure HPC and AI Virtual Machines

Optimizing virtual machine (VM) performance is crucial for high-performance computing (HPC) and artificial intelligence (AI) workloads. Azure offers tools and techniques that help these workloads run efficiently on its platform. Two key aspects of this optimization are pinning processes and threads, and placing MPI processes optimally.

This article provides guidance on improving the performance of HPC and AI workloads on Azure VMs. It focuses on process and thread pinning, optimal placement of MPI processes, and the use of Azure tools such as `check_app_pinning.py` to achieve these optimizations. It also covers strategies for MPI process placement, performance metrics collection, and recommendations for different MPI implementations on Azure's HPC specialty VMs.

## Tool to Assist in Optimal Pinning of Processes/Threads for Azure HPC/AI VMs

To maximize the performance of HPC applications, distribute processes and threads evenly across the VM so that they use all sockets, NUMA domains, and L3 caches. An even distribution makes the full memory bandwidth and floating-point capacity of the VM available to the application. In hybrid parallel applications, each process has several threads associated with it. Keep a process and its threads on the same L3 cache to maximize data sharing and reuse.

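The placement rule above (spread evenly, and keep each process with its threads inside one L3 cache) can be sketched in a few lines of Python. This is a minimal illustration, not the logic of any Azure tool: the function name `hybrid_layout` is hypothetical, and it assumes that core IDs are numbered contiguously within each L3 cache, which you should confirm against the actual VM topology.

```python
def hybrid_layout(num_l3_caches, cores_per_l3, threads_per_process):
    """Return one list of core IDs per process, never crossing an L3 boundary.

    Assumption: cores sharing an L3 cache have contiguous IDs.
    """
    if threads_per_process < 1 or threads_per_process > cores_per_l3:
        raise ValueError("threads_per_process must be between 1 and cores_per_l3")
    processes_per_l3 = cores_per_l3 // threads_per_process
    layout = []
    for l3 in range(num_l3_caches):
        first_core = l3 * cores_per_l3
        for slot in range(processes_per_l3):
            start = first_core + slot * threads_per_process
            layout.append(list(range(start, start + threads_per_process)))
    return layout


# Hypothetical VM: 4 L3 caches with 4 cores each, 2 threads per process.
for rank, cores in enumerate(hybrid_layout(4, 4, 2)):
    print(f"process {rank}: cores {cores}")
```

When the thread count does not divide the cores per L3 cache evenly, the sketch leaves the remaining cores idle instead of letting a process straddle two caches.
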
Azure provides a tool called [**Check App Pinning**](https://github.com/Azure/azurehpc/tree/master/experimental/check_app_pinning_tool) to help with this placement. The tool shows the VM CPU topology, reports where the processes and threads of a parallel application are running, and generates optimal process affinity arguments for MPI and the Slurm scheduler. Use it to verify that HPC/AI applications run with optimal placement on Azure HPC specialty VMs.

Example: Using the tool

- View VM CPU topology
  ```bash
  python check_app_pinning.py --view-topology
  ```
- Check process and thread placement
  ```bash
  python check_app_pinning.py --check-placement
  ```
- Generate affinity arguments
  ```bash
  python check_app_pinning.py --generate-affinity
  ```

Option names can differ between versions of the tool, so run the script with `--help` to list the options that your copy supports. Pinning processes and threads with the help of this tool can improve the performance of HPC and AI workloads on Azure.

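To cross-check placement without the tool, Linux reports the cores a process may run on in the `Cpus_allowed_list` field of `/proc/<pid>/status`. The sketch below parses that field; the helper names `parse_cpu_list` and `allowed_cores` are hypothetical, and the code is a Linux-only illustration, not part of Check App Pinning.

```python
import os


def parse_cpu_list(text):
    """Expand a Linux CPU list such as '0-3,8,10-11' into sorted core IDs."""
    cores = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cores.update(range(int(first), int(last) + 1))
        else:
            cores.add(int(part))
    return sorted(cores)


def allowed_cores(pid="self"):
    """Read the cores a process may run on from /proc/<pid>/status (Linux only)."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("Cpus_allowed_list:"):
                return parse_cpu_list(line.split(":", 1)[1])
    return []


print(parse_cpu_list("0-3,8,10-11"))  # [0, 1, 2, 3, 8, 10, 11]
if os.path.exists("/proc/self/status"):
    print("this process may run on cores:", allowed_cores())
```

The same field exists per thread under `/proc/<pid>/task/<tid>/status`, so the parser can also be applied to the individual threads of a hybrid application.
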
### Optimal MPI process placement for Azure HB series VMs

For MPI applications, optimal pinning of processes can lead to significant performance improvements, especially on undersubscribed systems, where fewer MPI processes run than there are cores. AMD's chiplet design adds complexity to this process. In the chiplet design, AMD integrates several smaller CPU dies to provide a socket with 64 cores. To maximize performance, balance the amount of L3 cache and memory bandwidth available to each core in use.

Azure HB series VMs, such as the HB60rs and HBv2, expose multiple NUMA domains. For instance, the HB60rs VM has 60 AMD Naples cores, with each socket containing 8 NUMA domains. When undersubscribing the VM, balance the L3 cache and memory bandwidth between the cores in use. To do so, select an appropriate number of cores per node and use specific MPI process placement strategies.

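One way to picture balanced undersubscription is to take the same number of cores from every NUMA domain, so that each MPI rank sees a similar share of L3 cache and memory bandwidth. The following sketch assumes contiguous core numbering within each NUMA domain and uses a made-up topology; `undersubscribed_cores` is a hypothetical helper, not an Azure or MPI API.

```python
def undersubscribed_cores(num_numa_domains, cores_per_numa, ranks_per_numa):
    """Pick the same number of cores from every NUMA domain.

    Assumption: cores in a NUMA domain have contiguous IDs.
    """
    if not 1 <= ranks_per_numa <= cores_per_numa:
        raise ValueError("ranks_per_numa must be between 1 and cores_per_numa")
    selected = []
    for domain in range(num_numa_domains):
        first_core = domain * cores_per_numa
        selected.extend(range(first_core, first_core + ranks_per_numa))
    return selected


# Hypothetical topology: 4 NUMA domains of 4 cores, 2 ranks per domain.
print(undersubscribed_cores(4, 4, 2))  # [0, 1, 4, 5, 8, 9, 12, 13]
```

The resulting core list can then be handed to whichever pinning mechanism the MPI library or scheduler provides.
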
Example: MPI Process Placement

- Select the number of cores per node
  ```bash
  # 8 ranks per node, each bound to one core; 60 ranks need at least 8 nodes
  mpirun -np 60 --map-by ppr:8:node --bind-to core my_mpi_application
  ```
- Distribute MPI processes evenly across NUMA domains
  ```bash
  # 1 rank per NUMA domain, bound to that domain; -np must not exceed the total NUMA domain count
  mpirun -np 60 --map-by ppr:1:numa --bind-to numa my_mpi_application
  ```

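The rank counts in such commands have to be consistent with the size of the allocation. As a rough check, assuming a fixed number of ranks per node and, optionally, several cores reserved for each rank (for example for its threads), the arithmetic looks like this. Both function names are hypothetical.

```python
import math


def nodes_required(total_ranks, ranks_per_node):
    """Smallest number of nodes that can hold total_ranks at ranks_per_node each."""
    return math.ceil(total_ranks / ranks_per_node)


def cores_reserved_per_node(ranks_per_node, cores_per_rank=1):
    """Cores a node must offer when every rank reserves cores_per_rank cores."""
    return ranks_per_node * cores_per_rank


print(nodes_required(60, 8))          # 8
print(cores_reserved_per_node(8, 8))  # 64
```

For 60 ranks at 8 ranks per node, the last node is only partly filled (4 ranks), which is worth keeping in mind when comparing timings across nodes.
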
### Performance metrics collection

Collecting performance metrics is essential for understanding and optimizing the performance of HPC and AI workloads. Azure provides several tools and methods for collecting these metrics.

Example: Collecting Performance Metrics

- Using Azure Monitor:
  * Set up Azure Monitor to collect metrics such as CPU utilization, memory usage, and network bandwidth.
  * Create a Log Analytics workspace and configure diagnostic settings to send metrics to the workspace.

- Using PerfCollect:
  * Install PerfCollect on your VM
    ```bash
    wget https://aka.ms/perfcollect -O perfcollect
    chmod +x perfcollect
    sudo ./perfcollect install
    ```
  * Start collecting metrics
    ```bash
    sudo ./perfcollect collect mysession
    ```
  * Stop collecting metrics by pressing Ctrl+C. PerfCollect then writes the trace to `mysession.trace.zip`, which you can inspect with the `view` action
    ```bash
    ./perfcollect view mysession.trace.zip
    ```
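
Once wall-clock times have been collected for several process counts, a common way to summarize them is speedup and parallel efficiency relative to the smallest run. The sketch below shows that calculation. It is a general benchmarking convention, not something Azure Monitor or PerfCollect computes, and the timings are invented for illustration.

```python
def scaling_report(timings):
    """Summarize scaling from a dict of process count -> wall-clock seconds.

    The smallest process count is used as the baseline.
    Returns a list of (processes, speedup, efficiency) tuples.
    """
    base_procs = min(timings)
    base_time = timings[base_procs]
    report = []
    for procs in sorted(timings):
        speedup = base_time / timings[procs]
        efficiency = speedup / (procs / base_procs)
        report.append((procs, speedup, efficiency))
    return report


# Hypothetical wall-clock times in seconds, for illustration only.
measured = {15: 400.0, 30: 210.0, 60: 125.0}
for procs, speedup, efficiency in scaling_report(measured):
    print(f"{procs:>3} processes: speedup {speedup:.2f}, efficiency {efficiency:.0%}")
```

Comparing efficiency between two pinning strategies at the same process count gives a quick indication of which placement uses the VM better.
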
### MPI Implementations

Different MPI implementations can have varying performance characteristics on Azure HPC/AI VMs. Common MPI implementations include Open MPI, MPICH, and Intel MPI. Each implementation has its strengths and may perform differently depending on the workload and VM configuration.

Recommendations for MPI Setup and Process Pinning
- Open MPI
  * Use the `--bind-to` and `--map-by` options to control process placement.

    Example:
    ```bash
    mpirun -np 60 --bind-to core --map-by ppr:8:node my_mpi_application
    ```
- MPICH
  * Use the `-bind-to` and `-ppn` options of the Hydra `mpiexec` launcher to control process placement.

    Example:
    ```bash
    mpiexec -np 60 -ppn 8 -bind-to core my_mpi_application
    ```
- Intel MPI
  * Use the `I_MPI_PIN` and `I_MPI_PIN_DOMAIN` environment variables to control process placement.

    Example:
    ```bash
    export I_MPI_PIN=1
    export I_MPI_PIN_DOMAIN=socket
    mpirun -np 60 my_mpi_application
    ```

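When the same job has to be launched with more than one MPI library, it can help to keep the pinning settings in one place. The sketch below builds the command line and extra environment variables for the Open MPI and Intel MPI settings described in this section. The `launch_plan` function and its `ranks_per_node` parameter are hypothetical, and the code only prints the commands; it does not run them.

```python
import shlex


def launch_plan(mpi, total_ranks, app, ranks_per_node=8):
    """Return (extra_env, argv) for a pinned launch with the given MPI flavor."""
    if mpi == "openmpi":
        argv = ["mpirun", "-np", str(total_ranks), "--bind-to", "core",
                "--map-by", f"ppr:{ranks_per_node}:node", app]
        return {}, argv
    if mpi == "intelmpi":
        env = {"I_MPI_PIN": "1", "I_MPI_PIN_DOMAIN": "socket"}
        return env, ["mpirun", "-np", str(total_ranks), app]
    raise ValueError(f"unsupported MPI flavor: {mpi}")


for flavor in ("openmpi", "intelmpi"):
    env, argv = launch_plan(flavor, 60, "my_mpi_application")
    prefix = " ".join(f"{key}={value}" for key, value in env.items())
    print((prefix + " " + shlex.join(argv)).strip())
```

To run a plan instead of printing it, the pair can be passed to `subprocess.run(argv, env={**os.environ, **env})`.
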
By following these recommendations and using the tools and techniques that Azure provides, you can optimize the performance of HPC and AI workloads on Azure's HPC specialty VMs.

## Resources

- [Tool to assist in optimal pinning of processes/threads for Azure HPC/AI VM’s](https://techcommunity.microsoft.com/blog/azurehighperformancecomputingblog/tool-to-assist-in-optimal-pinning-of-processesthreads-for-azure-hpcai-vm%e2%80%99s/2672201)
- [Optimal MPI Process Placement for Azure HB Series VMs](https://techcommunity.microsoft.com/blog/azurehighperformancecomputingblog/optimal-mpi-process-placement-for-azure-hb-series-vms/2450663)
