Commit ad8b4c4

Merge pull request #1277 from madeline-underwood/LLVM_Machine
/cross-platform/mca-godbolt/editorial/KB_to_review
2 parents 07111e6 + 072246f commit ad8b4c4

File tree

4 files changed

+44
-39
lines changed

content/learning-paths/cross-platform/mca-godbolt/_index.md

Lines changed: 4 additions & 4 deletions
Original file line number | Diff line number | Diff line change
@@ -1,18 +1,18 @@
11
---
2-
title: Use LLVM Machine Code Analyzer to understand code performance
2+
title: Learn about LLVM Machine Code Analyzer
33

44
minutes_to_complete: 60
55

6-
who_is_this_for: This is an introductory topic for Arm developers who want to diagnose performance issues of Arm programs using LLVM Machine Code Analyzer (MCA) and Compiler Explorer.
6+
who_is_this_for: This is an introductory topic for developers who want to diagnose performance issues of Arm programs using LLVM Machine Code Analyzer (MCA) and Compiler Explorer.
77

88
learning_objectives:
99
- Estimate the hardware resource pressure and the number of cycles taken to execute your code snippet using llvm-mca.
10-
- Understand how this estimate can help diagnose possible performance issues.
10+
- Describe how this estimate can help diagnose possible performance issues.
1111
- Use Compiler Explorer to run llvm-mca.
1212

1313
prerequisites:
1414
- Familiarity with Arm assembly.
15-
- LLVM version 16 or newer (to include Neoverse V2 support).
15+
- LLVM version 16 or newer, which includes support for Neoverse V2.
1616

1717
author_primary: Rin Dobrescu
1818

Lines changed: 33 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,33 @@
1+
---
2+
title: Background
3+
weight: 2
4+
### FIXED, DO NOT MODIFY
5+
layout: learningpathall
6+
---
7+
8+
### Terminology
9+
10+
Before you get started, familiarize yourself with the terms below:
11+
12+
- **Instruction scheduling**: If two instructions appear in a sequence in a program, but are independent from each other, the compiler can swap them without affecting the program's behavior. The goal of instruction scheduling is to find a valid permutation of the program instructions that also optimizes the program's performance, by making use of processor resources.
13+
14+
- **Pipeline**: A pipeline is the mechanism used by the processor to execute instructions. Pipelining makes efficient use of processor resources by dividing instructions into stages that can overlap and be processed in parallel, reducing the time it takes for instructions to execute. Instructions can only be executed if the required data is available, otherwise this leads to a delay in execution called a pipeline stall.
15+
16+
- **Resource pressure**: Resources refer to the hardware units used to execute instructions. If instructions in a program all rely on the same resources, then it leads to pressure. Execution is slowed down as instructions must wait until the unit they need becomes available.
17+
18+
- **Data dependency**: Data dependency refers to the relationship between instructions. When an instruction requires data produced by a previous instruction, this creates a data dependency.
19+
20+
21+
### What is Machine Code Analyzer (MCA)?
22+
23+
Machine Code Analyzer (MCA) is a performance analysis tool that uses information available in [LLVM](https://github.com/llvm/llvm-project) to statically estimate performance on a specific CPU.
24+
25+
26+
### How can MCA be useful?
27+
28+
MCA takes as input a snippet of assembly code and then simulates the execution of that code in a loop for a set number of iterations. The default is 100 iterations.
29+
30+
MCA then outputs a performance report, which contains information such as the latency and throughput of the assembly block and the resource usage for each instruction.
31+
32+
Using this information, you can identify bottlenecks in performance such as resource pressure and data dependencies. There are many options you can give MCA to get performance metrics. The options are explained in the [llvm-mca documentation](https://llvm.org/docs/CommandGuide/llvm-mca.html).
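
As a rough illustration of the inputs described above, the Python sketch below assembles an `llvm-mca` command line. The `-mcpu`, `-mtriple`, and `-iterations` options are described in the llvm-mca documentation. The helper name `build_mca_command` and the file name `snippet.s` are hypothetical and chosen only for this example.

```python
def build_mca_command(asm_path, cpu="neoverse-v2", iterations=100, triple=None):
    """Return an llvm-mca argument list (hypothetical helper, not part of LLVM)."""
    # -mcpu selects the CPU model and -iterations sets the simulated loop count.
    command = ["llvm-mca", f"-mcpu={cpu}", f"-iterations={iterations}"]
    if triple is not None:
        # -mtriple is only needed when the host is not the target architecture.
        command.append(f"-mtriple={triple}")
    command.append(asm_path)
    return command


# "snippet.s" is an assumed file name used for illustration.
print(" ".join(build_mca_command("snippet.s")))
```

If `llvm-mca` is installed, you could pass the returned list to `subprocess.run`. On a host that is not Arm, you might also pass `triple="aarch64"`.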
33+

content/learning-paths/cross-platform/mca-godbolt/mca_on_godbolt.md

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,21 +1,21 @@
11
---
22
title: Use MCA with Compiler Explorer
3-
weight: 3
3+
weight: 4
44
### FIXED, DO NOT MODIFY
55
layout: learningpathall
66
---
77

88
### What is Compiler Explorer?
99

10-
Compiler Explorer is an interactive online compiler that lets you enter code in C/C++, Java, Python and many other programming languages. It allows you to see what the code looks like after being compiled in real time.
10+
Compiler Explorer is an interactive online compiler that is compatible with code in C/C++, Java, Python, and many other programming languages. It allows you to see what the code looks like after being compiled in real time.
1111

1212
Compiler Explorer supports multiple compilers and has many tools available, including `llvm-mca`.
1313

1414
### Running MCA in Compiler Explorer
1515

1616
To access Compiler Explorer, open a browser and go to https://godbolt.org.
1717

18-
This leads you to the page shown below in Figure 1. Your view may be a slightly different.
18+
This leads you to the page shown below in Figure 1. Your view might be slightly different.
1919

2020
![godbolt open alt-text#center](open.png "Figure 1. Compiler Explorer")
2121

@@ -44,7 +44,7 @@ Next, update the compiler flags by typing `-O3` in the `Compiler options` box.
4444
4545
You can view the full set of options passed to the compiler by clicking on the green tick next to the compiler.
4646
47-
Click the `Add tool` dropdown button to add `llvm-mca` as a tool as shown in Figure 2 below:
47+
Click the `Add tool` drop-down button to add `llvm-mca` as a tool as shown in Figure 2 below:
4848
4949
![tool mca alt-text#center](tool-mca.png "Figure 2. Assembly in Compiler Explorer")
5050

content/learning-paths/cross-platform/mca-godbolt/running_mca.md

Lines changed: 3 additions & 31 deletions
Original file line number | Diff line number | Diff line change
@@ -4,37 +4,9 @@ weight: 2
44
### FIXED, DO NOT MODIFY
55
layout: learningpathall
66
---
7-
8-
### Terminology
9-
10-
Before you get started, familiarize yourself with the terms below:
11-
12-
- **Instruction scheduling**: If two instructions appear in a sequence in a program, but are independent from each other, the compiler can swap them without affecting the program's behavior. The goal of instruction scheduling is to find a valid permutation of the program instructions that also optimizes the program's performance, by making use of processor resources.
13-
14-
- **Pipeline**: A pipeline is the mechanism used by the processor to execute instructions. Pipelining makes efficient use of processor resources by dividing instructions into stages that can overlap and be processed in parallel, reducing the time it takes for instructions to execute. Instructions can only be executed if the required data is available, otherwise this leads to a delay in execution called a pipeline stall.
15-
16-
- **Resource pressure**: Resources refer to the hardware units used to execute instructions. If instructions in a program all rely on the same resources, then it leads to pressure. Execution is slowed down as instructions must wait until the unit they need becomes available.
17-
18-
- **Data dependency**: Data dependency refers to the relationship between instructions. When an instruction requires data from a previous instruction this creates a data dependency.
19-
20-
21-
### What is Machine Code Analyzer (MCA)?
22-
23-
Machine Code Analyzer (MCA) is a performance analysis tool that uses information available in [LLVM](https://github.com/llvm/llvm-project) to measure performance on a specific CPU.
24-
25-
26-
### How can MCA be useful?
27-
28-
MCA takes as input a snippet of assembly code and then simulates the execution of that code in a loop of iterations (default is 100).
29-
30-
MCA then outputs a performance report, which contains information such as the latency and throughput of the assembly block and the resource usage for each instruction.
31-
32-
Using this information, you can identify bottlenecks in performance such as resource pressure and data dependencies. There are many options you can give MCA to get performance metrics. The options are explained in the [llvm-mca documentation](https://llvm.org/docs/CommandGuide/llvm-mca.html).
33-
34-
357
### MCA example with Arm assembly
368

37-
You have learned what MCA is and what kind of information it can provide. Now you are going to use MCA to identify a performance issue and improve a snippet of Arm assembly.
9+
You have learned what MCA is and what kind of information it provides. Now you are going to use MCA to identify a performance issue and improve a snippet of Arm assembly.
3810

3911
The example below demonstrates how to run `llvm-mca`, what the expected output is, and the conclusions you can draw using the performance metrics MCA provides.
4012

@@ -121,11 +93,11 @@ Resource pressure by instruction:
12193

12294
The MCA output shows a lot of information. The most relevant parts are covered below. For further details, you can look at the [llvm-mca documentation](https://llvm.org/docs/CommandGuide/llvm-mca.html#how-llvm-mca-works).
12395

124-
The first part of the output, up to the `Instruction Info` section, is general information about the loop and the hardware. MCA simulated the execution of the code in a loop for 100 iterations. It executed a total of 500 instructions in 503 cycles. If you calculate the instructions per cycle (IPC) on average you get 500/503≈0.99 IPC. The dispatch width of 16 means the CPU is capable of dispatching 16 instructions per cycle.
96+
The first part of the output, up to the `Instruction Info` section, is general information about the loop and the hardware. MCA simulated the execution of the code in a loop for 100 iterations. It executed a total of 500 instructions in 503 cycles. If you calculate the instructions per cycle (IPC), on average you get 500/503≈0.99 IPC. The dispatch width of 16 means the CPU is capable of dispatching 16 instructions per cycle.
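
The IPC arithmetic above can be reproduced with a short Python sketch. The numbers (100 iterations, 500 instructions, 503 cycles, dispatch width 16) come from the report described in the text. The function and variable names are illustrative assumptions, not part of MCA.

```python
def instructions_per_cycle(total_instructions, total_cycles):
    """Average number of instructions executed per cycle."""
    return total_instructions / total_cycles


# Values taken from the MCA summary discussed in the text.
iterations = 100
total_instructions = 500
total_cycles = 503
dispatch_width = 16

ipc = instructions_per_cycle(total_instructions, total_cycles)
print(f"Instructions per iteration: {total_instructions // iterations}")
print(f"IPC: {ipc:.2f}")
print(f"Fraction of dispatch width used: {ipc / dispatch_width:.1%}")
```

Comparing the IPC of about 0.99 with the dispatch width of 16 suggests that the loop uses only a small fraction of what the CPU can dispatch per cycle.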
12597

12698
The second part of the output, up to the `Resources` section, gives information about each individual instruction. Latency represents how many cycles each instruction takes to execute. Throughput represents the rate at which instructions are executed per cycle. Reciprocal throughput (RThroughput) is the inverse of throughput (1/throughput) and represents cycles per instruction.
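
The relationship between throughput and reciprocal throughput can be checked with a minimal Python sketch. The throughput value below is an assumed example, not a figure from the MCA report, and the function name is hypothetical.

```python
def reciprocal_throughput(throughput):
    """Convert instructions per cycle into cycles per instruction."""
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return 1.0 / throughput


# Assumed example: an instruction that can issue 4 times per cycle.
example_throughput = 4.0
print(reciprocal_throughput(example_throughput))  # 0.25 cycles per instruction
```

A lower RThroughput value means that more instances of the instruction can be executed per cycle.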
12799

128-
An important part of this output is the `Resource pressure by instruction` section. It shows which instructions are executed on which pipelines. You can see that the add instructions use resources `[4]-[9]` and that pressure is equally spread through the available resources.
100+
An important part of this output is the `Resource pressure by instruction` section. It shows the instructions that are executed on each pipeline. You can see that the add instructions use resources `[4]-[9]` and that pressure is equally spread through the available resources.
129101

130102
The [Arm Neoverse V2 Software Optimization Guide](https://developer.arm.com/documentation/109898/latest/) shows which pipelines are used by which instructions.
131103
