content/learning-paths/cross-platform/mca-godbolt/background.md (+5 -2: 5 additions & 2 deletions)
@@ -25,9 +25,12 @@ Machine Code Analyzer (MCA) is a performance analysis tool that uses information

 ### How can MCA be useful?

-MCA takes as input a snippet of assembly code and then simulates the execution of that code in a loop of iterations, and the default is 100.
+MCA takes as input a snippet of assembly code and then simulates the execution of that code in a loop of iterations, and the default is 100.

-MCA then outputs a performance report, which contains information such as the latency and throughput of the assembly block and the resource usage for each instruction.
+MCA then outputs a performance report, which contains information such as the latency and throughput of the assembly block and the resource usage for each instruction.

 Using this information, you can identify bottlenecks in performance such as resource pressure and data dependencies. There are many options you can give MCA to get performance metrics. The options are explained in the [llvm-mca documentation](https://llvm.org/docs/CommandGuide/llvm-mca.html).

+### How to acquire MCA
+
+MCA is available as part of most Linux distributions; however, the packaged version tends to lag behind the current LLVM release. A recent version of MCA is also shipped as part of the Arm Toolchain for Linux (ATfL). You can find more information about ATfL and installation steps in the [ATfL user guide](https://developer.arm.com/documentation/110477/211/?lang=en). The set of cores available for performance estimation in MCA is determined by the LLVM version. You can check the version you are currently using by running `llvm-mca --version`. Using a recent version of LLVM is recommended to take advantage of improvements made to MCA. The most recent release can be obtained directly from LLVM by downloading one of the [release packages](https://github.com/llvm/llvm-project/releases/). LLVM also provides nightly builds for [Debian- and Ubuntu-based systems](https://apt.llvm.org).
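
The version check described above can also be scripted. The following is a minimal sketch, not part of the learning path: it assumes `llvm-mca` is on your `PATH` and that the `--version` output contains a line of the form `LLVM version X.Y.Z`. The helper names `parse_llvm_version` and `installed_llvm_mca_version` are hypothetical.

```python
import re
import shutil
import subprocess


def parse_llvm_version(version_output):
    """Extract (major, minor, patch) from `llvm-mca --version` text.

    Assumes the output contains "LLVM version X.Y.Z"; returns None otherwise.
    """
    match = re.search(r"LLVM version (\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def installed_llvm_mca_version(binary="llvm-mca"):
    """Run `<binary> --version`; return None if the tool is not installed."""
    path = shutil.which(binary)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Search both streams, since the sketch does not assume which one is used.
    return parse_llvm_version(result.stdout + result.stderr)


if __name__ == "__main__":
    version = installed_llvm_mca_version()
    if version is None:
        print("llvm-mca not found or version not recognized")
    else:
        print("llvm-mca is built from LLVM", ".".join(map(str, version)))
```

Comparing the returned tuple against the latest LLVM release number is one way to tell whether a distribution package is lagging behind.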
content/learning-paths/cross-platform/mca-godbolt/mca_on_godbolt.md (+14 -14: 14 additions & 14 deletions)
@@ -7,19 +7,19 @@ layout: learningpathall

 ### What is Compiler Explorer?

-Compiler Explorer is an interactive online compiler that is compatible with code in C/C++, Java, Python, and many other programming languages. It allows you to see what the code looks like after being compiled in real time.
+Compiler Explorer is an interactive online compiler that is compatible with code in C/C++, Java, Python, and many other programming languages. It allows you to see what the code looks like after being compiled in real time. This is helpful when you want to try different compiler versions without installing them.

 Compiler Explorer supports multiple compilers and has many tools available, including `llvm-mca`.

 ### Running MCA in Compiler Explorer

-To access Compiler Explorer, open a browser and go to https://godbolt.org.
+To access Compiler Explorer, open a browser and go to https://godbolt.org.

-This leads you to the page shown below in Figure 1. Your view might be slightly different.
+This leads you to the page shown below in Figure 1. Your view might be slightly different.

-[Figure 1 image: markup not preserved]
+[Figure 1 image: markup not preserved]

-The left side of the page contains the source code. In Figure 1, the language is set to C++, but you can click on the programming language to select a different language for the source code.
+The left side of the page contains the source code. In Figure 1, the language is set to C++, but you can click on the programming language to select a different language for the source code.

 Copy the code below and paste it into Compiler Explorer as C++ source code:

@@ -34,28 +34,28 @@ int func(int a, int b, int c, int d, int e, int f) {
 }
 ```

-The right side of the page contains the disassembly output from the compiler.
+The right side of the page contains the disassembly output from the compiler.

-You can change the compiler by clicking on it and selecting a different one.
+You can change the compiler by clicking on it and selecting a different one.

-Select `armv8-a clang(trunk)` as the compiler to see Arm instructions.
+Select `armv8-a clang(trunk)` as the compiler to see Arm instructions.

-Next, update the compiler flags by typing `-O3` in the `Compiler options` box.
+Next, update the compiler flags by typing `-O3` in the `Compiler options` box.

-You can view the full set of options passed to the compiler by clicking on the green tick next to the compiler.
+You can view the full set of options passed to the compiler by clicking on the green tick next to the compiler.

 Click the `Add tool` drop-down button to add `llvm-mca` as a tool as shown in Figure 2 below:

 [Figure 2 image: markup not preserved]

-To add more flags to `llvm-mca`, click on the `Arguments` button and type them in.
+To add more flags to `llvm-mca`, click on the `Arguments` button and type them in.

-Add `-mcpu=neoverse-v2`, as well as any other flags you choose to pass to `llvm-mca`.
+Add `-mcpu=neoverse-v2`, as well as any other flags you choose to pass to `llvm-mca`.

-To find what CPUs are supported you can check the [clang documentation](https://clang.llvm.org/docs/CommandGuide/clang.html#cmdoption-print-supported-cpus).
+To find what CPUs are supported you can check the [clang documentation](https://clang.llvm.org/docs/CommandGuide/clang.html#cmdoption-print-supported-cpus).

 The right side of the page now contains the output from running `llvm-mca` on the disassembly of the source code, as shown in Figure 3 below:

 [Figure 3 image: markup not preserved]

-You are now able to run `llvm-mca` using Compiler Explorer. This is helpful when you want to try different compiler versions without installing them.
+You are now able to run `llvm-mca` using Compiler Explorer.
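
The Compiler Explorer steps above (compile with `-O3` for Arm, then pass the assembly to `llvm-mca` with `-mcpu=neoverse-v2`) can be approximated locally. The sketch below only builds the command line and does not run it. It rests on several assumptions that are not in the learning path: a local `clang++` that can cross-compile with `--target=aarch64-linux-gnu`, an `llvm-mca` that needs `-mtriple=aarch64` on a non-Arm host, and a source file saved under the hypothetical name `func.cpp`. The helper names are also hypothetical.

```python
import shlex


def clang_command(source_file, opt_level="-O3", target="aarch64-linux-gnu"):
    """Build a clang++ command that writes AArch64 assembly to stdout."""
    return ["clang++", f"--target={target}", opt_level, "-S", "-o", "-", source_file]


def mca_command(cpu="neoverse-v2", extra_args=()):
    """Build an llvm-mca command that reads assembly from stdin."""
    return ["llvm-mca", "-mtriple=aarch64", f"-mcpu={cpu}", *extra_args]


def pipeline(source_file, cpu="neoverse-v2", extra_args=()):
    """Return a shell pipeline string: compile, then analyze with llvm-mca."""
    compile_part = shlex.join(clang_command(source_file))
    analyze_part = shlex.join(mca_command(cpu, extra_args))
    return f"{compile_part} | {analyze_part}"


if __name__ == "__main__":
    # Extra arguments play the role of the `Arguments` box in Compiler Explorer.
    print(pipeline("func.cpp", extra_args=["-iterations=300", "-timeline"]))
```

Printing the pipeline instead of executing it keeps the sketch usable on machines without an LLVM installation; you can paste the printed line into a shell once the tools are available.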
content/learning-paths/cross-platform/mca-godbolt/running_mca.md (+6 -4: 6 additions & 4 deletions)
@@ -6,11 +6,11 @@ layout: learningpathall
 ---
 ### MCA example with Arm assembly

-You have learned what MCA is and what kind of information it provides. Now you are going to use MCA to identify a performance issue and improve a snippet of Arm assembly.
+You have learned what MCA is and what kind of information it provides. Now you are going to use MCA to identify a performance issue and improve a snippet of Arm assembly.

 The example below demonstrates how to run `llvm-mca`, what the expected output is, and the conclusions you can draw using the performance metrics MCA provides.

-The example below computes the sum of 6 numbers.
+The example below computes the sum of 6 numbers.

 Use a text editor to save the program below in a file named `sum_test1.s`:

@@ -389,8 +389,10 @@ Average Wait times (based on the timeline view):
 10 3.6 1.9 0.7 <total>
 ```

-You can see by looking at the timeline view that instructions no longer depend on each other and can execute in parallel.
+You can see by looking at the timeline view that instructions no longer depend on each other and can execute in parallel.

-Instructions also spend less time waiting in the scheduler's queue. This explains why the performance of `sum_test2.s` is so much better than `sum_test1.s`.
+Instructions also spend less time waiting in the scheduler's queue. This explains why the performance of `sum_test2.s` is so much better than `sum_test1.s`.
+
+Note the use of the `-mcpu=neoverse-v2` flag in all of these examples. This flag tells MCA to simulate the performance of the code in `sum_test1.s` and `sum_test2.s` on a Neoverse V2 core. You can change it to any core that MCA supports. To list the supported cores, run `llvm-mca -mcpu=help <<<''`. You can also look at the LLVM sources in [llvm-project](https://github.com/llvm/llvm-project/tree/main/llvm/test/tools/llvm-mca/AArch64), which contain more detailed examples. For instance, among the Neoverse cores, there is currently support for N1, N2, N3, V1, V2, and V3.

 In the next section, you can try running `llvm-mca` with Compiler Explorer.
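
Because `-mcpu` can be set to any supported core, one natural follow-up is to run the same assembly file against several cores and compare the results. The sketch below is one way to do that, under stated assumptions: the summary at the top of the `llvm-mca` report contains lines such as `Iterations:` and `Total Cycles:` followed by an integer, `llvm-mca` is on your `PATH`, and `-mtriple=aarch64` is passed so that the sketch also works on a non-Arm host. The helper names `parse_summary` and `total_cycles_per_core` are hypothetical.

```python
import re
import shutil
import subprocess

# Integer-valued fields assumed to appear in the llvm-mca summary section.
SUMMARY_FIELD = re.compile(
    r"^(Iterations|Instructions|Total Cycles|Total uOps):\s+(\d+)\s*$",
    re.MULTILINE,
)


def parse_summary(report):
    """Return the integer summary fields of an llvm-mca report as a dict."""
    return {name: int(value) for name, value in SUMMARY_FIELD.findall(report)}


def total_cycles_per_core(asm_file, cores):
    """Run llvm-mca once per core and collect the reported total cycles.

    Returns an empty dict when llvm-mca is not installed. A core that the
    installed LLVM version does not support maps to None.
    """
    if shutil.which("llvm-mca") is None:
        return {}
    results = {}
    for core in cores:
        proc = subprocess.run(
            ["llvm-mca", "-mtriple=aarch64", f"-mcpu={core}", asm_file],
            capture_output=True,
            text=True,
            check=False,
        )
        results[core] = parse_summary(proc.stdout).get("Total Cycles")
    return results


if __name__ == "__main__":
    cores = ["neoverse-n1", "neoverse-n2", "neoverse-v1", "neoverse-v2"]
    for core, cycles in total_cycles_per_core("sum_test1.s", cores).items():
        print(f"{core}: {cycles}")
```

Which cores produce a result depends on the LLVM version in use, so a `None` entry is a hint to check `llvm-mca --version` rather than a sign that the assembly is wrong.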