README.md
@@ -1,6 +1,6 @@
The Arm Learning Paths website is available at https://learn.arm.com/

# Arm Learning Paths

This repository contains the source files for the Arm Learning Paths static website, which serves learning-based technical content for Arm software developers.
who_is_this_for: This is an introductory topic for developers who want to diagnose performance issues of Arm programs using LLVM Machine Code Analyzer (MCA) and Compiler Explorer.

learning_objectives:
    - Estimate the hardware resource pressure and the number of cycles taken to execute your code snippet using llvm-mca.
    - Describe how this estimate can help diagnose possible performance issues.
    - Use Compiler Explorer to run llvm-mca.

prerequisites:
    - Familiarity with Arm assembly.
    - LLVM version 16 or newer, which includes support for Neoverse V2.
            - It can provide a performance estimation that can be used to understand and improve performance.
            - It can provide an improved version of a given code snippet.
        correct_answer: 1
        explanation: >
            MCA simulates the execution of a given snippet of assembly in a loop and provides performance estimates that can then be used to understand and improve performance.

    - questions:
        question: >
            MCA can offer performance metrics for the following as input:
        answers:
            - A snippet of code in any language.
            - A snippet of assembly code.
        correct_answer: 2
        explanation: >
            MCA takes assembly code as input.

    - questions:
        question: >
            When using Compiler Explorer, what does llvm-mca take as input?
        answers:
            - The source code provided.
            - The disassembly of the source code.
        correct_answer: 2
        explanation: >
            Compiler Explorer takes the source code as input, compiles it, and shows the resulting disassembly. It can then run llvm-mca on that disassembly.
Before you get started, familiarize yourself with the terms below:

- **Instruction scheduling**: If two instructions appear in sequence in a program but are independent of each other, the compiler can swap them without affecting the program's behavior. The goal of instruction scheduling is to find a valid permutation of the program's instructions that also optimizes performance by making good use of processor resources.

- **Pipeline**: A pipeline is the mechanism the processor uses to execute instructions. Pipelining makes efficient use of processor resources by dividing instruction execution into stages that can overlap and be processed in parallel, reducing the overall time it takes to execute a sequence of instructions. An instruction can only execute once the data it requires is available; otherwise, execution is delayed, which is called a pipeline stall.

- **Resource pressure**: Resources are the hardware units used to execute instructions. If many instructions in a program rely on the same resources, those resources come under pressure. Execution slows down because instructions must wait until the unit they need becomes available.

- **Data dependency**: A data dependency is a relationship between instructions. When an instruction requires data produced by a previous instruction, it has a data dependency on that instruction.
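
To make the last two terms concrete, here is a minimal Python sketch of a toy issue model. It is not how llvm-mca works internally; the `Instr` class, the `simulate` function, the `mul` unit name, and the 3-cycle latency are all hypothetical choices made for illustration. The model assumes fully pipelined units, perfect register renaming, and no limit other than operand readiness and unit availability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str     # label for the instruction (illustrative only)
    dest: str     # register written
    srcs: tuple   # registers read
    unit: str     # hypothetical execution unit type
    latency: int  # cycles until the result is available


def simulate(instrs, units):
    """Return the cycle in which the last result becomes available.

    Toy model: `units[kind]` is the number of copies of each unit type,
    and every copy can accept one new instruction per cycle.
    """
    ready = {}   # register -> cycle in which its value is available
    used = {}    # (unit, cycle) -> issue slots already taken
    finish = 0
    for ins in instrs:
        # Data dependency: wait until every source operand is ready.
        cycle = max((ready.get(r, 0) for r in ins.srcs), default=0)
        # Resource pressure: wait until a unit of the right type is free.
        while used.get((ins.unit, cycle), 0) >= units[ins.unit]:
            cycle += 1
        used[(ins.unit, cycle)] = used.get((ins.unit, cycle), 0) + 1
        ready[ins.dest] = cycle + ins.latency
        finish = max(finish, ready[ins.dest])
    return finish


# Four independent multiplies: only unit availability limits them.
independent = [Instr(f"mul{i}", f"x{i + 2}", ("x0", "x1"), "mul", 3)
               for i in range(4)]

# Four multiplies where each one consumes the previous result.
chain = [Instr("mul0", "x2", ("x0", "x1"), "mul", 3)] + [
    Instr(f"mul{i}", "x2", ("x2", "x1"), "mul", 3) for i in range(1, 4)
]

print(simulate(independent, {"mul": 2}))  # 4
print(simulate(independent, {"mul": 1}))  # 6: resource pressure
print(simulate(chain, {"mul": 2}))        # 12: data dependency
```

In this toy model, halving the number of multiply units slows the independent sequence, while the dependent chain is limited by latency no matter how many units are available.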
### What is Machine Code Analyzer (MCA)?

Machine Code Analyzer (MCA) is a performance analysis tool that uses information available in [LLVM](https://github.com/llvm/llvm-project), such as scheduling models, to statically estimate the performance of machine code on a specific CPU.

### How can MCA be useful?

MCA takes a snippet of assembly code as input and simulates the execution of that code in a loop for a number of iterations (100 by default).
29
+
30
+
MCA then outputs a performance report, which contains information such as the latency and throughput of the assembly block and the resource usage for each instruction.
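
As a rough illustration of how headline figures in such a report relate to each other, the sketch below derives instructions per cycle (IPC) and average cycles per iteration from three inputs. The `summarize` helper and the numbers passed to it are hypothetical and are not taken from a real llvm-mca run.

```python
def summarize(iterations, instructions_per_iteration, total_cycles):
    """Derive simple throughput figures from simulated totals."""
    total_instructions = iterations * instructions_per_iteration
    return {
        "instructions": total_instructions,
        # Instructions per cycle: higher means better overall throughput.
        "ipc": total_instructions / total_cycles,
        # Average number of cycles spent on one pass over the snippet.
        "cycles_per_iteration": total_cycles / iterations,
    }


# Hypothetical totals: a 4-instruction snippet simulated for 100 iterations.
report = summarize(iterations=100, instructions_per_iteration=4,
                   total_cycles=250)
print(report)
```

A low IPC relative to what the target core can dispatch per cycle is a hint that the snippet is being held back by one of the bottlenecks described above.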

Using this information, you can identify performance bottlenecks such as resource pressure and data dependencies. MCA accepts many options that control which performance metrics it reports. The options are explained in the [llvm-mca documentation](https://llvm.org/docs/CommandGuide/llvm-mca.html).
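
As one possible way to experiment with these options, the sketch below assembles an llvm-mca command line in Python. The helper names `build_mca_command` and `run_mca`, the file name `snippet.s`, and the default option values are assumptions made for illustration. The flags themselves (`-mtriple`, `-mcpu`, `-iterations`, `-timeline`, `-bottleneck-analysis`) are described in the llvm-mca documentation, which remains the authoritative reference.

```python
import shutil
import subprocess


def build_mca_command(asm_path, mcpu="neoverse-v2", iterations=100,
                      timeline=False, bottleneck_analysis=False):
    """Build an llvm-mca argument list for an AArch64 assembly file."""
    cmd = [
        "llvm-mca",
        "-mtriple=aarch64-linux-gnu",
        f"-mcpu={mcpu}",
        f"-iterations={iterations}",
    ]
    if timeline:
        cmd.append("-timeline")
    if bottleneck_analysis:
        cmd.append("-bottleneck-analysis")
    cmd.append(asm_path)
    return cmd


def run_mca(asm_path, **options):
    """Run llvm-mca and return its report as text.

    Requires llvm-mca on PATH; raises FileNotFoundError otherwise.
    """
    if shutil.which("llvm-mca") is None:
        raise FileNotFoundError("llvm-mca was not found on PATH")
    result = subprocess.run(build_mca_command(asm_path, **options),
                            capture_output=True, text=True, check=True)
    return result.stdout


# "snippet.s" is a hypothetical file containing the assembly to analyze.
print(" ".join(build_mca_command("snippet.s", bottleneck_analysis=True)))
```

Keeping command construction separate from execution makes it easy to inspect the exact command before running it on a machine where LLVM is installed.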