content/learning-paths/cross-platform/multiplying-matrices-with-sme2/1-get-started.md
Lines changed: 6 additions & 7 deletions
@@ -8,13 +8,12 @@ layout: learningpathall
8
8
9
9
## Choose your SME2 setup: native or emulated
10
10
11
-
Before you can build or run any SME2-accelerated code, you need to set up your development environment.
12
-
13
-
This section walks you through the required tools and the two supported execution options, which are:
11
+
To build or run SME2-accelerated code, first set up your development environment.
12
+
This section walks you through the required tools and two supported setup options:
14
13
15
14
* [**Native SME2 hardware**](#set-up-a-system-with-native-SME2-support) - build and run directly on a system with SME2 support. For supported devices, see [Devices with SME2 support](#devices-with-sme2-support).
16
15
17
-
*[**Docker-based emulation**](#set-up-a-system-using-sme2-emulation-with-dockerset-up-a-system-using-SME2-emulation-with-Docker) - use a container to emulate SME2 in bare metal mode (without an OS).
16
+
* [**Docker-based emulation**](#set-up-a-system-using-sme2-emulation-with-docker) - use a container to emulate SME2 in bare metal mode (without an OS).
18
17
19
18
## Download and explore the code examples
20
19
@@ -66,7 +65,7 @@ Amongst other files, it includes:
66
65
- A `docker` directory containing:
67
66
- `assets.source_me` to provide toolchain paths.
68
67
- `build-my-container.sh`, a script that automates building the Docker image from the `sme2-environment.docker` file. It runs the Docker build command with the correct arguments so you don’t have to remember them.
69
-
-`sme2-environment.docker`, a Docker file that defines the steps to build the SME2 container image. It installs all the necessary dependencies, including the SME2-compatible compiler and Arm FVP emulator.
68
+
- `sme2-environment.docker`, a custom Docker file that defines the steps to build the SME2 container image. It installs all the necessary dependencies, including the SME2-compatible compiler and Arm FVP emulator.
70
69
- `build-all-containers.sh`, a script to build multi-architecture images.
71
70
- `.devcontainer/devcontainer.json` for VS Code container support.
72
71
@@ -234,7 +233,7 @@ If you are using Visual Studio Code as your IDE, the container setup is already
234
233
235
234
Make sure you have the [Microsoft Dev Containers](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers) extension installed.
236
235
237
-
Then select the **Reopen in Container** menu entry as Figure 1 shows.
236
+
Then select the **Reopen in Container** menu entry as shown below.
238
237
239
238
It automatically finds and uses ``.devcontainer/devcontainer.json``:
content/learning-paths/cross-platform/multiplying-matrices-with-sme2/2-check-your-environment.md
Lines changed: 18 additions & 12 deletions
@@ -6,13 +6,11 @@ weight: 4
6
6
layout: learningpathall
7
7
---
8
8
9
-
In this section, you will verify that your environment is set up and ready to
10
-
develop with SME2. This will be your first hands-on experience with the
11
-
environment.
9
+
In this section, you'll verify that your environment is ready for SME2 development. This is your first hands-on task and confirms that the toolchain, hardware (or emulator), and compiler are set up correctly.
12
10
13
-
## Compile the examples
11
+
## Build the code examples
14
12
15
-
First, build the code examples by running `make`:
13
+
Use the `make` command to compile all examples and generate assembly listings:
@@ -66,6 +64,8 @@ The `make` command performs the following tasks:
66
64
- It creates the assembly listings for the four executables: `hello.lst`,
67
65
`sme2_check.lst`, `sme2_matmul_asm.lst`, and `sme2_matmul_intr.lst`.
68
66
67
+
These targets compile and link all example programs and generate disassembly listings for inspection.
68
+
69
69
At any point, you can clean the directory of all the files that have been built
70
70
by invoking `make clean`:
71
71
@@ -114,12 +114,20 @@ Run the `hello` program with:
114
114
{{< /tab >}}
115
115
{{< /tabpane >}}
116
116
117
-
In the emulated case, you may see that the FVP prints out extra lines. The key confirmation is the presence of "Hello, world!" in the output. it demonstrates that the generic code can be compiled and executed.
117
+
In the emulated case, you may see that the FVP prints out extra lines. The key confirmation is the presence of "Hello, world!" in the output. It demonstrates that the generic code can be compiled and executed.
118
118
119
119
## Check SME2 availability
120
120
121
121
You will now run the `sme2_check` program, which verifies that SME2 works as expected. This checks that both the compiler and the CPU (or the emulated CPU) properly support SME2.
122
122
123
+
The `sme2_check` program verifies that SME2 is available and working. It confirms:
124
+
125
+
* The compiler supports SME2 (via `__ARM_FEATURE_SME2`)
126
+
127
+
* The system or emulator reports SME2 capability
128
+
129
+
* Streaming mode works as expected
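
The first of these checks can be illustrated with a minimal sketch. It assumes only that the compiler defines the ACLE feature macro `__ARM_FEATURE_SME2` when SME2 is enabled for the target. The function names `compiler_targets_sme2` and `report_compiler_sme2` are hypothetical and are not taken from `sme2_check.c`.

```c
#include <stdio.h>

/* Returns 1 when the compiler was invoked with SME2 enabled for the
   target, 0 otherwise. The decision is made entirely at compile time. */
int compiler_targets_sme2(void) {
#if defined(__ARM_FEATURE_SME2)
  return 1;
#else
  return 0;
#endif
}

/* Hypothetical helper: prints the outcome of the compile-time check. */
void report_compiler_sme2(void) {
  printf("Compiler targets SME2: %s\n",
         compiler_targets_sme2() ? "yes" : "no");
}
```

This only reports what the compiler was asked to target. It says nothing about whether the CPU or emulator actually implements SME2, which is why the program also performs the runtime checks listed above.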
130
+
123
131
The source code is found in `sme2_check.c`:
124
132
125
133
```C { line_numbers="true" }
@@ -191,10 +199,7 @@ The ``sme2_check`` program then displays whether SVE, SME and SME2 are supported
191
199
at line 24. The checking of SVE, SME and SME2 is done differently depending on
192
200
``BAREMETAL``. This platform specific behaviour is abstracted by the
193
201
``display_cpu_features()``:
194
-
- In baremetal mode, our program has access to system registers and can thus do
195
-
some low level peek at what the silicon actually supports. The program will
196
-
print the SVE field of the ``ID_AA64PFR0_EL1`` system register and the SME
197
-
field of the ``ID_AA64PFR1_EL1`` system register.
202
+
- In baremetal mode, the program has access to system registers and can inspect them directly to see what the hardware supports. The program prints the SVE field of the ``ID_AA64PFR0_EL1`` system register and the SME field of the ``ID_AA64PFR1_EL1`` system register.
198
203
- In non baremetal mode, on an Apple platform the program needs to use a higher
199
204
level API call.
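
As a rough illustration of the baremetal case, the sketch below extracts the two ID fields from register values that have already been read. It is not the code from `sme2_check.c`: the helper names are hypothetical, and the bit positions (SVE in bits [35:32] of ``ID_AA64PFR0_EL1``, SME in bits [27:24] of ``ID_AA64PFR1_EL1``) are assumptions that should be checked against the Arm Architecture Reference Manual.

```c
#include <stdint.h>

/* Extract the 4-bit ID field whose least significant bit is `lsb`. */
unsigned id_field(uint64_t reg_value, unsigned lsb) {
  return (unsigned)((reg_value >> lsb) & 0xF);
}

/* Assumed position: SVE is bits [35:32] of ID_AA64PFR0_EL1. */
unsigned sve_field(uint64_t id_aa64pfr0) {
  return id_field(id_aa64pfr0, 32);
}

/* Assumed position: SME is bits [27:24] of ID_AA64PFR1_EL1. */
unsigned sme_field(uint64_t id_aa64pfr1) {
  return id_field(id_aa64pfr1, 24);
}
```

Reading the registers themselves requires a privileged `MRS` instruction, which is why this approach is only available in baremetal mode. The sketch deliberately takes the register value as a parameter so that it can be exercised on any host.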
200
205
@@ -213,6 +218,8 @@ annotated with the ``__arm_locally_streaming`` attribute, which instructs the
213
218
compiler to automatically switch to streaming mode when invoking this function.
214
219
Streaming mode will be discussed in more depth in the next section.
215
220
221
+
Look for the following confirmation messages in the output:
@@ -243,5 +250,4 @@ Streaming mode will be discussed in more depth in the next section.
243
250
{{< /tab >}}
244
251
{{< /tabpane >}}
245
252
246
-
You have now checked that the code can be compiled and run with full SME2
247
-
support. You are all set to move to the next section.
253
+
You've now confirmed that your environment can compile and run SME2 code, and that SME2 features like streaming mode are working correctly. You're ready to continue to the next section and start working with SME2 in practice.
In large-scale software, programs often switch between streaming and non-streaming mode. Some streaming-mode functions may call others, requiring portions of processor state, such as the ZA storage, to be saved and restored. This behavior is defined in the Arm C Language Extensions (ACLE) and is supported by the compiler.
11
+
Programs can switch between streaming and non-streaming mode during execution. When one streaming-mode function calls another, parts of the processor state - such as ZA storage - might need to be saved and restored. This behavior is governed by the Arm C Language Extensions (ACLE) and is managed by the compiler.
12
12
13
13
To use streaming mode, you simply annotate the relevant functions with the appropriate keywords. The compiler handles the low-level mechanics of streaming mode management, removing the need for error-prone, manual work.
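
As a minimal sketch of what such an annotation looks like, the function below is marked with `__arm_locally_streaming` when the compiler advertises support through the ACLE macro `__ARM_FEATURE_LOCALLY_STREAMING`, and compiles as an ordinary function otherwise. The macro `SKETCH_LOCALLY_STREAMING` and the function `sum_sketch` are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Use the ACLE keyword only when the compiler reports support for it,
   so the same source still builds for targets without SME. */
#if defined(__ARM_FEATURE_LOCALLY_STREAMING)
#define SKETCH_LOCALLY_STREAMING __arm_locally_streaming
#else
#define SKETCH_LOCALLY_STREAMING
#endif

/* With the attribute active, the compiler switches to streaming mode on
   entry and back on exit; the body contains no mode-switching code. */
SKETCH_LOCALLY_STREAMING
int sum_sketch(const int *values, size_t count) {
  int total = 0;
  for (size_t i = 0; i < count; i++)
    total += values[i];
  return total;
}
```

Callers invoke `sum_sketch` like any other function. The result is the same in both builds; only the processor mode in which the body runs differs.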
14
14
@@ -18,29 +18,28 @@ For more information, see the [Introduction to streaming and non-streaming mode]
18
18
19
19
## Streaming mode behavior and compiler handling
20
20
21
+
Streaming mode changes how the processor and compiler manage execution context. Here's how it works:
22
+
21
23
* The AArch64 architecture defines a concept called *streaming mode*, controlled
22
24
by a processor state bit `PSTATE.SM`.
23
25
24
-
* At any given point in time, the processor is either in streaming mode (`PSTATE.SM==1`) or in non-streaming mode (`PSTATE.SM==0`).
26
+
* At any given point in time, the processor is either in streaming mode (`PSTATE.SM == 1`) or in non-streaming mode (`PSTATE.SM == 0`).
25
27
26
28
* To enter streaming mode, there is the instruction `SMSTART`, and to return to non-streaming mode, the instruction is `SMSTOP`.
27
29
28
30
* Streaming mode affects C and C++ code in the following ways:
29
31
30
32
- It can change the length of SVE vectors and predicates. The length of an SVE vector in streaming mode is called the *Streaming Vector Length* (SVL), which might differ from the non-streaming vector length. See [Effect of streaming mode on VL](https://arm-software.github.io/acle/main/acle.html#effect-of-streaming-mode-on-vl) for further information.
31
-
- Some instructions, and their associated ACLE intrinsics, can only be executed in streaming mode.These intrinsics are called *streaming intrinsics*.
32
-
- Other instructions are restricted to non-streaming mode, and their instrinsics are called *non-streaming intrinsics*.
33
+
- Some instructions, and their associated ACLE intrinsics, can only be executed in streaming mode. These intrinsics are called *streaming intrinsics*.
34
+
- Other instructions are restricted to non-streaming mode. Their intrinsics are called *non-streaming intrinsics*.
33
35
34
36
The ACLE specification extends the C and C++ abstract machine model to include streaming mode. At any given time, the abstract machine is either in streaming or non-streaming mode.
35
37
36
38
This distinction between abstract machine mode and processor mode is mostly a specification detail. At runtime, the processor’s mode may differ from the abstract machine’s mode - as long as the observable program behavior remains consistent (as per the "as-if" rule).
37
39
38
-
One
39
-
practical consequence of this is that C and C++ code does not specify the exact
40
-
placement of `SMSTART` and `SMSTOP` instructions; the source code simply places
41
-
limits on where such instructions go. For example, when stepping through a
42
-
program in a debugger, the processor mode might sometimes be different from the
43
-
one implied by the source code.
40
+
{{% notice Note %}}
41
+
One practical consequence of this is that C and C++ code does not specify the exact placement of `SMSTART` and `SMSTOP` instructions; the source code simply places limits on where such instructions go. For example, when stepping through a program in a debugger, the processor mode might sometimes be different from the one implied by the source code.
42
+
{{% /notice %}}
44
43
45
44
ACLE provides attributes that specify whether the abstract machine executes statements:
46
45
@@ -56,10 +55,11 @@ is enabled.
56
55
57
56
In C and C++, ZA usage is specified at the function level: a function either uses ZA or it doesn't. That is, a function either has ZA state or it does not.
58
57
59
-
If a function does have ZA state, the function can either share that ZA state
60
-
with the function's caller or create new ZA state. In the latter
61
-
case, it is the compiler's responsibility to free up ZA so that the function can
62
-
use it - see the description of the lazy saving scheme in
63
-
[AAPCS64](https://arm-software.github.io/acle/main/acle.html#AAPCS64) for details
64
-
about how the compiler does this.
58
+
Functions that use ZA can either:
59
+
60
+
- Share the caller’s ZA state
61
+
- Allocate a new ZA state for themselves
62
+
63
+
When new state is needed, the compiler is responsible for preserving the caller’s state using a *lazy saving* scheme. For more information, see the [AAPCS64 section of the ACLE spec](https://arm-software.github.io/acle/main/acle.html#AAPCS64).
content/learning-paths/cross-platform/multiplying-matrices-with-sme2/4-vanilla-matmul.md
Lines changed: 6 additions & 6 deletions
@@ -8,7 +8,7 @@ layout: learningpathall
8
8
9
9
## Overview
10
10
11
-
In this section, you'll implement a basic matrix multiplication algorithm in C, using a row-major memory layout. This version serves as a reference implementation for validating optimized versions later in the Learning Path.
11
+
In this section, you'll implement a basic matrix multiplication algorithm in C using row-major memory layout. This version acts as a reference implementation that you'll use to validate the correctness of optimized versions later in the Learning Path.
12
12
13
13
## Vanilla matrix multiplication algorithm
14
14
@@ -21,6 +21,8 @@ It produces an output matrix C [`Cr` rows x `Cc` columns].
21
21
22
22
The algorithm works by iterating over each row of A and each column of B. It multiplies the corresponding elements and sums the products to generate each element of matrix C.
23
23
24
+
The diagram below shows how matrix C is computed by iterating over rows of A and columns of B:
25
+
24
26

25
27
26
28
This implies that the A, B, and C matrices have some constraints on their
@@ -34,16 +36,14 @@ properties and use, see this [Wikipedia article on Matrix Multiplication](https:
34
36
35
37
## Variable mappings in this Learning Path
36
38
37
-
In this Learning Path, you'll use the following variable names:
39
+
The following variable names are used throughout the Learning Path to represent matrix dimensions and operands:
38
40
39
-
-`matLeft` corresponds to the left-hand side argument of the matrix
40
-
multiplication.
41
+
- `matLeft` corresponds to the left-hand side argument of the matrix multiplication.
41
42
- `matRight` corresponds to the right-hand side argument of the matrix multiplication.
42
43
- `M` is the number of rows of `matLeft`.
43
44
- `K` is the number of columns of `matLeft` (and the number of rows of `matRight`).
44
45
- `N` is the number of columns of `matRight`.
45
-
-`matResult`corresponds to the result of the matrix multiplication, with
46
-
`M` rows and `N` columns.
46
+
- `matResult` corresponds to the result of the matrix multiplication, with `M` rows and `N` columns.
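
To make the mapping concrete, here is a minimal sketch of the vanilla algorithm using these names. It assumes that all three matrices are stored in row-major order as contiguous `float` arrays. The function name `matmul_reference_sketch` is hypothetical, and the Learning Path's own reference implementation may differ in signature and data layout.

```c
#include <stdint.h>

/* matResult[M x N] = matLeft[M x K] * matRight[K x N].
   Assumption: every matrix is row-major, so element (r, c) of a matrix
   with `cols` columns is stored at index r * cols + c. */
void matmul_reference_sketch(uint64_t M, uint64_t K, uint64_t N,
                             const float *matLeft, const float *matRight,
                             float *matResult) {
  for (uint64_t m = 0; m < M; m++) {     /* each row of matLeft */
    for (uint64_t n = 0; n < N; n++) {   /* each column of matRight */
      float acc = 0.0f;
      for (uint64_t k = 0; k < K; k++)   /* dot product of row and column */
        acc += matLeft[m * K + k] * matRight[k * N + n];
      matResult[m * N + n] = acc;
    }
  }
}
```

For example, multiplying a 2 x 3 `matLeft` by a 3 x 2 `matRight` gives `M = 2`, `K = 3`, `N = 2`, and a `matResult` with four elements.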