Commit fd7704f

Further improvements
1 parent 332d396 commit fd7704f

4 files changed

+47
-42
lines changed

content/learning-paths/cross-platform/multiplying-matrices-with-sme2/1-get-started.md

Lines changed: 6 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -8,13 +8,12 @@ layout: learningpathall
88

99
## Choose your SME2 setup: native or emulated
1010

11-
Before you can build or run any SME2-accelerated code, you need to set up your development environment.
12-
13-
This section walks you through the required tools and the two supported execution options, which are:
11+
To build or run SME2-accelerated code, first set up your development environment.
12+
This section walks you through the required tools and two supported setup options:
1413

1514
* [**Native SME2 hardware**](#set-up-a-system-with-native-SME2-support) - build and run directly on a system with SME2 support. For supported devices, see [Devices with SME2 support](#devices-with-sme2-support).
1615

17-
* [**Docker-based emulation**](#set-up-a-system-using-sme2-emulation-with-dockerset-up-a-system-using-SME2-emulation-with-Docker) - use a container to emulate SME2 in bare metal mode (without an OS).
16+
* [**Docker-based emulation**](#set-up-a-system-using-sme2-emulation-with-docker) - use a container to emulate SME2 in bare metal mode (without an OS).
1817

1918
## Download and explore the code examples
2019

@@ -66,7 +65,7 @@ Amongst other files, it includes:
6665
- A `docker` directory containing:
6766
- `assets.source_me` to provide toolchain paths.
6867
- `build-my-container.sh`, a script that automates building the Docker image from the `sme2-environment.docker` file. It runs the Docker build command with the correct arguments so you don’t have to remember them.
69-
- `sme2-environment.docker`, a Docker file that defines the steps to build the SME2 container image. It installs all the necessary dependencies, including the SME2-compatible compiler and Arm FVP emulator.
68+
- `sme2-environment.docker`, a custom Docker file that defines the steps to build the SME2 container image. It installs all the necessary dependencies, including the SME2-compatible compiler and Arm FVP emulator.
7069
- `build-all-containers.sh`, a script to build multi-architecture images.
7170
- `.devcontainer/devcontainer.json` for VS Code container support.
7271

@@ -234,7 +233,7 @@ If you are using Visual Studio Code as your IDE, the container setup is already
234233

235234
Make sure you have the [Microsoft Dev Containers](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers) extension installed.
236235

237-
Then select the **Reopen in Container** menu entry as Figure 1 shows.
236+
Then select the **Reopen in Container** menu entry as shown below.
238237

239238
It automatically finds and uses ``.devcontainer/devcontainer.json``:
240239

@@ -252,7 +251,7 @@ part.
252251

253252
### Devices with native SME2 support
254253

255-
#### Apple devices (by product type)
254+
These Apple devices support SME2 natively.
256255

257256
- iPad
258257

content/learning-paths/cross-platform/multiplying-matrices-with-sme2/2-check-your-environment.md

Lines changed: 18 additions & 12 deletions
@@ -6,13 +6,11 @@ weight: 4
66
layout: learningpathall
77
---
88

9-
In this section, you will verify that your environment is set up and ready to
10-
develop with SME2. This will be your first hands-on experience with the
11-
environment.
9+
In this section, you'll verify that your environment is ready for SME2 development. This is your first hands-on task and confirms that the toolchain, hardware (or emulator), and compiler are set up correctly.
1210

13-
## Compile the examples
11+
## Build the code examples
1412

15-
First, build the code examples by running `make`:
13+
Use the `make` command to compile all examples and generate assembly listings:
1614

1715
{{< tabpane code=true >}}
1816
{{< tab header="Native SME2 support" language="bash" output_lines="2-19">}}
@@ -66,6 +64,8 @@ The `make` command performs the following tasks:
6664
- It creates the assembly listings for the four executables: `hello.lst`,
6765
`sme2_check.lst`, `sme2_matmul_asm.lst`, and `sme2_matmul_intr.lst`.
6866

67+
These targets compile and link all example programs and generate disassembly listings for inspection.
68+
6969
At any point, you can clean the directory of all the files that have been built
7070
by invoking `make clean`:
7171

@@ -114,12 +114,20 @@ Run the `hello` program with:
114114
{{< /tab >}}
115115
{{< /tabpane >}}
116116
117-
In the emulated case, you may see that the FVP prints out extra lines. The key confirmation is the presence of "Hello, world!" in the output. it demonstrates that the generic code can be compiled and executed.
117+
In the emulated case, you may see that the FVP prints out extra lines. The key confirmation is the presence of "Hello, world!" in the output. It demonstrates that the generic code can be compiled and executed.
118118
119119
## Check SME2 availability
120120
121121
You will now run the `sme2_check` program, which verifies that SME2 works as expected. It checks that both the compiler and the CPU (or the emulated CPU) properly support SME2.
122122
123+
The `sme2_check` program verifies that SME2 is available and working. It confirms:
124+
125+
* The compiler supports SME2 (via `__ARM_FEATURE_SME2`)
126+
127+
* The system or emulator reports SME2 capability
128+
129+
* Streaming mode works as expected
130+
123131
The source code is found in `sme2_check.c`:
124132
125133
```C { line_numbers="true" }
@@ -191,10 +199,7 @@ The ``sme2_check`` program then displays whether SVE, SME and SME2 are supported
191199
at line 24. The checking of SVE, SME and SME2 is done differently depending on
192200
``BAREMETAL``. This platform specific behaviour is abstracted by the
193201
``display_cpu_features()``:
194-
- In baremetal mode, our program has access to system registers and can thus do
195-
some low level peek at what the silicon actually supports. The program will
196-
print the SVE field of the ``ID_AA64PFR0_EL1`` system register and the SME
197-
field of the ``ID_AA64PFR1_EL1`` system register.
202+
- In baremetal mode, our program has access to system registers and can inspect them directly to see what the hardware supports. The program prints the SVE field of the ``ID_AA64PFR0_EL1`` system register and the SME field of the ``ID_AA64PFR1_EL1`` system register.
198203
- In non baremetal mode, on an Apple platform the program needs to use a higher
199204
level API call.
200205

@@ -213,6 +218,8 @@ annotated with the ``__arm_locally_streaming`` attribute, which instructs the
213218
compiler to automatically switch to streaming mode when invoking this function.
214219
Streaming mode will be discussed in more depth in the next section.
215220

221+
Look for the following confirmation messages in the output:
222+
216223
{{< tabpane code=true >}}
217224
{{< tab header="Native SME2 support" language="bash" output_lines="2-9">}}
218225
./sme2_check
@@ -243,5 +250,4 @@ Streaming mode will be discussed in more depth in the next section.
243250
{{< /tab >}}
244251
{{< /tabpane >}}
245252

246-
You have now checked that the code can be compiled and run with full SME2
247-
support. You are all set to move to the next section.
253+
You've now confirmed that your environment can compile and run SME2 code, and that SME2 features like streaming mode are working correctly. You're ready to continue to the next section and start working with SME2 in practice.
Lines changed: 17 additions & 17 deletions
@@ -1,5 +1,5 @@
11
---
2-
title: Streaming mode and ZA State in SME
2+
title: Streaming mode and ZA state in SME
33
weight: 5
44

55
### FIXED, DO NOT MODIFY
@@ -8,7 +8,7 @@ layout: learningpathall
88

99
## Understanding streaming mode
1010

11-
In large-scale software, programs often switch between streaming and non-streaming mode. Some streaming-mode functions may call others, requiring portions of processor state, such as the ZA storage, to be saved and restored. This behavior is defined in the Arm C Language Extensions (ACLE) and is supported by the compiler.
11+
Programs can switch between streaming and non-streaming mode during execution. When one streaming-mode function calls another, parts of the processor state - such as ZA storage - might need to be saved and restored. This behavior is governed by the Arm C Language Extensions (ACLE) and is managed by the compiler.
1212

1313
To use streaming mode, you simply annotate the relevant functions with the appropriate keywords. The compiler handles the low-level mechanics of streaming mode management, removing the need for error-prone, manual work.
1414

@@ -18,29 +18,28 @@ For more information, see the [Introduction to streaming and non-streaming mode]
1818

1919
## Streaming mode behavior and compiler handling
2020

21+
Streaming mode changes how the processor and compiler manage execution context. Here's how it works:
22+
2123
* The AArch64 architecture defines a concept called *streaming mode*, controlled
2224
by a processor state bit `PSTATE.SM`.
2325

24-
* At any given point in time, the processor is either in streaming mode (`PSTATE.SM==1`) or in non-streaming mode (`PSTATE.SM==0`).
26+
* At any given point in time, the processor is either in streaming mode (`PSTATE.SM == 1`) or in non-streaming mode (`PSTATE.SM == 0`).
2527

2628
* To enter streaming mode, there is the instruction `SMSTART`, and to return to non-streaming mode, the instruction is `SMSTOP`.
2729

2830
* Streaming mode affects C and C++ code in the following ways:
2931

3032
- It can change the length of SVE vectors and predicates. The length of an SVE vector in streaming mode is called the *Streaming Vector Length* (SVL), which might differ from the non-streaming vector length. See [Effect of streaming mode on VL](https://arm-software.github.io/acle/main/acle.html#effect-of-streaming-mode-on-vl) for further information.
31-
- Some instructions, and their associated ACLE intrinsics, can only be executed in streaming mode.These intrinsics are called *streaming intrinsics*.
32-
- Other instructions are restricted to non-streaming mode, and their instrinsics are called *non-streaming intrinsics*.
33+
- Some instructions, and their associated ACLE intrinsics, can only be executed in streaming mode. These are called *streaming intrinsics*.
34+
- Other instructions are restricted to non-streaming mode. These are called *non-streaming intrinsics*.
3335

3436
The ACLE specification extends the C and C++ abstract machine model to include streaming mode. At any given time, the abstract machine is either in streaming or non-streaming mode.
3537

3638
This distinction between abstract machine mode and processor mode is mostly a specification detail. At runtime, the processor’s mode may differ from the abstract machine’s mode - as long as the observable program behavior remains consistent (as per the "as-if" rule).
3739

38-
One
39-
practical consequence of this is that C and C++ code does not specify the exact
40-
placement of `SMSTART` and `SMSTOP` instructions; the source code simply places
41-
limits on where such instructions go. For example, when stepping through a
42-
program in a debugger, the processor mode might sometimes be different from the
43-
one implied by the source code.
40+
{{% notice Note %}}
41+
One practical consequence of this is that C and C++ code does not specify the exact placement of `SMSTART` and `SMSTOP` instructions; the source code simply places limits on where such instructions go. For example, when stepping through a program in a debugger, the processor mode might sometimes be different from the one implied by the source code.
42+
{{% /notice %}}
4443

4544
ACLE provides attributes that specify whether the abstract machine executes statements:
4645

@@ -56,10 +55,11 @@ is enabled.
5655

5756
In C and C++, ZA usage is specified at the function level: a function either uses ZA or it doesn't. That is, a function either has ZA state or it does not.
5857

59-
If a function does have ZA state, the function can either share that ZA state
60-
with the function's caller or create new ZA state. In the latter
61-
case, it is the compiler's responsibility to free up ZA so that the function can
62-
use it - see the description of the lazy saving scheme in
63-
[AAPCS64](https://arm-software.github.io/acle/main/acle.html#AAPCS64) for details
64-
about how the compiler does this.
58+
Functions that use ZA can either:
59+
60+
- Share the caller’s ZA state
61+
- Allocate a new ZA state for themselves
62+
63+
When new state is needed, the compiler is responsible for preserving the caller’s state using a *lazy saving* scheme. For more information, see the [AAPCS64 section of the ACLE spec](https://arm-software.github.io/acle/main/acle.html#AAPCS64).
64+
6565

content/learning-paths/cross-platform/multiplying-matrices-with-sme2/4-vanilla-matmul.md

Lines changed: 6 additions & 6 deletions
@@ -8,7 +8,7 @@ layout: learningpathall
88

99
## Overview
1010

11-
In this section, you'll implement a basic matrix multiplication algorithm in C, using a row-major memory layout. This version serves as a reference implementation for validating optimized versions later in the Learning Path.
11+
In this section, you'll implement a basic matrix multiplication algorithm in C using row-major memory layout. This version acts as a reference implementation that you'll use to validate the correctness of optimized versions later in the Learning Path.
1212

1313
## Vanilla matrix multiplication algorithm
1414

@@ -21,6 +21,8 @@ It produces an output matrix C [`Cr` rows x `Cc` columns].
2121

2222
The algorithm works by iterating over each row of A and each column of B. It multiplies the corresponding elements and sums the products to generate each element of matrix C, as shown in the figure below.
2323

24+
The diagram below shows how matrix C is computed by iterating over rows of A and columns of B:
25+
2426
![Standard Matrix Multiplication alt-text#center](matmul.png "Figure 2: Standard matrix multiplication.")
2527

2628
This implies that the A, B, and C matrices have some constraints on their
@@ -34,16 +36,14 @@ properties and use, see this [Wikipedia article on Matrix Multiplication](https:
3436

3537
## Variable mappings in this Learning Path
3638

37-
In this Learning Path, you'll use the following variable names:
39+
The following variable names are used throughout the Learning Path to represent matrix dimensions and operands:
3840

39-
- `matLeft` corresponds to the left-hand side argument of the matrix
40-
multiplication.
41+
- `matLeft` corresponds to the left-hand side argument of the matrix multiplication.
4142
- `matRight` corresponds to the right-hand side of the matrix multiplication.
4243
- `M` is `matLeft` number of rows.
4344
- `K` is `matLeft` number of columns (and `matRight` number of rows).
4445
- `N` is `matRight` number of columns.
45-
- `matResult`corresponds to the result of the matrix multiplication, with
46-
`M` rows and `N` columns.
46+
- `matResult` corresponds to the result of the matrix multiplication, with `M` rows and `N` columns.
4747

4848
## C implementation
4949
