Commit d1905f5

First review of SIMD Loops, waiting for project release to continue reviewing.
1 parent 405b4d7 commit d1905f5

5 files changed: 54 additions and 59 deletions
Lines changed: 28 additions & 37 deletions
@@ -1,79 +1,70 @@
 ---
-title: About SIMD Loops
+title: About single instruction, multiple data (SIMD) loops
 weight: 3
 
 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---
 
 Writing high-performance software for Arm processors often involves delving into
-its SIMD technologies. For many developers, that journey started with Neon --- a
-familiar, fixed-width vector extension that has been around for years. But as
+SIMD technologies. For many developers, that journey started with NEON, a
+familiar, fixed-width vector extension that has been around for many years. But as
 Arm architectures continue to evolve, so do their SIMD technologies.
 
-Enter the world of SVE and SME: two powerful, scalable vector extensions designed for modern
-workloads. Unlike Neon, they aren’t just wider --- they’re different. These
+Enter the world of Scalable Vector Extension (SVE) and Scalable Matrix Extension (SME): two powerful, scalable vector extensions designed for modern
+workloads. Unlike NEON, they are not just wider; they are fundamentally different. These
 extensions introduce new instructions, more flexible programming models, and
 support for concepts like predication, scalable vectors, and streaming modes.
 However, they also come with a learning curve.
 
-That’s where [SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) comes
-in.
+That is where [SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) becomes a valuable resource, enabling you to quickly and effectively learn how to write high-performance SIMD code.
 
-[SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) is designed to help
-you in the process of learning how to write SVE and SME code. It is a collection
-of self-contained, real-world loop kernels --- written in a mix of C, ACLE
-intrinsics, and inline assembly --- that target everything from simple arithmetic
+SIMD Loops is designed to help
+you learn how to write SVE and SME code. It is a collection
+of self-contained, real-world loop kernels written in a mix of C, Arm C Language Extensions (ACLE)
+intrinsics, and inline assembly. These kernels target tasks ranging from simple arithmetic
 to matrix multiplication, sorting, and string processing. You can compile them,
 run them, step through them, and use them as a foundation for your own SIMD
 work.
 
-If you’re familiar with Neon intrinsics and would like to explore what SVE and
-SME have to offer, the [SIMD
-Loops](https://gitlab.arm.com/architecture/simd-loops) project is for you !
+If you are familiar with NEON intrinsics, you can use SIMD Loops to learn and explore SVE and SME.
 
-## What is SIMD Loops ?
+## What is SIMD Loops?
 
-[SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) is an open-source
-project built to help you learn how to write SIMD code for modern Arm
-architectures --- specifically using SVE (Scalable Vector Extension) and SME
-(Scalable Matrix Extension). It is designed for programmers who already know
-their way around Neon intrinsics but are now facing the more powerful --- and
-more complex --- world of SVE and SME.
+SIMD Loops is an open-source
+project, licensed under BSD 3-Clause, built to help you learn how to write SIMD code for modern Arm
+architectures, specifically using SVE and SME.
+It is designed for programmers who already know
+their way around NEON intrinsics but are now facing the more powerful and
+complex world of SVE and SME.
 
 The goal of SIMD Loops is to provide working, readable examples that demonstrate
 how to use the full range of features available in SVE, SVE2, and SME2. Each
-example is a self-contained loop kernel --- a small piece of code that performs
-a specific task like matrix multiplication, vector reduction, histogram or
-memory copy --- and shows how that task can be implemented across different
+example is a self-contained loop kernel, a small piece of code that performs
+a specific task like matrix multiplication, vector reduction, histogram, or
+memory copy. These examples show how that task can be implemented across different
 vector instruction sets.
 
 Unlike a cookbook that tries to provide a recipe for every problem, SIMD Loops
-takes the opposite approach: it aims to showcase the architecture, not the
-problem. The loop kernels are chosen to be realistic and meaningful, but the
+takes the opposite approach. It aims to showcase the architecture rather than
+the problem. The loop kernels are chosen to be realistic and meaningful, but the
 main goal is to demonstrate how specific features and instructions work in
-practice. If you’re trying to understand scalability, predication,
+practice. If you are trying to understand scalability, predication,
 gather/scatter, streaming mode, ZA storage, compact instructions, or the
-mechanics of matrix tiles --- this is where you’ll see them in action.
+mechanics of matrix tiles, this is where you will see them in action.
 
 The project includes:
 - Dozens of numbered loop kernels, each focused on a specific feature or pattern
 - Reference C implementations to establish expected behavior
-- Inline assembly and/or intrinsics for scalar, Neon, SVE, SVE2, SVE2.1, SME2 and SME2.1
+- Inline assembly and/or intrinsics for scalar, NEON, SVE, SVE2, SVE2.1, SME2, and SME2.1
 - Build support for different instruction sets, with runtime validation
 - A simple command-line runner to execute any loop interactively
 - Optional standalone binaries for bare-metal and simulator use
 
-You don’t need to worry about auto-vectorization, compiler flags, or tooling
+You do not need to worry about auto-vectorization, compiler flags, or tooling
 quirks. Each loop is hand-written and annotated to make the use of SIMD features
 clear. The intent is that you can study, modify, and run each loop as a learning
-exercise --- and use the project as a foundation for your own exploration of
+exercise, and use the project as a foundation for your own exploration of
 Arm’s vector extensions.
 
-## Where to get it?
-
-[SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) is available as an
-open-source code licensed under BSD 3-Clause. You can access the source code
-from the following GitLab project:
-https://gitlab.arm.com/architecture/simd-loops
 

content/learning-paths/cross-platform/simd-loops/2-using.md

Lines changed: 7 additions & 8 deletions
@@ -6,17 +6,16 @@ weight: 4
 layout: learningpathall
 ---
 
-First, clone [SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) and
-change current directory to it with:
+To get started, clone the SIMD Loops project and change current directory:
 
-```BASH
+```bash
 git clone https://gitlab.arm.com/architecture/simd-loops simd-loops.git
 cd simd-loops.git
 ```
 
 ## SIMD Loops structure
 
-In the [SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) project, the
+In the SIMD Loops project, the
 source code for the loops is organized under the loops directory. The complete
 list of loops is documented in the loops.inc file, which includes a brief
 description and the purpose of each loop. Every loop is associated with a
@@ -35,7 +34,7 @@ void inner_loop_<NNN>(struct loop_<NNN>_data *data) { ... }
 
 #if defined(HAVE_xxx_INTRINSICS)
 
-// Intrinsics versions: xxx = SME, SVE, or SIMD (Neon) versions
+// Intrinsics versions: xxx = SME, SVE, or SIMD (NEON) versions
 void inner_loop_<NNN>(struct loop_<NNN>_data *data) { ... }
 
 #elif defined(<ASM_COND>)
@@ -55,18 +54,18 @@ void inner_loop_<NNN>(struct loop_<NNN>_data *data) { ... }
 ```
 
 Each loop is implemented in several SIMD extension variants, and conditional
-compilation is used to select one of the optimisations for the
+compilation is used to select one of the optimizations for the
 `inner_loop_<NNN>` function. The native C implementation is written first, and
 it can be generated either when building natively (HAVE_NATIVE) or through
 compiler auto-vectorization (HAVE_AUTOVEC). When SIMD ACLE is supported (e.g.,
-SME, SVE, or Neon), the code is compiled using high-level intrinsics. If ACLE
+SME, SVE, or NEON), the code is compiled using high-level intrinsics. If ACLE
 support is not available, the build process falls back to handwritten inline
 assembly targeting one of the available SIMD extensions, such as SME2.1, SME2,
 SVE2.1, SVE2, and others. The overall code structure also includes setup and
 cleanup code in the main function, where memory buffers are allocated, the
 selected loop kernel is executed, and results are verified for correctness.
 
-At compile time, you can select which loop optimisation to compile, whether it
+At compile time, you can select which loop optimization to compile, whether it
 is based on SME or SVE intrinsics, or one of the available inline assembly
 variants (`make scalar neon sve2 sme2 sve2p1 sme2p1 sve_intrinsics
 sme_intrinsics` ...).
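
The conditional-compilation scheme that 2-using.md describes (one `inner_loop_<NNN>` function, with the variant chosen at build time and the result verified afterwards) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not code from the SIMD Loops repository: `HAVE_SVE_INTRINSICS` is an assumed instance of the `HAVE_xxx_INTRINSICS` pattern, and `loop_demo_data`, `reference_loop_demo`, `inner_loop_demo`, and `run_and_verify_demo` are hypothetical names. The extra `__ARM_FEATURE_SVE` guard is only there so the sketch still builds on machines without SVE, where it falls back to plain C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: HAVE_SVE_INTRINSICS stands in for HAVE_xxx_INTRINSICS. */
#if defined(HAVE_SVE_INTRINSICS) && defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#define DEMO_USE_SVE 1
#else
#define DEMO_USE_SVE 0
#endif

/* Hypothetical stand-in for struct loop_<NNN>_data. */
struct loop_demo_data {
    float a;        /* scale factor */
    const float *x; /* input vector */
    float *y;       /* updated in place: y[i] += a * x[i] */
    size_t n;       /* element count */
};

/* Reference C version, used to establish the expected result. */
void reference_loop_demo(struct loop_demo_data *d) {
    assert(d != NULL && d->x != NULL && d->y != NULL);
    for (size_t i = 0; i < d->n; i++) {
        d->y[i] += d->a * d->x[i];
    }
}

#if DEMO_USE_SVE
/* Vector-length-agnostic variant: the predicate masks the loop tail,
   so no scalar remainder loop is needed. */
void inner_loop_demo(struct loop_demo_data *d) {
    for (uint64_t i = 0; i < d->n; i += svcntw()) {
        svbool_t pg = svwhilelt_b32_u64(i, (uint64_t)d->n);
        svfloat32_t vx = svld1_f32(pg, d->x + i);
        svfloat32_t vy = svld1_f32(pg, d->y + i);
        vy = svmla_n_f32_x(pg, vy, vx, d->a); /* vy + vx * a */
        svst1_f32(pg, d->y + i, vy);
    }
}
#else
/* Fallback variant: plain C, same behaviour as the reference. */
void inner_loop_demo(struct loop_demo_data *d) {
    reference_loop_demo(d);
}
#endif

/* Sets up buffers, runs the selected variant, and checks it against the
   reference. Returns 0 on success and 1 on mismatch. */
int run_and_verify_demo(void) {
    enum { N = 19 }; /* not a multiple of common vector lengths */
    float x[N], y_ref[N], y_opt[N];
    for (int i = 0; i < N; i++) {
        x[i] = (float)i;
        y_ref[i] = y_opt[i] = 1.0f;
    }
    struct loop_demo_data ref = { 2.0f, x, y_ref, N };
    struct loop_demo_data opt = { 2.0f, x, y_opt, N };
    reference_loop_demo(&ref);
    inner_loop_demo(&opt);
    for (int i = 0; i < N; i++) {
        float diff = y_ref[i] - y_opt[i];
        if (diff < 0.0f) diff = -diff;
        if (diff > 1e-6f) return 1;
    }
    return 0;
}
```

Calling `run_and_verify_demo()` from a small `main` plays the role of the setup, execute, and verify steps the text mentions. The real project selects variants through its own make targets and macros, which may differ from the guard used here.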

content/learning-paths/cross-platform/simd-loops/3-example.md

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
 ---
-title: Example
+title: Code example
 weight: 5
 
 ### FIXED, DO NOT MODIFY
@@ -263,7 +263,7 @@ Guide](https://developer.arm.com/documentation/109246/latest/).
 Beyond the SME2 and SVE2 implementations shown above, this loop also includes several
 alternative optimized versions, each leveraging architecture-specific features.
 
-### Neon
+### NEON
 
 The neon version (lines 612-710) relies on multiple structure load/store
 combined with indexed `fmla` instructions to vectorize the matrix multiplication
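
For readers unfamiliar with indexed `fmla`, the idea in the context lines above can be illustrated with a small sketch. It is not the loop from the repository (the diff shows only two lines of that section), and it does not use multiple structure loads. It only shows, under those assumptions, how a by-lane fused multiply-add builds one output row of a 4x4 row-major product. The function name `matmul4x4_demo` is hypothetical, and a scalar fallback is compiled on non-AArch64 targets.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#define DEMO_USE_NEON 1
#else
#define DEMO_USE_NEON 0
#endif

/* Computes c = a * b for row-major 4x4 float matrices (hypothetical helper). */
void matmul4x4_demo(const float *a, const float *b, float *c) {
    assert(a != NULL && b != NULL && c != NULL);
#if DEMO_USE_NEON
    /* Keep the four rows of b in vector registers. */
    float32x4_t b0 = vld1q_f32(b + 0);
    float32x4_t b1 = vld1q_f32(b + 4);
    float32x4_t b2 = vld1q_f32(b + 8);
    float32x4_t b3 = vld1q_f32(b + 12);
    for (int i = 0; i < 4; i++) {
        float32x4_t ai = vld1q_f32(a + 4 * i);
        /* Row i of c is a[i][0]*b0 + a[i][1]*b1 + a[i][2]*b2 + a[i][3]*b3.
           Each step multiplies a whole row of b by one lane of ai. */
        float32x4_t acc = vmulq_laneq_f32(b0, ai, 0);
        acc = vfmaq_laneq_f32(acc, b1, ai, 1);
        acc = vfmaq_laneq_f32(acc, b2, ai, 2);
        acc = vfmaq_laneq_f32(acc, b3, ai, 3);
        vst1q_f32(c + 4 * i, acc);
    }
#else
    /* Scalar fallback with the same result for the same inputs. */
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            float sum = 0.0f;
            for (int k = 0; k < 4; k++) {
                sum += a[4 * i + k] * b[4 * k + j];
            }
            c[4 * i + j] = sum;
        }
    }
#endif
}
```

The by-lane form avoids broadcasting each scalar of `a` into its own vector register first, which is the usual reason to prefer it in small matrix kernels.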

content/learning-paths/cross-platform/simd-loops/4-conclusion.md

Lines changed: 5 additions & 7 deletions
@@ -6,23 +6,21 @@ weight: 6
 layout: learningpathall
 ---
 
-[SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) is an invaluable
+SIMD Loops is an invaluable
 resource for developers looking to learn or master the intricacies of SVE and
 SME on modern Arm architectures. By providing practical, hands-on examples, it
 bridges the gap between the architecture specification and real-world
-application. Whether you're transitioning from Neon or starting fresh with SVE
+application. Whether you're transitioning from NEON or starting fresh with SVE
 and SME, SIMD Loops offers a comprehensive toolkit to enhance your understanding
 and proficiency.
 
 With its extensive collection of loop kernels, detailed documentation, and
-flexible build options, [SIMD
-Loops](https://gitlab.arm.com/architecture/simd-loops) empowers you to explore
+flexible build options, SIMD Loops empowers you to explore
 and leverage the full potential of Arm's advanced vector extensions. Dive into
 the project, experiment with the examples, and take your high-performance coding
 skills for Arm to the next level.
 
-For more information and to get started, visit the [SIMD
-Loops](https://gitlab.arm.com/architecture/simd-loops) GitLab project and refer
+For more information and to get started, visit the GitLab project and refer
 to the
 [README.md](https://gitlab.arm.com/architecture/simd-loops/-/blob/main/README.md)
-for detailed instructions on building and running the code. Happy coding!
+for instructions on building and running the code.

content/learning-paths/cross-platform/simd-loops/_index.md

Lines changed: 12 additions & 5 deletions
@@ -1,20 +1,20 @@
 ---
-title: "Code kata: perfect your SVE and SME instructions skills with SIMD Loops"
+title: "Code kata: perfect your SVE and SME skills with SIMD Loops"
 
 minutes_to_complete: 30
 
 draft: true
 cascade:
 draft: true
 
-who_is_this_for: This is an advanced topic for software developers who want to learn how to use the full range of features available in SVE, SVE2 and SME2 to improve the performance of their software for Arm processors.
+who_is_this_for: This is an advanced topic for software developers who want to learn how to use the full range of features available in SVE, SVE2 and SME2 to improve software performance on Arm processors.
 
 learning_objectives:
-- Improve your writing of SIMD code with SVE and SME.
+- Improve SIMD code performance using Scalable Vector Extension (SVE) and Scalable Matrix Extension (SME).
 
 prerequisites:
-- An AArch64 computer running Linux or macOS. You can use cloud instances, see this list of [Arm cloud service providers](/learning-paths/servers-and-cloud-computing/csp/).
-- Some familiarity of SIMD programming and Neon intrinsics
+- An AArch64 computer running Linux or macOS. You can use cloud instances, refer to [Get started with Arm-based cloud instances](/learning-paths/servers-and-cloud-computing/csp/) for a list of cloud service providers.
+- Some familiarity with SIMD programming and NEON intrinsics.
 
 author:
 - Alejandro Martinez Vicente
@@ -33,6 +33,13 @@ tools_software_languages:
 - Clang
 - FVP
 
+shared_path: true
+shared_between:
+- servers-and-cloud-computing
+- laptops-and-desktops
+- mobile-graphics-and-gaming
+- automotive
+
 further_reading:
 - resource:
 title: SVE Programming Examples
