Commit a79e5da

Add a new learning path on SIMD Loops.
1 parent 50e97ee commit a79e5da

8 files changed: +555 -2 lines changed

assets/contributors.csv

Lines changed: 2 additions & 1 deletion
@@ -100,4 +100,5 @@ Ann Cheng,Arm,anncheng-arm,hello-ann,,
 Fidel Makatia Omusilibwa,,,,,
 Ker Liu,,,,,
 Rui Chang,,,,,
-
+Alejandro Martinez Vicente,Arm,,,,
+Mohamad Najem,Arm,,,,
Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
---
title: About SIMD Loops
weight: 3

### FIXED, DO NOT MODIFY
layout: learningpathall
---

Writing high-performance software for Arm processors often involves delving into
their SIMD technologies. For many developers, that journey started with Neon, a
familiar, fixed-width vector extension that has been around for years. But as
Arm architectures continue to evolve, so do their SIMD technologies.

Enter the world of SVE and SME: two powerful, scalable vector extensions designed for modern
workloads. Unlike Neon, they aren’t just wider; they’re different. These
extensions introduce new instructions, more flexible programming models, and
support for concepts like predication, scalable vectors, and streaming modes.
However, they also come with a learning curve.

That’s where [SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) comes
in.

[SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) is designed to help
you learn how to write SVE and SME code. It is a collection of self-contained,
real-world loop kernels, written in a mix of C, ACLE intrinsics, and inline
assembly, that target everything from simple arithmetic to matrix
multiplication, sorting, and string processing. You can compile them, run them,
step through them, and use them as a foundation for your own SIMD work.

If you’re familiar with Neon intrinsics and would like to explore what SVE and
SME have to offer, the [SIMD
Loops](https://gitlab.arm.com/architecture/simd-loops) project is for you!

## What is SIMD Loops?

[SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) is an open-source
project built to help you learn how to write SIMD code for modern Arm
architectures, specifically using SVE (Scalable Vector Extension) and SME
(Scalable Matrix Extension). It is designed for programmers who already know
their way around Neon intrinsics but are now facing the more powerful, and
more complex, world of SVE and SME.

The goal of SIMD Loops is to provide working, readable examples that demonstrate
how to use the full range of features available in SVE, SVE2, and SME2. Each
example is a self-contained loop kernel: a small piece of code that performs
a specific task, such as matrix multiplication, vector reduction, histogram, or
memory copy, and shows how that task can be implemented across different
vector instruction sets.

Unlike a cookbook that tries to provide a recipe for every problem, SIMD Loops
takes the opposite approach: it aims to showcase the architecture, not the
problem. The loop kernels are chosen to be realistic and meaningful, but the
main goal is to demonstrate how specific features and instructions work in
practice. If you’re trying to understand scalability, predication,
gather/scatter, streaming mode, ZA storage, compact instructions, or the
mechanics of matrix tiles, this is where you’ll see them in action.

The project includes:
- Dozens of numbered loop kernels, each focused on a specific feature or pattern
- Reference C implementations to establish expected behavior
- Inline assembly and/or intrinsics for scalar, Neon, SVE, SVE2, and SME2
- Build support for different instruction sets, with runtime validation
- A simple command-line runner to execute any loop interactively
- Optional standalone binaries for bare-metal and simulator use

You don’t need to worry about auto-vectorization, compiler flags, or tooling
quirks. Each loop is hand-written and annotated to make the use of SIMD features
clear. The intent is that you can study, modify, and run each loop as a learning
exercise, and use the project as a foundation for your own exploration of
Arm’s vector extensions.

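To give a flavor of the predication and scalable-vector concepts mentioned above, here is a minimal sketch of a vector-length-agnostic loop written with SVE ACLE intrinsics. It is not taken from SIMD Loops: the function name `add_arrays` is hypothetical, and the code assumes a compiler that defines `__ARM_FEATURE_SVE` when SVE is enabled (for example with `-march=armv8-a+sve`). On other targets it falls back to plain C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

// Hypothetical helper (not part of SIMD Loops): c[i] = a[i] + b[i] for i in [0, n).
void add_arrays(const float *a, const float *b, float *c, size_t n) {
    assert(a != NULL && b != NULL && c != NULL);
#if defined(__ARM_FEATURE_SVE)
    // svcntw() returns the number of 32-bit lanes in a vector at run time,
    // so the same binary adapts to whatever vector length the hardware has.
    for (uint64_t i = 0; i < n; i += svcntw()) {
        // Predicate: lane k is active only while i + k < n.
        // This covers the loop tail without a separate scalar epilogue.
        svbool_t pg = svwhilelt_b32(i, (uint64_t)n);
        svfloat32_t va = svld1_f32(pg, a + i);   // inactive lanes are not loaded
        svfloat32_t vb = svld1_f32(pg, b + i);
        svst1_f32(pg, c + i, svadd_f32_x(pg, va, vb));
    }
#else
    // Plain C fallback for targets without SVE.
    for (size_t i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
#endif
}
```

With Neon, the same loop would process a fixed four floats per iteration and need a scalar tail for the leftover elements. Here the predicate handles the tail, and the step size is only known at run time.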
## Where to get it?

[SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) is available as
open-source code under the BSD 3-Clause license. You can access the source code
from the following GitLab project:
https://gitlab.arm.com/architecture/simd-loops
Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
---
title: Using SIMD Loops
weight: 4

### FIXED, DO NOT MODIFY
layout: learningpathall
---

First, clone [SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) and
change to its directory:

```BASH
git clone https://gitlab.arm.com/architecture/simd-loops simd-loops.git
cd simd-loops.git
```

## SIMD Loops structure

In the [SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) project, the
source code for the loops is organized under the `loops` directory. The complete
list of loops is documented in the `loops.inc` file, which includes a brief
description and the purpose of each loop. Every loop has its own source file,
named following the pattern `loop_<NNN>.c`, where `<NNN>` is the loop number.

A loop is structured as follows:

```C
// Includes and loop_<NNN>_data structure definition

#if defined(HAVE_xxx_INTRINSICS)

// Intrinsics versions: xxx = SME, SVE, or SIMD (Neon) versions
void inner_loop_<NNN>(struct loop_<NNN>_data *data) { ... }

#elif defined(HAVE_xxx)

// Hand-written inline assembly : xxx = SME2P1, SME2, SVE2P1, SVE2, SVE, or SIMD
void inner_loop_<NNN>(struct loop_<NNN>_data *data) { ... }

#else

// Equivalent C code
void inner_loop_<NNN>(struct loop_<NNN>_data *data) { ... }

#endif

// Main of loop: Buffers allocations, loop function call, result functional checking
```

Each loop is implemented in several SIMD extension variants, and conditional
compilation selects one of these optimisations for the `inner_loop_<NNN>`
function. When ACLE intrinsics are supported (for example, for SME, SVE, or
SIMD/Neon), the high-level intrinsics implementation is compiled. If ACLE is not
available, the build falls back to handwritten inline assembly targeting one of
the various SIMD extensions, including SME2.1, SME2, SVE2.1, SVE2, and others.
If no handwritten inline assembly is available, a fallback implementation in
plain C is used. The overall code structure also includes setup and cleanup
code in the main function, where memory buffers are allocated, the selected loop
kernel is executed, and results are verified for correctness.

At compile time, you can select which loop optimisation to compile, whether it
is based on SME or SVE intrinsics or on one of the available inline assembly
variants (`make scalar neon sve2 sme2 sve2p1 sme2p1 sve_intrinsics sme_intrinsics` ...).

The build generates two types of binaries. The first is a single executable
named `simd_loops`, which includes all the loop implementations. You select a
specific loop by passing parameters to the program (for example,
`simd_loops -k <NNN> -n <iterations>`). The second type consists of individual
standalone binaries, each corresponding to a specific loop.
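To make the loop structure described in this section more concrete, the following sketch fills in the template for an imaginary loop. Everything in it is illustrative: loop number 999, the fields of `loop_999_data`, and the helper `run_loop_999` are hypothetical and do not exist in SIMD Loops. The macro name `HAVE_SIMD_INTRINSICS` is assumed from the `HAVE_xxx_INTRINSICS` pattern shown earlier, and the inline assembly branch is omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(HAVE_SIMD_INTRINSICS)
#include <arm_neon.h>
#endif

// Hypothetical data structure; each real loop defines its own loop_<NNN>_data.
struct loop_999_data {
    const uint32_t *in;  // input buffer
    size_t n;            // number of elements
    uint32_t sum;        // result written by the kernel
};

#if defined(HAVE_SIMD_INTRINSICS)

// Neon intrinsics version (AArch64): four 32-bit lanes per iteration.
void inner_loop_999(struct loop_999_data *data) {
    assert(data != NULL);
    uint32x4_t acc = vdupq_n_u32(0);
    size_t i = 0;
    for (; i + 4 <= data->n; i += 4) {
        acc = vaddq_u32(acc, vld1q_u32(data->in + i));
    }
    uint32_t sum = vaddvq_u32(acc);  // horizontal add across the four lanes
    for (; i < data->n; i++) {       // scalar tail for leftover elements
        sum += data->in[i];
    }
    data->sum = sum;
}

#else

// Equivalent C code, used when no SIMD variant is selected.
void inner_loop_999(struct loop_999_data *data) {
    assert(data != NULL);
    uint32_t sum = 0;
    for (size_t i = 0; i < data->n; i++) {
        sum += data->in[i];
    }
    data->sum = sum;
}

#endif

// Stand-in for the per-loop main: set up the buffer, call the kernel,
// and check the result. Returns 0 on success and 1 on mismatch.
int run_loop_999(void) {
    uint32_t buf[10];
    for (uint32_t i = 0; i < 10; i++) {
        buf[i] = i + 1;  // 1, 2, ..., 10
    }
    struct loop_999_data data = { .in = buf, .n = 10, .sum = 0 };
    inner_loop_999(&data);
    return data.sum == 55 ? 0 : 1;
}
```

Because every variant has the same `inner_loop_999` signature, the setup and checking code stays identical whichever branch the preprocessor selects. That shared signature is what lets one harness validate all implementations of a kernel.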
