Commit e0d6932

Merge pull request #2281 from Arnaud-de-Grandmaison-ARM/simd-loops
Add a new learning path on SIMD Loops.
2 parents 4b46054 + d81d700 commit e0d6932

8 files changed

+562
-2
lines changed

assets/contributors.csv

Lines changed: 2 additions & 1 deletion
@@ -100,4 +100,5 @@ Ann Cheng,Arm,anncheng-arm,hello-ann,,
100100
Fidel Makatia Omusilibwa,,,,,
101101
Ker Liu,,,,,
102102
Rui Chang,,,,,
103-
103+
Alejandro Martinez Vicente,Arm,,,,
104+
Mohamad Najem,Arm,,,,
Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
1+
---
2+
title: About SIMD Loops
3+
weight: 3
4+
5+
### FIXED, DO NOT MODIFY
6+
layout: learningpathall
7+
---
8+
9+
Writing high-performance software for Arm processors often involves delving into
10+
their SIMD technologies. For many developers, that journey started with Neon --- a
11+
familiar, fixed-width vector extension that has been around for years. But as
12+
Arm architectures continue to evolve, so do their SIMD technologies.
13+
14+
Enter the world of SVE and SME: two powerful, scalable vector extensions designed for modern
15+
workloads. Unlike Neon, they aren’t just wider --- they’re different. These
16+
extensions introduce new instructions, more flexible programming models, and
17+
support for concepts like predication, scalable vectors, and streaming modes.
18+
However, they also come with a learning curve.
19+
20+
That’s where [SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) comes
21+
in.
22+
23+
[SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) is designed to help
24+
you in the process of learning how to write SVE and SME code. It is a collection
25+
of self-contained, real-world loop kernels --- written in a mix of C, ACLE
26+
intrinsics, and inline assembly --- that target everything from simple arithmetic
27+
to matrix multiplication, sorting, and string processing. You can compile them,
28+
run them, step through them, and use them as a foundation for your own SIMD
29+
work.
30+
31+
If you’re familiar with Neon intrinsics and would like to explore what SVE and
32+
SME have to offer, the [SIMD
33+
Loops](https://gitlab.arm.com/architecture/simd-loops) project is for you!
34+
35+
## What is SIMD Loops?
36+
37+
[SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) is an open-source
38+
project built to help you learn how to write SIMD code for modern Arm
39+
architectures --- specifically using SVE (Scalable Vector Extension) and SME
40+
(Scalable Matrix Extension). It is designed for programmers who already know
41+
their way around Neon intrinsics but are now facing the more powerful --- and
42+
more complex --- world of SVE and SME.
43+
44+
The goal of SIMD Loops is to provide working, readable examples that demonstrate
45+
how to use the full range of features available in SVE, SVE2, and SME2. Each
46+
example is a self-contained loop kernel --- a small piece of code that performs
47+
a specific task like matrix multiplication, vector reduction, histogram, or
48+
memory copy --- and shows how that task can be implemented across different
49+
vector instruction sets.
50+
51+
Unlike a cookbook that tries to provide a recipe for every problem, SIMD Loops
52+
takes the opposite approach: it aims to showcase the architecture, not the
53+
problem. The loop kernels are chosen to be realistic and meaningful, but the
54+
main goal is to demonstrate how specific features and instructions work in
55+
practice. If you’re trying to understand scalability, predication,
56+
gather/scatter, streaming mode, ZA storage, compact instructions, or the
57+
mechanics of matrix tiles --- this is where you’ll see them in action.
58+
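To make predication and scalability a little more concrete, here is a minimal sketch. It is not taken from the SIMD Loops project, and the function name `sum_u32` is hypothetical. When the compiler defines `__ARM_FEATURE_SVE`, the loop uses ACLE intrinsics: `svwhilelt_b32_u64` builds a predicate that switches off the lanes past the end of the array, so the same code handles any vector length without a scalar tail loop. On any other target, the sketch falls back to plain C.

```C
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

// Hypothetical helper (not part of SIMD Loops): sums n 32-bit values
// into a 64-bit total.
uint64_t sum_u32(const uint32_t *src, size_t n) {
  assert(src != NULL || n == 0);
  uint64_t total = 0;
#if defined(__ARM_FEATURE_SVE)
  // svcntw() is the number of 32-bit lanes in one vector; it is only
  // known at run time, which is what makes the loop scalable.
  for (size_t i = 0; i < n; i += svcntw()) {
    // Lanes with index >= n are inactive in the predicate.
    svbool_t pg = svwhilelt_b32_u64(i, n);
    // Predicated load: inactive lanes do not access memory.
    svuint32_t v = svld1_u32(pg, src + i);
    // Horizontal add over the active lanes only.
    total += svaddv_u32(pg, v);
  }
#else
  // Plain C fallback for targets without SVE.
  for (size_t i = 0; i < n; i++) {
    total += src[i];
  }
#endif
  return total;
}
```

The predicate replaces the separate remainder loop that fixed-width Neon code usually needs, which is one of the differences the loop kernels in the project are meant to illustrate.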
59+
The project includes:
60+
- Dozens of numbered loop kernels, each focused on a specific feature or pattern
61+
- Reference C implementations to establish expected behavior
62+
- Inline assembly and/or intrinsics for scalar, Neon, SVE, SVE2, SVE2.1, SME2 and SME2.1
63+
- Build support for different instruction sets, with runtime validation
64+
- A simple command-line runner to execute any loop interactively
65+
- Optional standalone binaries for bare-metal and simulator use
66+
67+
You don’t need to worry about auto-vectorization, compiler flags, or tooling
68+
quirks. Each loop is hand-written and annotated to make the use of SIMD features
69+
clear. The intent is that you can study, modify, and run each loop as a learning
70+
exercise --- and use the project as a foundation for your own exploration of
71+
Arm’s vector extensions.
72+
73+
## Where to get it?
74+
75+
[SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) is available as
76+
open-source code under the BSD 3-Clause license. You can access the source code
77+
from the following GitLab project:
78+
https://gitlab.arm.com/architecture/simd-loops
79+
Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
1+
---
2+
title: Using SIMD Loops
3+
weight: 4
4+
5+
### FIXED, DO NOT MODIFY
6+
layout: learningpathall
7+
---
8+
9+
First, clone [SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) and
10+
change the current directory to it with:
11+
12+
```BASH
13+
git clone https://gitlab.arm.com/architecture/simd-loops simd-loops.git
14+
cd simd-loops.git
15+
```
16+
17+
## SIMD Loops structure
18+
19+
In the [SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) project, the
20+
source code for the loops is organized under the `loops` directory. The complete
21+
list of loops is documented in the `loops.inc` file, which includes a brief
22+
description and the purpose of each loop. Every loop is associated with a
23+
uniquely named source file following the naming pattern `loop_<NNN>.c`, where
24+
`<NNN>` represents the loop number.
25+
26+
A loop is structured as follows:
27+
28+
```C
29+
// Includes and loop_<NNN>_data structure definition
30+
31+
#if defined(HAVE_NATIVE) || defined(HAVE_AUTOVEC)
32+
33+
// C code
34+
void inner_loop_<NNN>(struct loop_<NNN>_data *data) { ... }
35+
36+
#elif defined(HAVE_xxx_INTRINSICS)
37+
38+
// Intrinsics version: xxx = SME, SVE, or SIMD (Neon)
39+
void inner_loop_<NNN>(struct loop_<NNN>_data *data) { ... }
40+
41+
#elif defined(<ASM_COND>)
42+
43+
// Hand-written inline assembly:
44+
// <ASM_COND> = __ARM_FEATURE_SME2p1, __ARM_FEATURE_SME2, __ARM_FEATURE_SVE2p1,
45+
// __ARM_FEATURE_SVE2, __ARM_FEATURE_SVE, or __ARM_NEON
46+
void inner_loop_<NNN>(struct loop_<NNN>_data *data) { ... }
47+
48+
#else
49+
50+
#error "No implementations available for this target."
51+
52+
#endif
53+
54+
// Main of the loop: buffer allocation, loop function call, functional check of the results
55+
```
56+
57+
Each loop is implemented in several SIMD extension variants, and conditional
58+
compilation is used to select one of the optimisations for the
59+
`inner_loop_<NNN>` function. The native C implementation is written first, and
60+
it can be generated either when building natively (`HAVE_NATIVE`) or through
61+
compiler auto-vectorization (`HAVE_AUTOVEC`). When SIMD ACLE is supported (e.g.,
62+
SME, SVE, or Neon), the code is compiled using high-level intrinsics. If ACLE
63+
support is not available, the build process falls back to handwritten inline
64+
assembly targeting one of the available SIMD extensions, such as SME2.1, SME2,
65+
SVE2.1, SVE2, and others. The overall code structure also includes setup and
66+
cleanup code in the main function, where memory buffers are allocated, the
67+
selected loop kernel is executed, and results are verified for correctness.
68+
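The structure described above can be sketched with a small, self-contained example. This is an illustration only, not code from the project: the names `loop_demo_data`, `inner_loop_demo`, and `run_loop_demo` are hypothetical, and the sketch selects a variant with the compiler's own feature macros (`__ARM_FEATURE_SVE`, `__ARM_NEON`) rather than the project's `HAVE_...` macros, whose exact definitions are set by its build system.

```C
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for a loop_<NNN>_data structure.
struct loop_demo_data {
  const uint32_t *a;
  const uint32_t *b;
  uint32_t *c;
  size_t n;
};

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>

// SVE intrinsics variant: predicated and vector-length agnostic.
void inner_loop_demo(struct loop_demo_data *data) {
  assert(data != NULL);
  for (size_t i = 0; i < data->n; i += svcntw()) {
    svbool_t pg = svwhilelt_b32_u64(i, data->n);
    svuint32_t va = svld1_u32(pg, data->a + i);
    svuint32_t vb = svld1_u32(pg, data->b + i);
    svst1_u32(pg, data->c + i, svadd_u32_x(pg, va, vb));
  }
}

#elif defined(__ARM_NEON)
#include <arm_neon.h>

// Neon intrinsics variant: four lanes per step, then a scalar tail.
void inner_loop_demo(struct loop_demo_data *data) {
  assert(data != NULL);
  size_t i = 0;
  for (; i + 4 <= data->n; i += 4) {
    uint32x4_t va = vld1q_u32(data->a + i);
    uint32x4_t vb = vld1q_u32(data->b + i);
    vst1q_u32(data->c + i, vaddq_u32(va, vb));
  }
  for (; i < data->n; i++) {
    data->c[i] = data->a[i] + data->b[i];
  }
}

#else

// Plain C variant for every other target.
void inner_loop_demo(struct loop_demo_data *data) {
  assert(data != NULL);
  for (size_t i = 0; i < data->n; i++) {
    data->c[i] = data->a[i] + data->b[i];
  }
}

#endif

// Stand-in for the per-loop main: set up buffers, run the selected
// kernel, and count elements that differ from the expected result.
int run_loop_demo(void) {
  enum { N = 37 };  // deliberately not a multiple of a vector length
  uint32_t a[N], b[N], c[N];
  for (int i = 0; i < N; i++) {
    a[i] = (uint32_t)i;
    b[i] = (uint32_t)(2 * i);
    c[i] = 0;
  }
  struct loop_demo_data data = { a, b, c, N };
  inner_loop_demo(&data);
  int mismatches = 0;
  for (int i = 0; i < N; i++) {
    if (c[i] != 3u * (uint32_t)i) {
      mismatches++;
    }
  }
  return mismatches;
}
```

Exactly one definition of `inner_loop_demo` survives preprocessing, so the verification code is shared by every variant. A return value of 0 from `run_loop_demo` means the selected variant matched the expected output.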
69+
At compile time, you can select which loop optimisation to compile, whether it
70+
is based on SME or SVE intrinsics, or one of the available inline assembly
71+
variants (`make scalar neon sve2 sme2 sve2p1 sme2p1 sve_intrinsics
72+
sme_intrinsics` ...).
73+
74+
As the result of the build, two types of binaries are generated. The first is a
75+
single executable named `simd_loops`, which includes all the loop
76+
implementations. A specific loop can be selected by passing parameters to the
77+
program (e.g., `simd_loops -k <NNN> -n <iterations>`). The second type consists
78+
of individual standalone binaries, each corresponding to a specific loop.
