|
1 | 1 | --- |
2 | | -title: About SIMD Loops |
| 2 | +title: About single instruction, multiple data (SIMD) loops |
3 | 3 | weight: 3 |
4 | 4 |
|
5 | 5 | ### FIXED, DO NOT MODIFY |
6 | 6 | layout: learningpathall |
7 | 7 | --- |
8 | 8 |
|
9 | 9 | Writing high-performance software for Arm processors often involves delving into |
10 | | -its SIMD technologies. For many developers, that journey started with Neon --- a |
11 | | -familiar, fixed-width vector extension that has been around for years. But as |
| 10 | +SIMD technologies. For many developers, that journey started with NEON, a |
| 11 | +familiar, fixed-width vector extension that has been around for many years. But as |
12 | 12 | Arm architectures continue to evolve, so do their SIMD technologies. |
13 | 13 |
|
14 | | -Enter the world of SVE and SME: two powerful, scalable vector extensions designed for modern |
15 | | -workloads. Unlike Neon, they aren’t just wider --- they’re different. These |
| 14 | +Enter the world of Scalable Vector Extension (SVE) and Scalable Matrix Extension (SME): two powerful, scalable extensions designed for modern
| 15 | +workloads. Unlike NEON, they are not just wider; they are fundamentally different. These |
16 | 16 | extensions introduce new instructions, more flexible programming models, and |
17 | 17 | support for concepts like predication, scalable vectors, and streaming modes. |
18 | 18 | However, they also come with a learning curve. |
19 | 19 |
|
20 | | -That’s where [SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) comes |
21 | | -in. |
| 20 | +That is where [SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) becomes a valuable resource, enabling you to quickly and effectively learn how to write high-performance SIMD code. |
22 | 21 |
|
23 | | -[SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) is designed to help |
24 | | -you in the process of learning how to write SVE and SME code. It is a collection |
25 | | -of self-contained, real-world loop kernels --- written in a mix of C, ACLE |
26 | | -intrinsics, and inline assembly --- that target everything from simple arithmetic |
| 22 | +SIMD Loops is designed to help |
| 23 | +you learn how to write SVE and SME code. It is a collection |
| 24 | +of self-contained, real-world loop kernels written in a mix of C, Arm C Language Extensions (ACLE) |
| 25 | +intrinsics, and inline assembly. These kernels target tasks ranging from simple arithmetic |
27 | 26 | to matrix multiplication, sorting, and string processing. You can compile them, |
28 | 27 | run them, step through them, and use them as a foundation for your own SIMD |
29 | 28 | work. |
30 | 29 |
|
31 | | -If you’re familiar with Neon intrinsics and would like to explore what SVE and |
32 | | -SME have to offer, the [SIMD |
33 | | -Loops](https://gitlab.arm.com/architecture/simd-loops) project is for you ! |
| 30 | +If you are familiar with NEON intrinsics, you can use SIMD Loops to learn and explore SVE and SME. |
34 | 31 |
|
35 | | -## What is SIMD Loops ? |
| 32 | +## What is SIMD Loops? |
36 | 33 |
|
37 | | -[SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) is an open-source |
38 | | -project built to help you learn how to write SIMD code for modern Arm |
39 | | -architectures --- specifically using SVE (Scalable Vector Extension) and SME |
40 | | -(Scalable Matrix Extension). It is designed for programmers who already know |
41 | | -their way around Neon intrinsics but are now facing the more powerful --- and |
42 | | -more complex --- world of SVE and SME. |
| 34 | +SIMD Loops is an open-source |
| 35 | +project, licensed under BSD 3-Clause, built to help you learn how to write SIMD code for modern Arm |
| 36 | +architectures, specifically using SVE and SME. |
| 37 | +It is designed for programmers who already know |
| 38 | +their way around NEON intrinsics but are now facing the more powerful and |
| 39 | +complex world of SVE and SME. |
43 | 40 |
|
44 | 41 | The goal of SIMD Loops is to provide working, readable examples that demonstrate |
45 | 42 | how to use the full range of features available in SVE, SVE2, and SME2. Each |
46 | | -example is a self-contained loop kernel --- a small piece of code that performs |
47 | | -a specific task like matrix multiplication, vector reduction, histogram or |
48 | | -memory copy --- and shows how that task can be implemented across different |
| 43 | +example is a self-contained loop kernel, a small piece of code that performs |
| 44 | +a specific task like matrix multiplication, vector reduction, histogram, or |
| 45 | +memory copy. These examples show how that task can be implemented across different |
49 | 46 | vector instruction sets. |
50 | 47 |
|
51 | 48 | Unlike a cookbook that tries to provide a recipe for every problem, SIMD Loops |
52 | | -takes the opposite approach: it aims to showcase the architecture, not the |
53 | | -problem. The loop kernels are chosen to be realistic and meaningful, but the |
| 49 | +takes the opposite approach. It aims to showcase the architecture rather than |
| 50 | +the problem. The loop kernels are chosen to be realistic and meaningful, but the |
54 | 51 | main goal is to demonstrate how specific features and instructions work in |
55 | | -practice. If you’re trying to understand scalability, predication, |
| 52 | +practice. If you are trying to understand scalability, predication, |
56 | 53 | gather/scatter, streaming mode, ZA storage, compact instructions, or the |
57 | | -mechanics of matrix tiles --- this is where you’ll see them in action. |
| 54 | +mechanics of matrix tiles, this is where you will see them in action. |
58 | 55 |
|
59 | 56 | The project includes: |
60 | 57 | - Dozens of numbered loop kernels, each focused on a specific feature or pattern |
61 | 58 | - Reference C implementations to establish expected behavior |
62 | | -- Inline assembly and/or intrinsics for scalar, Neon, SVE, SVE2, SVE2.1, SME2 and SME2.1 |
| 59 | +- Inline assembly and/or intrinsics for scalar, NEON, SVE, SVE2, SVE2.1, SME2, and SME2.1 |
63 | 60 | - Build support for different instruction sets, with runtime validation |
64 | 61 | - A simple command-line runner to execute any loop interactively |
65 | 62 | - Optional standalone binaries for bare-metal and simulator use |
66 | 63 |
|
67 | | -You don’t need to worry about auto-vectorization, compiler flags, or tooling |
| 64 | +You do not need to worry about auto-vectorization, compiler flags, or tooling |
68 | 65 | quirks. Each loop is hand-written and annotated to make the use of SIMD features |
69 | 66 | clear. The intent is that you can study, modify, and run each loop as a learning |
70 | | -exercise --- and use the project as a foundation for your own exploration of |
| 67 | +exercise, and use the project as a foundation for your own exploration of |
71 | 68 | Arm’s vector extensions. |
72 | 69 |
|
73 | | -## Where to get it? |
74 | | - |
75 | | -[SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) is available as an |
76 | | -open-source code licensed under BSD 3-Clause. You can access the source code |
77 | | -from the following GitLab project: |
78 | | -https://gitlab.arm.com/architecture/simd-loops |
79 | 70 |
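To make the idea of a loop kernel with a reference implementation and runtime validation concrete, here is a minimal sketch. It is not taken from the SIMD Loops repository: the names `add_ref`, `add_vec`, `count_mismatches`, and `MAX_N` are hypothetical and chosen only for illustration. The SVE path uses standard ACLE intrinsics and is compiled only when the compiler defines `__ARM_FEATURE_SVE`; on any other target the sketch falls back to the scalar loop so that it still builds and runs.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

/* Scalar reference: establishes the expected behavior. */
static void add_ref(const float *a, const float *b, float *out, int64_t n) {
    for (int64_t i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}

/* Hypothetical kernel: element-wise add written in a vector-length-agnostic
 * style. The loop never assumes a fixed vector width. */
static void add_vec(const float *a, const float *b, float *out, int64_t n) {
#if defined(__ARM_FEATURE_SVE)
    /* svcntw() returns the number of 32-bit lanes in one vector at run time. */
    for (int64_t i = 0; i < n; i += (int64_t)svcntw()) {
        /* Predicate: lane k is active only while i + k < n, so the final
         * partial iteration needs no separate scalar tail loop. */
        svbool_t pg = svwhilelt_b32(i, n);
        svfloat32_t va = svld1_f32(pg, a + i);
        svfloat32_t vb = svld1_f32(pg, b + i);
        svst1_f32(pg, out + i, svadd_f32_x(pg, va, vb));
    }
#else
    /* Assumption for portability: without SVE, reuse the scalar loop. */
    add_ref(a, b, out, n);
#endif
}

enum { MAX_N = 1024 };

/* Runtime validation: run both versions on the same input and count the
 * elements where the results differ. Returns 0 when they agree. */
int count_mismatches(int64_t n) {
    float a[MAX_N], b[MAX_N], expected[MAX_N], actual[MAX_N];
    int mismatches = 0;

    assert(n >= 0 && n <= MAX_N);
    for (int64_t i = 0; i < n; i++) {
        a[i] = (float)i * 0.5f;
        b[i] = (float)(n - i);
    }

    add_ref(a, b, expected, n);
    add_vec(a, b, actual, n);

    for (int64_t i = 0; i < n; i++) {
        if (expected[i] != actual[i]) {
            mismatches++;
        }
    }
    return mismatches;
}
```

Calling `count_mismatches(1000)` from a small `main` should return 0 on any target. To exercise the SVE path, build for an SVE-capable target, for example with `-march=armv8-a+sve` in GCC or Clang. The predicate from `svwhilelt_b32` illustrates the predication and scalability concepts mentioned above: the same binary handles any hardware vector length and any `n`, including sizes that are not a multiple of the lane count.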
|