ArmDeveloperEcosystem
diff --git a/assets/contributors.csv b/assets/contributors.csv
diff --git a/content/learning-paths/cross-platform/simd-loops/1-about.md b/content/learning-paths/cross-platform/simd-loops/1-about.md
diff --git a/content/learning-paths/cross-platform/simd-loops/2-using.md b/content/learning-paths/cross-platform/simd-loops/2-using.md
@@ -100,4 +100,5 @@ Ann Cheng,Arm,anncheng-arm,hello-ann,,
 Fidel Makatia Omusilibwa,,,,,
 Ker Liu,,,,,
 Rui Chang,,,,,
-
+Alejandro Martinez Vicente,Arm,,,,
+Mohamad Najem,Arm,,,,
@@ -0,0 +1,79 @@
+---
+title: About SIMD Loops
+weight: 3
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+Writing high-performance software for Arm processors often involves delving into
+their SIMD technologies. For many developers, that journey started with Neon, a
+familiar, fixed-width vector extension that has been around for years. But as
+Arm architectures continue to evolve, so do their SIMD technologies.
+
+Enter the world of SVE and SME: two powerful, scalable vector extensions designed for modern
+workloads. Unlike Neon, they aren’t just wider --- they’re different. These
+extensions introduce new instructions, more flexible programming models, and
+support for concepts like predication, scalable vectors, and streaming modes.
+However, they also come with a learning curve.
+
+That’s where [SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) comes
+in.
+
+[SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) is designed to help
+you learn how to write SVE and SME code. It is a collection
+of self-contained, real-world loop kernels --- written in a mix of C, ACLE
+intrinsics, and inline assembly --- that target everything from simple arithmetic
+to matrix multiplication, sorting, and string processing. You can compile them,
+run them, step through them, and use them as a foundation for your own SIMD
+work.
+
+If you’re familiar with Neon intrinsics and would like to explore what SVE and
+SME have to offer, the [SIMD
+Loops](https://gitlab.arm.com/architecture/simd-loops) project is for you!
+
+## What is SIMD Loops?
+
+[SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) is an open-source
+project built to help you learn how to write SIMD code for modern Arm
+architectures --- specifically using SVE (Scalable Vector Extension) and SME
+(Scalable Matrix Extension). It is designed for programmers who already know
+their way around Neon intrinsics but are now facing the more powerful --- and
+more complex --- world of SVE and SME.
+
+The goal of SIMD Loops is to provide working, readable examples that demonstrate
+how to use the full range of features available in SVE, SVE2, and SME2. Each
+example is a self-contained loop kernel --- a small piece of code that performs
+a specific task like matrix multiplication, vector reduction, histogram, or
+memory copy --- and shows how that task can be implemented across different
+vector instruction sets.
+
+Unlike a cookbook that tries to provide a recipe for every problem, SIMD Loops
+takes the opposite approach: it aims to showcase the architecture, not the
+problem. The loop kernels are chosen to be realistic and meaningful, but the
+main goal is to demonstrate how specific features and instructions work in
+practice. If you’re trying to understand scalability, predication,
+gather/scatter, streaming mode, ZA storage, compact instructions, or the
+mechanics of matrix tiles --- this is where you’ll see them in action.
+
+The project includes:
+- Dozens of numbered loop kernels, each focused on a specific feature or pattern
+- Reference C implementations to establish expected behavior
+- Inline assembly and/or intrinsics for scalar, Neon, SVE, SVE2, and SME2
+- Build support for different instruction sets, with runtime validation
+- A simple command-line runner to execute any loop interactively
+- Optional standalone binaries for bare-metal and simulator use
+
+You don’t need to worry about auto-vectorization, compiler flags, or tooling
+quirks. Each loop is hand-written and annotated to make the use of SIMD features
+clear. The intent is that you can study, modify, and run each loop as a learning
+exercise --- and use the project as a foundation for your own exploration of
+Arm’s vector extensions.
+
+## Where to get it?
+
+[SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) is available as
+open-source code under the BSD 3-Clause license. You can access the source code
+from the following GitLab project:
+https://gitlab.arm.com/architecture/simd-loops
+
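Note on `1-about.md`: the page names predication and scalable vectors but does not show what they look like in code. The block below is a minimal sketch, not taken from SIMD Loops, of a vector-length-agnostic sum with SVE intrinsics. `sum_f32` is a hypothetical helper introduced only for illustration. `__ARM_FEATURE_SVE` is the ACLE feature macro that compilers define when SVE is enabled; without it, the scalar branch is compiled instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

// Hypothetical helper (not part of SIMD Loops): returns the sum of n floats.
float sum_f32(const float *x, size_t n) {
  assert(x != NULL || n == 0);
#if defined(__ARM_FEATURE_SVE)
  // Scalable vectors: svcntw() is the number of 32-bit lanes, known only at
  // run time, so the same binary works for any hardware vector length.
  svfloat32_t acc = svdup_n_f32(0.0f);
  for (uint64_t i = 0; i < n; i += svcntw()) {
    // Predication: lanes are active only while i + lane < n, so the final
    // partial vector needs no separate scalar tail loop.
    svbool_t pg = svwhilelt_b32_u64(i, (uint64_t)n);
    // Merging add: inactive lanes keep their previous accumulator value.
    acc = svadd_f32_m(pg, acc, svld1_f32(pg, x + i));
  }
  // Horizontal reduction across all lanes of the accumulator.
  return svaddv_f32(svptrue_b32(), acc);
#else
  // Scalar fallback so the sketch also builds on targets without SVE.
  float acc = 0.0f;
  for (size_t i = 0; i < n; i++) {
    acc += x[i];
  }
  return acc;
#endif
}
```

The two branches add the elements in a different order, so results for general floating-point input can differ in the last bits. A Neon version of the same loop would typically need a fixed four-lane body plus a scalar tail, which is the contrast the page describes.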
@@ -0,0 +1,71 @@
+---
+title: Using SIMD Loops
+weight: 4
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+First, clone [SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) and
+change into its directory:
+
+```BASH
+git clone https://gitlab.arm.com/architecture/simd-loops simd-loops.git
+cd simd-loops.git
+```
+
+## SIMD Loops structure
+
+In the [SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) project, the
+source code for the loops is organized under the `loops` directory. The complete
+list of loops is documented in the `loops.inc` file, which includes a brief
+description and the purpose of each loop. Every loop is associated with a
+uniquely named source file following the naming pattern `loop_<NNN>.c`, where
+`<NNN>` represents the loop number.
+
+A loop is structured as follows:
+
+```C
+// Includes and loop_<NNN>_data structure definition
+
+#if defined(HAVE_xxx_INTRINSICS)
+
+// Intrinsics version: xxx = SME, SVE, or SIMD (Neon)
+void inner_loop_<NNN>(struct loop_<NNN>_data *data) { ... }
+
+#elif defined(HAVE_xxx)
+
+// Hand-written inline assembly: xxx = SME2P1, SME2, SVE2P1, SVE2, SVE, or SIMD
+void inner_loop_<NNN>(struct loop_<NNN>_data *data) { ... }
+
+#else
+
+// Equivalent C code
+void inner_loop_<NNN>(struct loop_<NNN>_data *data) { ... }
+
+#endif
+
+// Main of loop: buffer allocation, loop function call, functional check of results
+```
+
+Each loop is implemented in several SIMD extension variants, and conditional
+compilation selects one of them for the `inner_loop_<NNN>` function. When an
+ACLE intrinsics variant is enabled (SME, SVE, or SIMD/Neon), the high-level
+intrinsics implementation is compiled. Otherwise, the code falls back to
+handwritten inline assembly targeting one of the SIMD extensions (SME2.1, SME2,
+SVE2.1, SVE2, SVE, or SIMD/Neon). If no inline assembly variant is enabled
+either, a plain C implementation is used. The overall code structure also
+includes setup and cleanup code in the main function, where memory buffers are
+allocated, the selected loop kernel is executed, and results are verified for
+correctness.
+
+At compile time, you can select which loop variant to build, whether it
+is based on SME or SVE intrinsics or is one of the available inline assembly
+variants, by passing the matching targets to `make` (for example,
+`make scalar neon sve2 sme2 sve2p1 sme2p1 sve_intrinsics sme_intrinsics` ...).
+
+The build generates two types of binaries. The first is a
+single executable named `simd_loops`, which includes all the loop
+implementations. A specific loop can be selected by passing parameters to the
+program (e.g., `simd_loops -k <NNN> -n <iterations>`). The second type consists
+of individual standalone binaries, each corresponding to a specific loop.
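
Note on `2-using.md`: the `loop_<NNN>.c` template uses placeholders throughout, so a filled-in toy instance may help readers follow the conditional compilation. The sketch below is not an actual SIMD Loops kernel. `loop_demo_data`, `inner_loop_demo`, and `run_loop_demo` are hypothetical names that only mimic the `loop_<NNN>` pattern. `HAVE_SIMD_INTRINSICS` follows the `HAVE_xxx_INTRINSICS` pattern from the template; how the real build defines such macros is an assumption not confirmed here. The inline assembly branch is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

// Hypothetical data structure, mimicking the loop_<NNN>_data pattern.
struct loop_demo_data {
  const float *a;
  const float *b;
  float *c;
  size_t n;
};

#if defined(HAVE_SIMD_INTRINSICS)
#include <arm_neon.h>

// Neon intrinsics version: four floats per iteration, then a scalar tail.
void inner_loop_demo(struct loop_demo_data *data) {
  size_t i = 0;
  for (; i + 4 <= data->n; i += 4) {
    float32x4_t va = vld1q_f32(data->a + i);
    float32x4_t vb = vld1q_f32(data->b + i);
    vst1q_f32(data->c + i, vaddq_f32(va, vb));
  }
  for (; i < data->n; i++) {
    data->c[i] = data->a[i] + data->b[i];
  }
}

#else

// Equivalent C code (an inline assembly variant would sit in an #elif above).
void inner_loop_demo(struct loop_demo_data *data) {
  for (size_t i = 0; i < data->n; i++) {
    data->c[i] = data->a[i] + data->b[i];
  }
}

#endif

// Stand-in for the per-loop main: allocate buffers, run the kernel, and
// check the result against a scalar reference. Returns 0 on success.
int run_loop_demo(size_t n) {
  assert(n > 0);
  float *a = malloc(n * sizeof *a);
  float *b = malloc(n * sizeof *b);
  float *c = malloc(n * sizeof *c);
  int rc = 0;
  if (a == NULL || b == NULL || c == NULL) {
    rc = 1;
  } else {
    for (size_t i = 0; i < n; i++) {
      a[i] = (float)i;
      b[i] = 2.0f * (float)i;
    }
    struct loop_demo_data data = {a, b, c, n};
    inner_loop_demo(&data);
    for (size_t i = 0; i < n; i++) {
      if (c[i] != a[i] + b[i]) {
        rc = 1;
      }
    }
  }
  free(a);
  free(b);
  free(c);
  return rc;
}
```

In this sketch the macro would be passed by hand, for example with `-DHAVE_SIMD_INTRINSICS` on an AArch64 compiler. With no macro defined, the plain C branch is compiled, which matches the fallback order the page describes.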