First review of SIMD Loops, waiting for project release to continue reviewing.

jasonrandrews · jasonrandrews · commit d1905f5b1f20 · 2025-09-10T16:38:06.000-05:00
diff --git a/content/learning-paths/cross-platform/simd-loops/1-about.md b/content/learning-paths/cross-platform/simd-loops/1-about.md
@@ -1,79 +1,70 @@
 ---
-title: About SIMD Loops
+title: About single instruction, multiple data (SIMD) loops
 weight: 3
 
 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---
 
 Writing high-performance software for Arm processors often involves delving into
-its SIMD technologies. For many developers, that journey started with Neon --- a
-familiar, fixed-width vector extension that has been around for years. But as
+SIMD technologies. For many developers, that journey started with NEON, a
+familiar, fixed-width vector extension that has been around for many years. But as
 Arm architectures continue to evolve, so do their SIMD technologies.
 
-Enter the world of SVE and SME: two powerful, scalable vector extensions designed for modern
-workloads. Unlike Neon, they aren’t just wider --- they’re different. These
+Enter the world of Scalable Vector Extension (SVE) and Scalable Matrix Extension (SME): two powerful, scalable vector extensions designed for modern
+workloads. Unlike NEON, they are not just wider; they are fundamentally different. These
 extensions introduce new instructions, more flexible programming models, and
 support for concepts like predication, scalable vectors, and streaming modes.
 However, they also come with a learning curve.
 
-That’s where [SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) comes
-in.
+That is where [SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) becomes a valuable resource, enabling you to quickly and effectively learn how to write high-performance SIMD code.
 
-[SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) is designed to help
-you in the process of learning how to write SVE and SME code. It is a collection
-of self-contained, real-world loop kernels --- written in a mix of C, ACLE
-intrinsics, and inline assembly --- that target everything from simple arithmetic
+SIMD Loops is designed to help
+you learn how to write SVE and SME code. It is a collection
+of self-contained, real-world loop kernels written in a mix of C, Arm C Language Extensions (ACLE)
+intrinsics, and inline assembly. These kernels target tasks ranging from simple arithmetic
 to matrix multiplication, sorting, and string processing. You can compile them,
 run them, step through them, and use them as a foundation for your own SIMD
 work.
 
-If you’re familiar with Neon intrinsics and would like to explore what SVE and
-SME have to offer, the [SIMD
-Loops](https://gitlab.arm.com/architecture/simd-loops) project is for you !
+If you are familiar with NEON intrinsics, you can use SIMD Loops to learn and explore SVE and SME.
 
-## What is SIMD Loops ?
+## What is SIMD Loops?
 
-[SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) is an open-source
-project built to help you learn how to write SIMD code for modern Arm
-architectures --- specifically using SVE (Scalable Vector Extension) and SME
-(Scalable Matrix Extension). It is designed for programmers who already know
-their way around Neon intrinsics but are now facing the more powerful --- and
-more complex --- world of SVE and SME.
+SIMD Loops is an open-source
+project, licensed under BSD 3-Clause, built to help you learn how to write SIMD code for modern Arm
+architectures, specifically using SVE and SME.
+It is designed for programmers who already know
+their way around NEON intrinsics but are now facing the more powerful and
+complex world of SVE and SME.
 
 The goal of SIMD Loops is to provide working, readable examples that demonstrate
 how to use the full range of features available in SVE, SVE2, and SME2. Each
-example is a self-contained loop kernel --- a small piece of code that performs
-a specific task like matrix multiplication, vector reduction, histogram or
-memory copy --- and shows how that task can be implemented across different
+example is a self-contained loop kernel, a small piece of code that performs
+a specific task like matrix multiplication, vector reduction, histogram, or
+memory copy. It shows how that task can be implemented across different
 vector instruction sets.
 
 Unlike a cookbook that tries to provide a recipe for every problem, SIMD Loops
-takes the opposite approach: it aims to showcase the architecture, not the
-problem. The loop kernels are chosen to be realistic and meaningful, but the
+takes the opposite approach. It aims to showcase the architecture rather than
+the problem. The loop kernels are chosen to be realistic and meaningful, but the
 main goal is to demonstrate how specific features and instructions work in
-practice. If you’re trying to understand scalability, predication,
+practice. If you are trying to understand scalability, predication,
 gather/scatter, streaming mode, ZA storage, compact instructions, or the
-mechanics of matrix tiles --- this is where you’ll see them in action.
+mechanics of matrix tiles, this is where you will see them in action.
 
 The project includes:
 - Dozens of numbered loop kernels, each focused on a specific feature or pattern
 - Reference C implementations to establish expected behavior
-- Inline assembly and/or intrinsics for scalar, Neon, SVE, SVE2, SVE2.1, SME2 and SME2.1
+- Inline assembly and/or intrinsics for scalar, NEON, SVE, SVE2, SVE2.1, SME2, and SME2.1
 - Build support for different instruction sets, with runtime validation
 - A simple command-line runner to execute any loop interactively
 - Optional standalone binaries for bare-metal and simulator use
 
-You don’t need to worry about auto-vectorization, compiler flags, or tooling
+You do not need to worry about auto-vectorization, compiler flags, or tooling
 quirks. Each loop is hand-written and annotated to make the use of SIMD features
 clear. The intent is that you can study, modify, and run each loop as a learning
-exercise --- and use the project as a foundation for your own exploration of
+exercise, and use the project as a foundation for your own exploration of
 Arm’s vector extensions.
 
-## Where to get it?
-
-[SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) is available as an
-open-source code licensed under BSD 3-Clause. You can access the source code
-from the following GitLab project:
-https://gitlab.arm.com/architecture/simd-loops
 
diff --git a/content/learning-paths/cross-platform/simd-loops/2-using.md b/content/learning-paths/cross-platform/simd-loops/2-using.md
@@ -6,17 +6,16 @@ weight: 4
 layout: learningpathall
 ---
 
-First, clone [SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) and
-change current directory to it with:
+To get started, clone the SIMD Loops project and change to its directory:
 
-```BASH
+```bash
 git clone https://gitlab.arm.com/architecture/simd-loops simd-loops.git
 cd simd-loops.git
 ```
 
 ## SIMD Loops structure
 
-In the [SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) project, the
+In the SIMD Loops project, the
 source code for the loops is organized under the loops directory. The complete
 list of loops is documented in the loops.inc file, which includes a brief
 description and the purpose of each loop. Every loop is associated with a
@@ -35,7 +34,7 @@ void inner_loop_<NNN>(struct loop_<NNN>_data *data) { ... }
 
 #if defined(HAVE_xxx_INTRINSICS)
 
-// Intrinsics versions: xxx = SME, SVE, or SIMD (Neon) versions
+// Intrinsics versions: xxx = SME, SVE, or SIMD (NEON) versions
 void inner_loop_<NNN>(struct loop_<NNN>_data *data) { ... }
 
 #elif defined(<ASM_COND>)
@@ -55,18 +54,18 @@ void inner_loop_<NNN>(struct loop_<NNN>_data *data) { ... }
 ```
 
 Each loop is implemented in several SIMD extension variants, and conditional
-compilation is used to select one of the optimisations for the
+compilation is used to select one of the optimizations for the
 `inner_loop_<NNN>` function. The native C implementation is written first, and
 it can be generated either when building natively (HAVE_NATIVE) or through
 compiler auto-vectorization (HAVE_AUTOVEC). When SIMD ACLE is supported (e.g.,
-SME, SVE, or Neon), the code is compiled using high-level intrinsics. If ACLE
+SME, SVE, or NEON), the code is compiled using high-level intrinsics. If ACLE
 support is not available, the build process falls back to handwritten inline
 assembly targeting one of the available SIMD extensions, such as SME2.1, SME2,
 SVE2.1, SVE2, and others. The overall code structure also includes setup and
 cleanup code in the main function, where memory buffers are allocated, the
 selected loop kernel is executed, and results are verified for correctness.
 
-At compile time, you can select which loop optimisation to compile, whether it
+At compile time, you can select which loop optimization to compile, whether it
 is based on SME or SVE intrinsics, or one of the available inline assembly
 variants (`make scalar neon sve2 sme2 sve2p1 sme2p1 sve_intrinsics
 sme_intrinsics` ...).
diff --git a/content/learning-paths/cross-platform/simd-loops/3-example.md b/content/learning-paths/cross-platform/simd-loops/3-example.md
@@ -1,5 +1,5 @@
 ---
-title: Example
+title: Code example
 weight: 5
 
 ### FIXED, DO NOT MODIFY
@@ -263,7 +263,7 @@ Guide](https://developer.arm.com/documentation/109246/latest/).
 Beyond the SME2 and SVE2 implementations shown above, this loop also includes several
 alternative optimized versions, each leveraging architecture-specific features.
 
-### Neon
+### NEON
 
 The neon version (lines 612-710) relies on multiple structure load/store
 combined with indexed `fmla` instructions to vectorize the matrix multiplication
diff --git a/content/learning-paths/cross-platform/simd-loops/4-conclusion.md b/content/learning-paths/cross-platform/simd-loops/4-conclusion.md
@@ -6,23 +6,21 @@ weight: 6
 layout: learningpathall
 ---
 
-[SIMD Loops](https://gitlab.arm.com/architecture/simd-loops) is an invaluable
+SIMD Loops is an invaluable
 resource for developers looking to learn or master the intricacies of SVE and
 SME on modern Arm architectures. By providing practical, hands-on examples, it
 bridges the gap between the architecture specification and real-world
-application. Whether you're transitioning from Neon or starting fresh with SVE
+application. Whether you are transitioning from NEON or starting fresh with SVE
 and SME, SIMD Loops offers a comprehensive toolkit to enhance your understanding
 and proficiency.
 
 With its extensive collection of loop kernels, detailed documentation, and
-flexible build options, [SIMD
-Loops](https://gitlab.arm.com/architecture/simd-loops) empowers you to explore
+flexible build options, SIMD Loops empowers you to explore
 and leverage the full potential of Arm's advanced vector extensions. Dive into
 the project, experiment with the examples, and take your high-performance coding
 skills for Arm to the next level.
 
-For more information and to get started, visit the [SIMD
-Loops](https://gitlab.arm.com/architecture/simd-loops) GitLab project and refer
+For more information and to get started, visit the GitLab project and refer
 to the
 [README.md](https://gitlab.arm.com/architecture/simd-loops/-/blob/main/README.md)
-for detailed instructions on building and running the code. Happy coding!
+for instructions on building and running the code.
diff --git a/content/learning-paths/cross-platform/simd-loops/_index.md b/content/learning-paths/cross-platform/simd-loops/_index.md
@@ -1,20 +1,20 @@
 ---
-title: "Code kata: perfect your SVE and SME instructions skills with SIMD Loops"
+title: "Code kata: perfect your SVE and SME skills with SIMD Loops"
 
 minutes_to_complete: 30
 
 draft: true
 cascade:
     draft: true
 
-who_is_this_for: This is an advanced topic for software developers who want to learn how to use the full range of features available in SVE, SVE2 and SME2 to improve the performance of their software for Arm processors.
+who_is_this_for: This is an advanced topic for software developers who want to learn how to use the full range of features available in SVE, SVE2, and SME2 to improve software performance on Arm processors.
 
 learning_objectives:
-     - Improve your writing of SIMD code with SVE and SME.
+     - Improve SIMD code performance using Scalable Vector Extension (SVE) and Scalable Matrix Extension (SME).
 
 prerequisites:
-    - An AArch64 computer running Linux or macOS. You can use cloud instances, see this list of [Arm cloud service providers](/learning-paths/servers-and-cloud-computing/csp/).
-    - Some familiarity of SIMD programming and Neon intrinsics
+    - An AArch64 computer running Linux or macOS. You can use cloud instances; refer to [Get started with Arm-based cloud instances](/learning-paths/servers-and-cloud-computing/csp/) for a list of cloud service providers.
+    - Some familiarity with SIMD programming and NEON intrinsics.
 
 author:
     - Alejandro Martinez Vicente
@@ -33,6 +33,13 @@ tools_software_languages:
     - Clang
     - FVP
 
+shared_path: true
+shared_between:
+    - servers-and-cloud-computing
+    - laptops-and-desktops
+    - mobile-graphics-and-gaming
+    - automotive
+
 further_reading:
     - resource:
         title: SVE Programming Examples