---
title: About Single Instruction, Multiple Data loops
weight: 2

### FIXED, DO NOT MODIFY
layout: learningpathall
---
## Introduction to SIMD on Arm and why it matters for performance
Writing high-performance software on Arm often means using single-instruction, multiple-data (SIMD) technologies. Many developers start with NEON, a familiar fixed-width vector extension. As Arm architectures evolve, so do the SIMD capabilities available to you.
This Learning Path uses the Scalable Vector Extension (SVE) and the Scalable Matrix Extension (SME) to demonstrate modern SIMD patterns. They are two powerful, scalable vector extensions designed for modern workloads. Unlike NEON, these architecture extensions are not just wider; they are fundamentally different. They introduce predication, vector-length-agnostic (VLA) programming, gather/scatter, streaming modes, and tile-based compute with ZA state. The result is more power and flexibility, but there can be a learning curve to match.
## What is the SIMD Loops project?
The SIMD Loops project offers a hands-on way to climb the learning curve. It is a public codebase of self-contained, real loop kernels written in C, Arm C Language Extensions (ACLE) intrinsics, and selected inline assembly. Kernels span tasks such as matrix multiply, sorting, and string processing. You can build them, run them, step through them, and adapt them for your own SIMD workloads.
Visit the [SIMD Loops repository](https://gitlab.arm.com/architecture/simd-loops).
This open-source project (BSD-3-Clause) teaches SIMD development on modern Arm CPUs with SVE, SVE2, SME, and SME2. It is aimed at developers who know NEON intrinsics and want to explore newer extensions. The goal of SIMD Loops is to provide working, readable examples that demonstrate how to use the full range of features available in SVE, SVE2, and SME2. Each example is a self-contained loop kernel: a small piece of code that performs a specific task such as matrix multiplication, vector reduction, histogram, or memory copy. These examples show how that task can be implemented across different vector instruction sets.
Unlike a cookbook that attempts to provide a recipe for every problem, SIMD Loops takes the opposite approach. It aims to showcase the architecture rather than the problem itself. The loop kernels are chosen to be realistic and meaningful, but the main goal is to demonstrate how specific features and instructions work in practice. If you are trying to understand scalability, predication, gather/scatter, streaming mode, ZA storage, compact instructions, or the mechanics of matrix tiles, this is where you can see them in action.
The project includes:
- Many numbered loop kernels, each focused on a specific feature or pattern
- Reference C implementations to establish expected behavior
- Inline assembly and/or intrinsics for scalar, NEON, SVE, SVE2, SVE2.1, SME2, and SME2.1
- Build support for different instruction sets, with runtime validation
- A simple command-line runner to execute any loop interactively
- Optional standalone binaries for bare-metal and simulator use
You do not need to rely on auto-vectorization or guess at compiler flags. Each loop is handwritten and annotated to make the intended use of SIMD features clear. Study a kernel, modify it, rebuild, and observe the effect: this is the core learning loop.
In the SIMD Loops project, the source code for the loops is organized under the `loops` directory. The complete list of loops is documented in the `loops.inc` file, which includes a brief description and the purpose of each loop. Every loop is associated with a uniquely named source file following the pattern `loop_<NNN>.c`, where `<NNN>` represents the loop number.
A subset of the `loops.inc` file is below:
```output
LOOP(001, "FP32 inner product", "Use of fp32 MLA instruction", STREAMING_COMPATIBLE)
LOOP(002, "UINT32 inner product", "Use of u32 MLA instruction", STREAMING_COMPATIBLE)
LOOP(003, "FP64 inner product", "Use of fp64 MLA instruction", STREAMING_COMPATIBLE)
LOOP(004, "UINT64 inner product", "Use of u64 MLA instruction", STREAMING_COMPATIBLE)
LOOP(005, "strlen short strings", "Use of FF and NF loads instructions")
LOOP(006, "strlen long strings", "Use of FF and NF loads instructions")
LOOP(008, "Precise fp64 add reduction", "Use of FADDA instructions")
LOOP(009, "Pointer chasing", "Use of CTERM and BRK instructions")
```
Each loop is implemented in several SIMD extension variants. Conditional compilation selects one of the implementations for the `inner_loop_<NNN>` function.
The native C implementation is written first, and it can be generated either when building natively with `-DHAVE_NATIVE` or through compiler auto-vectorization with `-DHAVE_AUTOVEC`.
When SIMD ACLE is supported (SME, SVE, or NEON), the code is compiled using high-level intrinsics. If ACLE support is not available, the build process falls back to handwritten inline assembly targeting one of the available SIMD extensions, such as SME2.1, SME2, SVE2.1, SVE2, and others.
The overall code structure also includes setup and cleanup code in the main function, where memory buffers are allocated, the selected loop kernel is executed, and results are verified for correctness.
At compile time, you can select which loop optimization to compile, whether it is based on SME or SVE intrinsics, or one of the available inline assembly variants.
```console
make
```
With no target specified, the list of targets is printed:
```output
all fmt clean c-scalar scalar autovec-sve autovec-sve2 neon sve sve2 sme2 sme-ssve sve2p1 sme2p1 sve-intrinsics sme-intrinsics
```
Build all loops for all targets:
```console
make all
```
Build all loops for a single target, such as NEON:
```console
make neon
```
As a result of the build, two types of binaries are generated.
The first is a single executable named `simd_loops`, which includes all loop implementations.
Select a specific loop by passing parameters to the program. For example, to run loop 1 for 5 iterations using the NEON target:
```console
build/neon/bin/simd_loops -k 1 -n 5
```
Example output:
```output
Loop 001 - FP32 inner product
- Purpose: Use of fp32 MLA instruction
- Checksum correct.
```