Commit d81d700

[simd loops] Fix code snippet + wording improvements.
1 parent: a79e5da

2 files changed: 20 additions and 13 deletions
content/learning-paths/cross-platform/simd-loops/1-about.md

Lines changed: 1 addition & 1 deletion
@@ -59,7 +59,7 @@ mechanics of matrix tiles --- this is where you’ll see them in action.
 The project includes:
 - Dozens of numbered loop kernels, each focused on a specific feature or pattern
 - Reference C implementations to establish expected behavior
-- Inline assembly and/or intrinsics for scalar, Neon, SVE, SVE2, and SME2
+- Inline assembly and/or intrinsics for scalar, Neon, SVE, SVE2, SVE2.1, SME2 and SME2.1
 - Build support for different instruction sets, with runtime validation
 - A simple command-line runner to execute any loop interactively
 - Optional standalone binaries for bare-metal and simulator use

content/learning-paths/cross-platform/simd-loops/2-using.md

Lines changed: 19 additions & 12 deletions
@@ -28,20 +28,26 @@ A loop is structured as follows:
 ```C
 // Includes and loop_<NNN>_data structure definition

+#if defined(HAVE_NATIVE) || defined(HAVE_AUTOVEC)
+
+// C code
+void inner_loop_<NNN>(struct loop_<NNN>_data *data) { ... }
+
 #if defined(HAVE_xxx_INTRINSICS)

 // Intrinsics versions: xxx = SME, SVE, or SIMD (Neon) versions
 void inner_loop_<NNN>(struct loop_<NNN>_data *data) { ... }

-#elif defined(HAVE_xxx)
+#elif defined(<ASM_COND>)

-// Hand-written inline assembly : xxx = SME2P1, SME2, SVE2P1, SVE2, SVE, or SIMD
+// Hand-written inline assembly :
+// <ASM_COND> = __ARM_FEATURE_SME2p1, __ARM_FEATURE_SME2, __ARM_FEATURE_SVE2p1,
+// __ARM_FEATURE_SVE2, __ARM_FEATURE_SVE, or __ARM_NEON
 void inner_loop_<NNN>(struct loop_<NNN>_data *data) { ... }

 #else

-// Equivalent C code
-void inner_loop_<NNN>(struct loop_<NNN>_data *data) { ... }
+#error "No implementations available for this target."

 #endif

@@ -50,14 +56,15 @@ void inner_loop_<NNN>(struct loop_<NNN>_data *data) { ... }

 Each loop is implemented in several SIMD extension variants, and conditional
 compilation is used to select one of the optimisations for the
-`inner_loop_<NNN>` function. When ACLE is supported (e.g. SME, SVE, or
-SIMD/Neon), a high-level intrinsic implementation is compiled. If ACLE is not
-available, the tool falls back to handwritten inline assembly targeting one of
-the various SIMD extensions, including SME2.1, SME2, SVE2.1, SVE2, and others.
-If no handwritten inline assembly is detected, a fallback implementation in
-native C is used. The overall code structure also includes setup and cleanup
-code in the main function, where memory buffers are allocated, the selected loop
-kernel is executed, and results are verified for correctness.
+`inner_loop_<NNN>` function. The native C implementation is written first, and
+it can be generated either when building natively (HAVE_NATIVE) or through
+compiler auto-vectorization (HAVE_AUTOVEC). When SIMD ACLE is supported (e.g.,
+SME, SVE, or Neon), the code is compiled using high-level intrinsics. If ACLE
+support is not available, the build process falls back to handwritten inline
+assembly targeting one of the available SIMD extensions, such as SME2.1, SME2,
+SVE2.1, SVE2, and others. The overall code structure also includes setup and
+cleanup code in the main function, where memory buffers are allocated, the
+selected loop kernel is executed, and results are verified for correctness.

 At compile time, you can select which loop optimisation to compile, whether it
 is based on SME or SVE intrinsics, or one of the available inline assembly
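
The selection scheme described in the updated `2-using.md` text can be sketched as a small, self-contained C file. This is only an illustration under stated assumptions, not the project's code: `loop_demo_data`, `inner_loop_demo`, `run_loop_demo`, and the `HAVE_SIMD_INTRINSICS` macro are hypothetical stand-ins for the `<NNN>` and `xxx` placeholders, the kernel (an element-wise add) is invented, the inline-assembly branch is omitted, and the sketch defines `HAVE_NATIVE` itself when no variant is selected so that it builds on any host. It is written as a single `#if`/`#elif`/`#else` chain so that one `#endif` closes it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Demo-only default (assumption): a real build would set these macros
   from the build system rather than in the source file. */
#if !defined(HAVE_NATIVE) && !defined(HAVE_AUTOVEC) && \
    !defined(HAVE_SIMD_INTRINSICS)
#define HAVE_NATIVE 1
#endif

/* Hypothetical data layout standing in for loop_<NNN>_data. */
struct loop_demo_data {
    const float *a;
    const float *b;
    float *out;
    size_t n;
};

#if defined(HAVE_NATIVE) || defined(HAVE_AUTOVEC)

/* Plain C version; the compiler may auto-vectorize this loop. */
void inner_loop_demo(struct loop_demo_data *data) {
    for (size_t i = 0; i < data->n; i++)
        data->out[i] = data->a[i] + data->b[i];
}

#elif defined(HAVE_SIMD_INTRINSICS) && defined(__ARM_NEON)

#include <arm_neon.h>

/* Neon intrinsics version: four floats per iteration, scalar tail. */
void inner_loop_demo(struct loop_demo_data *data) {
    size_t i = 0;
    for (; i + 4 <= data->n; i += 4) {
        float32x4_t va = vld1q_f32(data->a + i);
        float32x4_t vb = vld1q_f32(data->b + i);
        vst1q_f32(data->out + i, vaddq_f32(va, vb));
    }
    for (; i < data->n; i++)
        data->out[i] = data->a[i] + data->b[i];
}

#else
#error "No implementations available for this target."
#endif

/* Allocate buffers, run the selected kernel, verify against a scalar
   reference, and clean up. Returns 1 on success, 0 otherwise. */
int run_loop_demo(size_t n) {
    assert(n > 0);
    float *a = malloc(n * sizeof *a);
    float *b = malloc(n * sizeof *b);
    float *out = malloc(n * sizeof *out);
    int ok = (a != NULL && b != NULL && out != NULL);
    if (ok) {
        for (size_t i = 0; i < n; i++) {
            a[i] = (float)i;
            b[i] = 2.0f * (float)i;
        }
        struct loop_demo_data data = { a, b, out, n };
        inner_loop_demo(&data);
        for (size_t i = 0; i < n; i++)
            if (out[i] != a[i] + b[i])
                ok = 0;
    }
    free(a);
    free(b);
    free(out);
    return ok;
}
```

Building with `-DHAVE_SIMD_INTRINSICS` on an AArch64 toolchain would select the Neon branch instead of the plain C one. `run_loop_demo` mirrors the allocate, execute, and verify sequence that the text attributes to the main function.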
