@@ -28,20 +28,26 @@ A loop is structured as follows:
 ``` C
 // Includes and loop_<NNN>_data structure definition
 
+#if defined(HAVE_NATIVE) || defined(HAVE_AUTOVEC)
+
+// C code
+void inner_loop_<NNN>(struct loop_<NNN>_data *data) { ... }
+
 #if defined(HAVE_xxx_INTRINSICS)
 
 // Intrinsics versions: xxx = SME, SVE, or SIMD (Neon) versions
 void inner_loop_<NNN>(struct loop_<NNN>_data *data) { ... }
 
-#elif defined(HAVE_xxx)
+#elif defined(<ASM_COND>)
 
-// Hand-written inline assembly: xxx = SME2P1, SME2, SVE2P1, SVE2, SVE, or SIMD
+// Hand-written inline assembly:
+// <ASM_COND> = __ARM_FEATURE_SME2p1, __ARM_FEATURE_SME2, __ARM_FEATURE_SVE2p1,
+// __ARM_FEATURE_SVE2, __ARM_FEATURE_SVE, or __ARM_NEON
 void inner_loop_<NNN>(struct loop_<NNN>_data *data) { ... }
 
 #else
 
-// Equivalent C code
-void inner_loop_<NNN>(struct loop_<NNN>_data *data) { ... }
+#error "No implementations available for this target."
 
 #endif
 
@@ -50,14 +56,15 @@ void inner_loop_<NNN>(struct loop_<NNN>_data *data) { ... }
 
 Each loop is implemented in several SIMD extension variants, and conditional
 compilation is used to select one of the optimisations for the
-`inner_loop_<NNN>` function. When ACLE is supported (e.g. SME, SVE, or
-SIMD/Neon), a high-level intrinsic implementation is compiled. If ACLE is not
-available, the tool falls back to handwritten inline assembly targeting one of
-the various SIMD extensions, including SME2.1, SME2, SVE2.1, SVE2, and others.
-If no handwritten inline assembly is detected, a fallback implementation in
-native C is used. The overall code structure also includes setup and cleanup
-code in the main function, where memory buffers are allocated, the selected loop
-kernel is executed, and results are verified for correctness.
+`inner_loop_<NNN>` function. The native C implementation is written first, and
+it can be generated either when building natively (HAVE_NATIVE) or through
+compiler auto-vectorization (HAVE_AUTOVEC). When SIMD ACLE is supported (e.g.,
+SME, SVE, or Neon), the code is compiled using high-level intrinsics. If ACLE
+support is not available, the build process falls back to handwritten inline
+assembly targeting one of the available SIMD extensions, such as SME2.1, SME2,
+SVE2.1, SVE2, and others. The overall code structure also includes setup and
+cleanup code in the main function, where memory buffers are allocated, the
+selected loop kernel is executed, and results are verified for correctness.
 
 At compile time, you can select which loop optimisation to compile, whether it
 is based on SME or SVE intrinsics, or one of the available inline assembly
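The selection scheme described in this change can be sketched as a small, self-contained C file. This is an illustration under stated assumptions, not code from the repository: the loop number `000`, the fields of `loop_000_data`, the `HAVE_SIMD_INTRINSICS` macro name, the `run_loop_000` helper, and the default to `HAVE_NATIVE` when no selection macro is defined are all hypothetical. The branches are chained with `#elif` so that the sketch compiles on its own.

```c
#include <stddef.h>
#include <stdint.h>

// Demo default (an assumption of this sketch): with no selection macro on the
// command line, build the plain C variant instead of reaching the #error.
#if !defined(HAVE_NATIVE) && !defined(HAVE_AUTOVEC) && \
    !defined(HAVE_SIMD_INTRINSICS)
#define HAVE_NATIVE 1
#endif

// Hypothetical data structure for an illustrative loop number 000.
struct loop_000_data {
  const uint32_t *a;
  const uint32_t *b;
  uint32_t *c;
  size_t n;
};

#if defined(HAVE_NATIVE) || defined(HAVE_AUTOVEC)

// Plain C version; under HAVE_AUTOVEC the compiler may vectorise this loop.
void inner_loop_000(struct loop_000_data *data) {
  for (size_t i = 0; i < data->n; i++) {
    data->c[i] = data->a[i] + data->b[i];
  }
}

#elif defined(HAVE_SIMD_INTRINSICS) && defined(__ARM_NEON)

#include <arm_neon.h>

// Neon intrinsics version: four 32-bit lanes per iteration, then a scalar tail.
void inner_loop_000(struct loop_000_data *data) {
  size_t i = 0;
  for (; i + 4 <= data->n; i += 4) {
    uint32x4_t va = vld1q_u32(data->a + i);
    uint32x4_t vb = vld1q_u32(data->b + i);
    vst1q_u32(data->c + i, vaddq_u32(va, vb));
  }
  for (; i < data->n; i++) {
    data->c[i] = data->a[i] + data->b[i];
  }
}

#else

#error "No implementations available for this target."

#endif

// Setup, run, verify: returns 0 when every element matches the reference.
int run_loop_000(void) {
  enum { N = 10 };
  uint32_t a[N], b[N], c[N];
  for (size_t i = 0; i < N; i++) {
    a[i] = (uint32_t)i;
    b[i] = (uint32_t)(2 * i);
    c[i] = 0;
  }
  struct loop_000_data data = { a, b, c, N };
  inner_loop_000(&data);
  for (size_t i = 0; i < N; i++) {
    if (c[i] != 3u * (uint32_t)i) return 1;
  }
  return 0;
}
```

With these assumptions, compiling with `-DHAVE_SIMD_INTRINSICS` on a Neon-capable target would select the intrinsics branch, while a target with no matching branch stops at the `#error` rather than silently building a fallback.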