From cd981a4886db7f6e192f553813787e56563785f4 Mon Sep 17 00:00:00 2001 From: Amilendra Kodithuwakku Date: Mon, 15 Sep 2025 16:46:38 +0100 Subject: [PATCH 1/6] Add support for FEAT_SVE2p2/FEAT_SME2p2 intrinsics These instructions are available under features FEAT_SVE2p2 or FEAT_SME2p2. COMPACT: Copy Active vector elements to lower-numbered elements (Byte/Halfword variants) EXPAND: Copy lower-numbered vector elements to Active elements (Byte/Halfword/Word/Doubleword variants) FIRSTP: Scalar index of first true predicate element (predicated) (Byte/Halfword/Word/Doubleword variants) LASTP: Scalar index of last true predicate element (predicated) (Byte/Halfword/Word/Doubleword variants) FMUL (multiple and single vector): Multi-vector floating-point multiply by vector FMUL (multiple vectors): Multi-vector floating-point multiply --- main/acle.md | 87 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 87 insertions(+) diff --git a/main/acle.md b/main/acle.md index 5b3d2bdd..dc1377f3 100644 --- a/main/acle.md +++ b/main/acle.md @@ -467,6 +467,10 @@ Armv8.4-A [[ARMARMv84]](#ARMARMv84). Support is added for the Dot Product intrin * Added feature test macro for FEAT_CSSC. * Added support for modal 8-bit floating point matrix multiply-accumulate widening intrinsics. * Added support for 16-bit floating point matrix multiply-accumulate widening intrinsics. +* Added [**Alpha**](#current-status-and-anticipated-changes) + support for SVE2.2 (FEAT_SVE2p2) +* Added [**Alpha**](#current-status-and-anticipated-changes) + support for SME2.2 (FEAT_SME2p2). ### References @@ -1980,6 +1984,10 @@ are available. This implies that `__ARM_FEATURE_SVE` is nonzero. are available and if the associated [ACLE features] (#sme-language-extensions-and-intrinsics) are supported. +`__ARM_FEATURE_SVE2p2` is defined to 1 if the FEAT_SVE2p2 instructions + are available and if the associated [ACLE features] +(#sme-language-extensions-and-intrinsics) are supported. + #### NEON-SVE Bridge macro `__ARM_NEON_SVE_BRIDGE` is defined to 1 if the [``](#arm_neon_sve_bridge.h) @@ -2002,6 +2010,7 @@ of SME has an associated preprocessor macro, given in the table below: | FEAT_SME | __ARM_FEATURE_SME | | FEAT_SME2 | __ARM_FEATURE_SME2 | | FEAT_SME2p1 | __ARM_FEATURE_SME2p1 | +| FEAT_SME2p2 | __ARM_FEATURE_SME2p2 | Each macro is defined if there is hardware support for the associated architecture feature and if all of the [ACLE @@ -2674,6 +2683,7 @@ be found in [[BA]](#BA). | [`__ARM_FEATURE_SVE2_SM3`](#sm3-extension) | SVE2 support for the SM3 cryptographic extension (FEAT_SVE_SM3) | 1 | | [`__ARM_FEATURE_SVE2_SM4`](#sm4-extension) | SVE2 support for the SM4 cryptographic extension (FEAT_SVE_SM4) | 1 | | [`__ARM_FEATURE_SVE2p1`](#sve2) | SVE version 2.1 (FEAT_SVE2p1) +| [`__ARM_FEATURE_SVE2p2`](#sve2) | SVE version 2.2 (FEAT_SVE2p2) | [`__ARM_FEATURE_SYSREG128`](#bit-system-registers) | Support for 128-bit system registers (FEAT_SYSREG128) | 1 | | [`__ARM_FEATURE_UNALIGNED`](#unaligned-access-supported-in-hardware) | Hardware support for unaligned access | 1 | | [`__ARM_FP`](#hardware-floating-point) | Hardware floating-point | 1 | @@ -12927,6 +12937,33 @@ Zero ZA vector groups __arm_streaming __arm_inout("za"); ``` +### SME2.2 instruction intrinsics + +The intrinsics in this section are defined by the header file +[``](#arm_sme.h) when `__ARM_FEATURE_SME2p2` is defined. + +#### FMUL + +Multi-vector floating-point multiply + +``` c + // Variants are also available for: + // [_single_f32_x2] + // [_single_f64_x2] + // [_single_f16_x4] + // [_single_f32_x4] + // [_single_f64_x4] + svfloat16x2_t svmul[_single_f16_x2](svfloat16x2_t zd, svfloat16_t zm) __arm_streaming; + + // Variants are also available for: + // [_f32_x2] + // [_f64_x2] + // [_f16_x4] + // [_f32_x4] + // [_f64_x4] + svfloat16x2_t svmul[_f16_x2](svfloat16x2_t zd, svfloat16x2_t zm) __arm_streaming; +``` + ### Streaming-compatible versions of standard routines ACLE provides the following streaming-compatible functions, @@ -13476,6 +13513,56 @@ While (resulting in predicate tuple) svboolx2_t svwhilelt_b8[_s64]_x2(int64_t rn, int64_t rm); ``` +### SVE2.2 and SME2.2 instruction intrinsics + +The functions in this section are defined by either the header file + [``](#arm_sve.h) or [``](#arm_sme.h) +when `__ARM_FEATURE_SVE2p2` or `__ARM_FEATURE_SME2p2` is defined, respectively. + +#### COMPACT, EXPAND + +Copy active vector elements to/from lower-numbered elements. + +These intrinsics can be called from streaming code only if the +`__ARM_FEATURE_SME2p2` feature macro is defined. + +They can be called from non-streaming code if the `__ARM_FEATURE_SVE2p2` feature +macro is defined or both the `__ARM_FEATURE_SVE` and `__ARM_FEATURE_SME2p2` +feature macros are defined. + +``` c + // Variants are available for: + // _s8, _s16, _u16, _mf8, _bf16, _f16 + svuint8_t svcompact[_u8](svbool_t pg, svuint8_t zn); + + // Variants are available for: + // _s8, _s16, _u16, _s32, _u32, _s64, _u64 + // _mf8, _bf16, _f16, _f32, _f64 + svuint8_t svexpand[_u8](svbool_t pg, svuint8_t zn); + + ``` + +#### FIRSTP, LASTP + +Scalar index of first/last true predicate element (predicated). + +These intrinsics can be called from streaming mode if either of the feature +macros `__ARM_FEATURE_SVE` or `__ARM_FEATURE_SME` are defined. + +They can be called from non-streaming code only if the `__ARM_FEATURE_SVE` +feature macro is defined. + +``` c + // Variants are available for: + // _b16, _b32, _b64 + int64_t svfirstp_b8(svbool_t pg, svbool_t op); + + // Variants are available for: + // _b16, _b32, _b64 + int64_t svlastp_b8(svbool_t pg, svbool_t op); + + ``` + ### SME2 maximum and minimum absolute value From 78c7dd0a23311a27d4cdbb1be4f8b3b4e0400f3d Mon Sep 17 00:00:00 2001 From: Amilendra Kodithuwakku Date: Wed, 8 Oct 2025 10:24:24 +0100 Subject: [PATCH 2/6] Add SME2.2/SVE2.2 support status (Alpha) to the content --- main/acle.md | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/main/acle.md b/main/acle.md index dc1377f3..0a808c8a 100644 --- a/main/acle.md +++ b/main/acle.md @@ -12939,6 +12939,10 @@ Zero ZA vector groups ### SME2.2 instruction intrinsics +The specification for SME2.2 are in +[**Alpha** state](#current-status-and-anticipated-changes) and might change or be +extended in the future. + The intrinsics in this section are defined by the header file [``](#arm_sme.h) when `__ARM_FEATURE_SME2p2` is defined. @@ -13515,6 +13519,10 @@ While (resulting in predicate tuple) ### SVE2.2 and SME2.2 instruction intrinsics +The specification for SVE2.2 and SME2.2 are in +[**Alpha** state](#current-status-and-anticipated-changes) and might change or be +extended in the future. + The functions in this section are defined by either the header file [``](#arm_sve.h) or [``](#arm_sme.h) when `__ARM_FEATURE_SVE2p2` or `__ARM_FEATURE_SME2p2` is defined, respectively. From 09763ccc1b7eb029bdd78caf5da7e0a9a2ec8c1d Mon Sep 17 00:00:00 2001 From: Amilendra Kodithuwakku Date: Fri, 10 Oct 2025 14:26:53 +0100 Subject: [PATCH 3/6] SME/SVE2.2 Remove conditions for being called by streaming/non-streaming code --- main/acle.md | 14 -------------- 1 file changed, 14 deletions(-) diff --git a/main/acle.md b/main/acle.md index 0a808c8a..8e4992d1 100644 --- a/main/acle.md +++ b/main/acle.md @@ -13531,13 +13531,6 @@ when `__ARM_FEATURE_SVE2p2` or `__ARM_FEATURE_SME2p2` is defined, respectively. Copy active vector elements to/from lower-numbered elements. -These intrinsics can be called from streaming code only if the -`__ARM_FEATURE_SME2p2` feature macro is defined. - -They can be called from non-streaming code if the `__ARM_FEATURE_SVE2p2` feature -macro is defined or both the `__ARM_FEATURE_SVE` and `__ARM_FEATURE_SME2p2` -feature macros are defined. - ``` c // Variants are available for: // _s8, _s16, _u16, _mf8, _bf16, _f16 @@ -13554,12 +13547,6 @@ feature macros are defined. Scalar index of first/last true predicate element (predicated). -These intrinsics can be called from streaming mode if either of the feature -macros `__ARM_FEATURE_SVE` or `__ARM_FEATURE_SME` are defined. - -They can be called from non-streaming code only if the `__ARM_FEATURE_SVE` -feature macro is defined. - ``` c // Variants are available for: // _b16, _b32, _b64 @@ -13571,7 +13558,6 @@ feature macro is defined. ``` - ### SME2 maximum and minimum absolute value The intrinsics in this section are defined by the header file From 29f8554b2a82e6548d5a12250a6099cf350b59c3 Mon Sep 17 00:00:00 2001 From: Amilendra Kodithuwakku Date: Fri, 24 Oct 2025 12:57:02 +0100 Subject: [PATCH 4/6] Address review comments 1. Change firstp/lastp parameter names to match the register names: op -> pn 2. split the x2 and x4 variants of the FMUL intrinsics --- main/acle.md | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/main/acle.md b/main/acle.md index 8e4992d1..a9c28e32 100644 --- a/main/acle.md +++ b/main/acle.md @@ -12954,18 +12954,22 @@ Multi-vector floating-point multiply // Variants are also available for: // [_single_f32_x2] // [_single_f64_x2] - // [_single_f16_x4] + svfloat16x2_t svmul[_single_f16_x2](svfloat16x2_t zd, svfloat16_t zm) __arm_streaming; + + // Variants are also available for: // [_single_f32_x4] // [_single_f64_x4] - svfloat16x2_t svmul[_single_f16_x2](svfloat16x2_t zd, svfloat16_t zm) __arm_streaming; + svfloat16x4_t svmul[_single_f16_x4](svfloat16x4_t zd, svfloat16_t zm) __arm_streaming; // Variants are also available for: // [_f32_x2] // [_f64_x2] - // [_f16_x4] + svfloat16x2_t svmul[_f16_x2](svfloat16x2_t zd, svfloat16x2_t zm) __arm_streaming; + + // Variants are also available for: // [_f32_x4] // [_f64_x4] - svfloat16x2_t svmul[_f16_x2](svfloat16x2_t zd, svfloat16x2_t zm) __arm_streaming; + svfloat16x4_t svmul[_f16_x4](svfloat16x4_t zd, svfloat16x4_t zm) __arm_streaming; ``` ### Streaming-compatible versions of standard routines @@ -13550,11 +13554,11 @@ Scalar index of first/last true predicate element (predicated). ``` c // Variants are available for: // _b16, _b32, _b64 - int64_t svfirstp_b8(svbool_t pg, svbool_t op); + int64_t svfirstp_b8(svbool_t pg, svbool_t pn); // Variants are available for: // _b16, _b32, _b64 - int64_t svlastp_b8(svbool_t pg, svbool_t op); + int64_t svlastp_b8(svbool_t pg, svbool_t pn); ``` From fd2e7124fef4a8d4a689a327a96af617216e6688 Mon Sep 17 00:00:00 2001 From: Marian Lukac Date: Wed, 19 Nov 2025 12:04:39 +0000 Subject: [PATCH 5/6] add frint intrinsics --- main/acle.md | 37 +++++++++++++++++++++++++++++++++++++ 1 file changed, 37 insertions(+) diff --git a/main/acle.md b/main/acle.md index a9c28e32..1a5fabf3 100644 --- a/main/acle.md +++ b/main/acle.md @@ -13531,6 +13531,43 @@ The functions in this section are defined by either the header file [``](#arm_sve.h) or [``](#arm_sme.h) when `__ARM_FEATURE_SVE2p2` or `__ARM_FEATURE_SME2p2` is defined, respectively. +#### FRINT32X, FRINT32Z, FRINT64X, FRINT64Z + +Round to integral floating-point values. + +```c + +//Variant is available for _f64 +svfloat32_t frint32x[_f32]_z(svbool_t pg, svfloat32_t zn); +//Variant is available for _f64 +svfloat32_t frint32x[_f32]_x(svbool_t pg, svfloat32_t zn); +//Variant is available for _f64 +svfloat32_t frint32x[_f32]_m(svfloat32_t inactive, svbool_t pg, svfloat32_t zn); + +//Variant is available for _f64 +svfloat32_t frint32z[_f32]_z(svbool_t pg, svfloat32_t zn); +//Variant is available for _f64 +svfloat32_t frint32z[_f32]_x(svbool_t pg, svfloat32_t zn); +//Variant is available for _f64 +svfloat32_t frint32z[_f32]_m(svfloat32_t inactive, svbool_t pg, svfloat32_t zn); + +//Variant is available for _f64 +svfloat32_t frint64x[_f32]_z(svbool_t pg, svfloat32_t zn); +//Variant is available for _f64 +svfloat32_t frint64x[_f32]_x(svbool_t pg, svfloat32_t zn); +//Variant is available for _f64 +svfloat32_t frint64x[_f32]_m(svfloat32_t inactive, svbool_t pg, svfloat32_t zn); + +//Variant is available for _f64 +svfloat32_t frint64z[_f32]_z(svbool_t pg, svfloat32_t zn); +//Variant is available for _f64 +svfloat32_t frint64z[_f32]_x(svbool_t pg, svfloat32_t zn); +//Variant is available for _f64 +svfloat32_t frint64z[_f32]_m(svfloat32_t inactive, svbool_t pg, svfloat32_t zn); + + +``` + #### COMPACT, EXPAND Copy active vector elements to/from lower-numbered elements. From 52d1f8d8fa2b073e34ef7a3b46c56f4f97d3d0b2 Mon Sep 17 00:00:00 2001 From: Marian Lukac Date: Wed, 19 Nov 2025 12:05:32 +0000 Subject: [PATCH 6/6] remove empty lines --- main/acle.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/main/acle.md b/main/acle.md index 1a5fabf3..4b033092 100644 --- a/main/acle.md +++ b/main/acle.md @@ -13564,8 +13564,6 @@ svfloat32_t frint64z[_f32]_z(svbool_t pg, svfloat32_t zn); svfloat32_t frint64z[_f32]_x(svbool_t pg, svfloat32_t zn); //Variant is available for _f64 svfloat32_t frint64z[_f32]_m(svfloat32_t inactive, svbool_t pg, svfloat32_t zn); - - ``` #### COMPACT, EXPAND