From 79c5ef26d6d3774e906f62c6c8b9b3e151ea0992 Mon Sep 17 00:00:00 2001
From: Amilendra Kodithuwakku
Date: Fri, 5 Sep 2025 13:20:22 +0100
Subject: [PATCH 1/2] Add support for the Brain 16-bit floating-point vector
 multiplication intrinsics

Adds intrinsic support for the Brain 16-bit floating-point vector
multiplication instructions introduced by the FEAT_SVE_BFSCALE feature
in the 2024 dpISA.

BFSCALE: BFloat16 adjust exponent by vector (predicated)
BFSCALE (multiple and single vector): Multi-vector BFloat16 adjust exponent by vector
BFSCALE (multiple vectors): Multi-vector BFloat16 adjust exponent
BFMUL (multiple and single vector): Multi-vector BFloat16 floating-point multiply by vector
BFMUL (multiple vectors): Multi-vector BFloat16 floating-point multiply
---
 main/acle.md | 43 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 42 insertions(+), 1 deletion(-)

diff --git a/main/acle.md b/main/acle.md
index 3b066e93..858f4455 100644
--- a/main/acle.md
+++ b/main/acle.md
@@ -465,6 +465,8 @@ Armv8.4-A [[ARMARMv84]](#ARMARMv84). Support is added for the Dot Product intrin
 * Added feature test macro for FEAT_SSVE_FEXPA.
 * Added feature test macro for FEAT_CSSC.
+* Added [**Alpha**](#current-status-and-anticipated-changes) support
+  for Brain 16-bit floating-point vector multiplication intrinsics.
 
 ### References
 
@@ -2122,6 +2124,20 @@ are available. Specifically, if this macro is defined to `1`, then:
    for the FEAT_SME_B16B16 instructions and if their associated intrinsics
    are available.
 
+#### Brain 16-bit floating-point vector multiplication support
+
+This section is in
+[**Alpha** state](#current-status-and-anticipated-changes) and might change or be
+extended in the future.
+
+`__ARM_FEATURE_SVE_BFSCALE` is defined to `1` if there is hardware
+support for the SVE BF16 vector multiplication extensions and if the
+associated ACLE intrinsics are available.
+ +See [Half-precision brain +floating-point](#half-precision-brain-floating-point) for details +of half-precision brain floating-point types. + ### Cryptographic extensions #### “Crypto” extension @@ -2634,6 +2650,7 @@ be found in [[BA]](#BA). | [`__ARM_FEATURE_SVE`](#scalable-vector-extension-sve) | Scalable Vector Extension (FEAT_SVE) | 1 | | [`__ARM_FEATURE_SVE_B16B16`](#non-widening-brain-16-bit-floating-point-support) | Non-widening brain 16-bit floating-point intrinsics (FEAT_SVE_B16B16) | 1 | | [`__ARM_FEATURE_SVE_BF16`](#brain-16-bit-floating-point-support) | SVE support for the 16-bit brain floating-point extension (FEAT_BF16) | 1 | +| [`__ARM_FEATURE_SVE_BFSCALE`](#brain-16-bit-floating-point-vector-multiplication-support) | SVE support for the 16-bit brain floating-point vector multiplication extension (FEAT_SVE_BFSCALE) | 1 | | [`__ARM_FEATURE_SVE_BITS`](#scalable-vector-extension-sve) | The number of bits in an SVE vector, when known in advance | 256 | | [`__ARM_FEATURE_SVE_MATMUL_FP32`](#multiplication-of-32-bit-floating-point-matrices) | 32-bit floating-point matrix multiply extension (FEAT_F32MM) | 1 | | [`__ARM_FEATURE_SVE_MATMUL_FP64`](#multiplication-of-64-bit-floating-point-matrices) | 64-bit floating-point matrix multiply extension (FEAT_F64MM) | 1 | @@ -11639,7 +11656,7 @@ Multi-vector floating-point fused multiply-add/subtract __arm_streaming __arm_inout("za"); ``` -#### BFMLA. BFMLS, FMLA, FMLS (indexed) +#### BFMLA, BFMLS, FMLA, FMLS (indexed) Multi-vector floating-point fused multiply-add/subtract @@ -12732,6 +12749,30 @@ element types. 
 svint8x4_t svuzpq[_s8_x4](svint8x4_t zn) __arm_streaming;
 ```
 
+#### BFMUL
+
+Multi-vector BFloat16 floating-point multiply
+
+``` c
+  // Only if __ARM_FEATURE_SVE_BFSCALE != 0
+  svbfloat16x2_t svmul[_bf16_x2](svbfloat16x2_t zd, svbfloat16x2_t zm) __arm_streaming;
+  svbfloat16x2_t svmul[_single_bf16_x2](svbfloat16x2_t zd, svbfloat16_t zm) __arm_streaming;
+  svbfloat16x4_t svmul[_bf16_x4](svbfloat16x4_t zd, svbfloat16x4_t zm) __arm_streaming;
+  svbfloat16x4_t svmul[_single_bf16_x4](svbfloat16x4_t zd, svbfloat16_t zm) __arm_streaming;
+  ```
+
+#### BFSCALE
+
+Multi-vector BFloat16 floating-point adjust exponent by vector
+
+``` c
+  // Only if __ARM_FEATURE_SVE_BFSCALE != 0
+  svbfloat16x2_t svscale[_bf16_x2](svbfloat16x2_t zdn, svint16x2_t zm) __arm_streaming;
+  svbfloat16x2_t svscale[_single_bf16_x2](svbfloat16x2_t zdn, svint16_t zm) __arm_streaming;
+  svbfloat16x4_t svscale[_bf16_x4](svbfloat16x4_t zdn, svint16x4_t zm) __arm_streaming;
+  svbfloat16x4_t svscale[_single_bf16_x4](svbfloat16x4_t zdn, svint16_t zm) __arm_streaming;
+  ```
+
 ### SME2.1 instruction intrinsics
 
 The specification for SME2.1 is in

From 55e394dec4f86d258cf8db98816a749f4cc7c3ad Mon Sep 17 00:00:00 2001
From: Amilendra Kodithuwakku
Date: Thu, 16 Oct 2025 15:47:21 +0100
Subject: [PATCH 2/2] Add intrinsics for the BFSCALE: BFloat16 adjust exponent
 (predicated) instruction
---
 main/acle.md | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/main/acle.md b/main/acle.md
index 858f4455..07848c6c 100644
--- a/main/acle.md
+++ b/main/acle.md
@@ -9391,6 +9391,26 @@ BFloat16 floating-point multiply vectors.
                                   uint64_t imm_idx);
 ```
 
+### SVE2 BFloat16 floating-point adjust exponent instructions
+
+The specification for the SVE2 BFloat16 floating-point adjust exponent instructions is in
+[**Alpha** state](#current-status-and-anticipated-changes) and might change or be
+extended in the future.
+
+#### BFSCALE
+
+BFloat16 floating-point adjust exponent vectors.
+
+``` c
+  // Only if __ARM_FEATURE_SVE_BFSCALE != 0
+  svbfloat16_t svscale[_bf16]_m(svbool_t pg, svbfloat16_t zdn, svint16_t zm);
+  svbfloat16_t svscale[_bf16]_x(svbool_t pg, svbfloat16_t zdn, svint16_t zm);
+  svbfloat16_t svscale[_bf16]_z(svbool_t pg, svbfloat16_t zdn, svint16_t zm);
+  svbfloat16_t svscale[_n_bf16]_m(svbool_t pg, svbfloat16_t zdn, int16_t zm);
+  svbfloat16_t svscale[_n_bf16]_x(svbool_t pg, svbfloat16_t zdn, int16_t zm);
+  svbfloat16_t svscale[_n_bf16]_z(svbool_t pg, svbfloat16_t zdn, int16_t zm);
+  ```
+
 ### SVE2.1 instruction intrinsics
 
 The specification for SVE2.1 is in