
Commit 79c5ef2

Add support for the Brain 16-bit floating-point vector multiplication intrinsics
Adds intrinsic support for the Brain 16-bit floating-point vector multiplication instructions introduced by the FEAT_SVE_BFSCALE feature in the 2024 dpISA:

* BFSCALE: BFloat16 adjust exponent by vector (predicated)
* BFSCALE (multiple and single vector): multi-vector BFloat16 adjust exponent by vector
* BFSCALE (multiple vectors): multi-vector BFloat16 adjust exponent
* BFMUL (multiple and single vector): multi-vector BFloat16 floating-point multiply by vector
* BFMUL (multiple vectors): multi-vector BFloat16 floating-point multiply
1 parent: 1db9c69

File tree

1 file changed: +42 −1 lines

main/acle.md (42 additions, 1 deletion)
````diff
@@ -465,6 +465,8 @@ Armv8.4-A [[ARMARMv84]](#ARMARMv84). Support is added for the Dot Product intrin
 
 * Added feature test macro for FEAT_SSVE_FEXPA.
 * Added feature test macro for FEAT_CSSC.
+* Added [**Alpha**](#current-status-and-anticipated-changes) support
+  for Brain 16-bit floating-point vector multiplication intrinsics.
 
 ### References
 
````
````diff
@@ -2122,6 +2124,20 @@ are available. Specifically, if this macro is defined to `1`, then:
 for the FEAT_SME_B16B16 instructions and if their associated intrinsics
 are available.
 
+#### Brain 16-bit floating-point vector multiplication support
+
+This section is in
+[**Alpha** state](#current-status-and-anticipated-changes) and might change or be
+extended in the future.
+
+`__ARM_FEATURE_SVE_BFSCALE` is defined to `1` if there is hardware
+support for the SVE BF16 vector multiplication extensions and if the
+associated ACLE intrinsics are available.
+
+See [Half-precision brain
+floating-point](#half-precision-brain-floating-point) for details
+of half-precision brain floating-point types.
+
 ### Cryptographic extensions
 
 #### “Crypto” extension
````
````diff
@@ -2634,6 +2650,7 @@ be found in [[BA]](#BA).
 | [`__ARM_FEATURE_SVE`](#scalable-vector-extension-sve) | Scalable Vector Extension (FEAT_SVE) | 1 |
 | [`__ARM_FEATURE_SVE_B16B16`](#non-widening-brain-16-bit-floating-point-support) | Non-widening brain 16-bit floating-point intrinsics (FEAT_SVE_B16B16) | 1 |
 | [`__ARM_FEATURE_SVE_BF16`](#brain-16-bit-floating-point-support) | SVE support for the 16-bit brain floating-point extension (FEAT_BF16) | 1 |
+| [`__ARM_FEATURE_SVE_BFSCALE`](#brain-16-bit-floating-point-vector-multiplication-support) | SVE support for the 16-bit brain floating-point vector multiplication extension (FEAT_SVE_BFSCALE) | 1 |
 | [`__ARM_FEATURE_SVE_BITS`](#scalable-vector-extension-sve) | The number of bits in an SVE vector, when known in advance | 256 |
 | [`__ARM_FEATURE_SVE_MATMUL_FP32`](#multiplication-of-32-bit-floating-point-matrices) | 32-bit floating-point matrix multiply extension (FEAT_F32MM) | 1 |
 | [`__ARM_FEATURE_SVE_MATMUL_FP64`](#multiplication-of-64-bit-floating-point-matrices) | 64-bit floating-point matrix multiply extension (FEAT_F64MM) | 1 |
````
````diff
@@ -11639,7 +11656,7 @@ Multi-vector floating-point fused multiply-add/subtract
     __arm_streaming __arm_inout("za");
 ```
 
-#### BFMLA. BFMLS, FMLA, FMLS (indexed)
+#### BFMLA, BFMLS, FMLA, FMLS (indexed)
 
 Multi-vector floating-point fused multiply-add/subtract
 
````
````diff
@@ -12732,6 +12749,30 @@ element types.
   svint8x4_t svuzpq[_s8_x4](svint8x4_t zn) __arm_streaming;
 ```
 
+#### BFMUL
+
+BFloat16 Multi-vector floating-point multiply
+
+``` c
+// Only if __ARM_FEATURE_SVE_BFSCALE != 0
+svbfloat16x2_t svmul[_bf16_x2](svbfloat16x2_t zd, svbfloat16x2_t zm) __arm_streaming;
+svbfloat16x2_t svmul[_single_bf16_x2](svbfloat16x2_t zd, svbfloat16_t zm) __arm_streaming;
+svbfloat16x4_t svmul[_bf16_x4](svbfloat16x4_t zd, svbfloat16x4_t zm) __arm_streaming;
+svbfloat16x4_t svmul[_single_bf16_x4](svbfloat16x4_t zd, svbfloat16_t zm) __arm_streaming;
+```
+
+#### BFSCALE
+
+BFloat16 floating-point adjust exponent vectors.
+
+``` c
+// Only if __ARM_FEATURE_SVE_BFSCALE != 0
+svbfloat16x2_t svscale[_bf16_x2](svbfloat16x2_t zdn, svint16x2_t zm);
+svbfloat16x2_t svscale[_single_bf16_x2](svbfloat16x2_t zn, svint16_t zm);
+svbfloat16x4_t svscale[_bf16_x4](svbfloat16x4_t zdn, svint16x4_t zm);
+svbfloat16x4_t svscale[_single_bf16_x4](svbfloat16x4_t zn, svint16_t zm);
+```
+
 ### SME2.1 instruction intrinsics
 
 The specification for SME2.1 is in
````
