Commit 1075dcb

Add support for the Brain 16-bit floating-point vector multiplication intrinsics
Adds intrinsic support for the Brain 16-bit floating-point vector multiplication instructions introduced by the FEAT_SVE_BFSCALE feature in 2024 dpISA.

BFSCALE: BFloat16 adjust exponent by vector (predicated)
BFSCALE (multiple and single vector): Multi-vector BFloat16 adjust exponent by vector
BFSCALE (multiple vectors): Multi-vector BFloat16 adjust exponent
BFMUL (multiple and single vector): Multi-vector BFloat16 floating-point multiply by vector
BFMUL (multiple vectors): Multi-vector BFloat16 floating-point multiply
1 parent 4303ef0 commit 1075dcb

1 file changed: +64 -1 lines changed

main/acle.md

Lines changed: 64 additions & 1 deletion
@@ -468,6 +468,7 @@ Armv8.4-A [[ARMARMv84]](#ARMARMv84). Support is added for the Dot Product intrin
 * Added support for FEAT_FPRCVT intrinsics and `__ARM_FEATURE_FPRCVT`.
 * Added support for modal 8-bit floating point matrix multiply-accumulate widening intrinsics.
 * Added support for 16-bit floating point matrix multiply-accumulate widening intrinsics.
+* Added support for Brain 16-bit floating-point vector multiplication intrinsics.

 ### References
@@ -2003,6 +2004,7 @@ of SME has an associated preprocessor macro, given in the table below:
 | FEAT_SME | __ARM_FEATURE_SME |
 | FEAT_SME2 | __ARM_FEATURE_SME2 |
 | FEAT_SME2p1 | __ARM_FEATURE_SME2p1 |
+| FEAT_SME2p2 | __ARM_FEATURE_SME2p2 |

 Each macro is defined if there is hardware support for the associated
 architecture feature and if all of the [ACLE
@@ -2125,6 +2127,16 @@ are available. Specifically, if this macro is defined to `1`, then:
 for the FEAT_SME_B16B16 instructions and if their associated intrinsics
 are available.

+#### Brain 16-bit floating-point vector multiplication support
+
+`__ARM_FEATURE_SVE_BFSCALE` is defined to `1` if there is hardware
+support for the SVE BF16 vector multiplication extensions and if the
+associated ACLE intrinsics are available.
+
+See [Half-precision brain
+floating-point](#half-precision-brain-floating-point) for details
+of half-precision brain floating-point types.
+
 ### Cryptographic extensions

 #### “Crypto” extension
@@ -2665,6 +2677,7 @@ be found in [[BA]](#BA).
 | [`__ARM_FEATURE_SVE`](#scalable-vector-extension-sve) | Scalable Vector Extension (FEAT_SVE) | 1 |
 | [`__ARM_FEATURE_SVE_B16B16`](#non-widening-brain-16-bit-floating-point-support) | Non-widening brain 16-bit floating-point intrinsics (FEAT_SVE_B16B16) | 1 |
 | [`__ARM_FEATURE_SVE_BF16`](#brain-16-bit-floating-point-support) | SVE support for the 16-bit brain floating-point extension (FEAT_BF16) | 1 |
+| [`__ARM_FEATURE_SVE_BFSCALE`](#brain-16-bit-floating-point-vector-multiplication-support) | SVE support for the 16-bit brain floating-point vector multiplication extension (FEAT_SVE_BFSCALE) | 1 |
 | [`__ARM_FEATURE_SVE_BITS`](#scalable-vector-extension-sve) | The number of bits in an SVE vector, when known in advance | 256 |
 | [`__ARM_FEATURE_SVE_MATMUL_FP32`](#multiplication-of-32-bit-floating-point-matrices) | 32-bit floating-point matrix multiply extension (FEAT_F32MM) | 1 |
 | [`__ARM_FEATURE_SVE_MATMUL_FP64`](#multiplication-of-64-bit-floating-point-matrices) | 64-bit floating-point matrix multiply extension (FEAT_F64MM) | 1 |
@@ -11698,7 +11711,7 @@ Multi-vector floating-point fused multiply-add/subtract
  __arm_streaming __arm_inout("za");
 ```

-#### BFMLA. BFMLS, FMLA, FMLS (indexed)
+#### BFMLA, BFMLS, FMLA, FMLS (indexed)

 Multi-vector floating-point fused multiply-add/subtract
@@ -12791,6 +12804,29 @@ element types.
 svint8x4_t svuzpq[_s8_x4](svint8x4_t zn) __arm_streaming;
 ```

+#### BFMUL
+
+BFloat16 Multi-vector floating-point multiply
+
+``` c
+// Only if __ARM_FEATURE_SVE_BFSCALE != 0
+svbfloat16x2_t svmul[_bf16_x2](svbfloat16x2_t zd, svbfloat16x2_t zm) __arm_streaming;
+svbfloat16x2_t svmul[_single_bf16_x2](svbfloat16x2_t zd, svbfloat16_t zm) __arm_streaming;
+svbfloat16x4_t svmul[_bf16_x4](svbfloat16x4_t zd, svbfloat16x4_t zm) __arm_streaming;
+svbfloat16x4_t svmul[_single_bf16_x4](svbfloat16x4_t zd, svbfloat16_t zm) __arm_streaming;
+```
+
+#### BFSCALE
+BFloat16 floating-point adjust exponent vectors.
+
+``` c
+// Only if __ARM_FEATURE_SVE_BFSCALE != 0
+svbfloat16x2_t svscale[_bf16_x2](svbfloat16x2_t zdn, svint16x2_t zm);
+svbfloat16x2_t svscale[_single_bf16_x2](svbfloat16x2_t zn, svint16_t zm);
+svbfloat16x4_t svscale[_bf16_x4](svbfloat16x4_t zdn, svint16x4_t zm);
+svbfloat16x4_t svscale[_single_bf16_x4](svbfloat16x4_t zn, svint16_t zm);
+```
+
 ### SME2.1 instruction intrinsics

 The specification for SME2.1 is in
@@ -12936,6 +12972,33 @@ Zero ZA vector groups
  __arm_streaming __arm_inout("za");
 ```

+### SME2.2 instruction intrinsics
+
+The intrinsics in this section are defined by the header file
+[`<arm_sme.h>`](#arm_sme.h) when `__ARM_FEATURE_SME2p2` is defined.
+
+#### FMUL
+
+Multi-vector floating-point multiply
+
+``` c
+// Variants are also available for:
+// [_single_f32_x2]
+// [_single_f64_x2]
+// [_single_f16_x4]
+// [_single_f32_x4]
+// [_single_f64_x4]
+svfloat16x2_t svmul[_single_f16_x2](svfloat16x2_t zd, svfloat16_t zm) __arm_streaming;
+
+// Variants are also available for:
+// [_f32_x2]
+// [_f64_x2]
+// [_f16_x4]
+// [_f32_x4]
+// [_f64_x4]
+svfloat16x2_t svmul[_f16_x2](svfloat16x2_t zd, svfloat16x2_t zm) __arm_streaming;
+```
+
 ### Streaming-compatible versions of standard routines

 ACLE provides the following streaming-compatible functions,
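
Usage sketch (not part of the commit): the diff above gates the new intrinsics on `__ARM_FEATURE_SVE_BFSCALE`. The following is a minimal, portable illustration of how calling code might test that macro and keep a scalar fallback. It assumes that BFSCALE, like the existing FSCALE instruction, multiplies each element by two raised to the corresponding signed integer. The names `HAVE_BFSCALE`, `scale_pow2`, `scale_ref` and `bfscale_available` are hypothetical, and the model works on `float`, so it ignores bfloat16 rounding and subnormal behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* __ARM_FEATURE_SVE_BFSCALE is the macro added by this commit.
   HAVE_BFSCALE is a hypothetical helper used only in this sketch. */
#if defined(__ARM_FEATURE_SVE_BFSCALE) && __ARM_FEATURE_SVE_BFSCALE
#define HAVE_BFSCALE 1
#else
#define HAVE_BFSCALE 0
#endif

/* Returns 1 when the compiler advertises the BFSCALE intrinsics, else 0. */
static int bfscale_available(void)
{
    return HAVE_BFSCALE;
}

/* Hypothetical scalar model of "adjust exponent": x * 2^e.
   Assumes FSCALE-like semantics; uses float rather than bfloat16. */
static float scale_pow2(float x, int16_t e)
{
    for (; e > 0; --e)
        x *= 2.0f;
    for (; e < 0; ++e)
        x *= 0.5f;
    return x;
}

/* Element-wise fallback: zdn[i] = zdn[i] * 2^zm[i] for i in [0, n). */
static void scale_ref(float *zdn, const int16_t *zm, size_t n)
{
    assert(zdn != NULL && zm != NULL);
    for (size_t i = 0; i < n; ++i)
        zdn[i] = scale_pow2(zdn[i], zm[i]);
}
```

Where `bfscale_available()` returns 1, a caller would presumably replace `scale_ref` with the `svscale` forms listed in the BFSCALE hunk; the fallback keeps the same code building on toolchains that do not define the macro.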
