63 changes: 62 additions & 1 deletion main/acle.md
@@ -465,6 +465,8 @@ Armv8.4-A [[ARMARMv84]](#ARMARMv84). Support is added for the Dot Product intrin

* Added feature test macro for FEAT_SSVE_FEXPA.
* Added feature test macro for FEAT_CSSC.
* Added [**Alpha**](#current-status-and-anticipated-changes) support
for Brain 16-bit floating-point vector multiplication intrinsics.

### References

@@ -2122,6 +2124,20 @@ are available. Specifically, if this macro is defined to `1`, then:
for the FEAT_SME_B16B16 instructions and if their associated intrinsics
are available.

#### Brain 16-bit floating-point vector multiplication support

This section is in
[**Alpha** state](#current-status-and-anticipated-changes) and might change or be
extended in the future.

`__ARM_FEATURE_SVE_BFSCALE` is defined to `1` if there is hardware
support for the SVE BF16 vector multiplication extension (FEAT_SVE_BFSCALE)
and if the associated ACLE intrinsics are available.

See [Half-precision brain
floating-point](#half-precision-brain-floating-point) for details
of half-precision brain floating-point types.
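
As a minimal usage sketch (assuming the Alpha-state `svscale[_bf16]_m`
declaration proposed later in this patch, whose operand types are still
under review), a translation unit might gate on the macro like this:

``` c
#include <arm_sve.h>

#if defined(__ARM_FEATURE_SVE_BFSCALE) && __ARM_FEATURE_SVE_BFSCALE != 0
// Merging form of the proposed BFSCALE intrinsic: inactive lanes of the
// result keep the value of the first data operand.  Alpha-state only;
// the name and signature may still change.
svbfloat16_t scale_active_lanes(svbool_t pg, svbfloat16_t zdn,
                                svbfloat16_t zm) {
  return svscale_bf16_m(pg, zdn, zm);
}
#endif
```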

### Cryptographic extensions

#### “Crypto” extension
@@ -2634,6 +2650,7 @@ be found in [[BA]](#BA).
| [`__ARM_FEATURE_SVE`](#scalable-vector-extension-sve) | Scalable Vector Extension (FEAT_SVE) | 1 |
| [`__ARM_FEATURE_SVE_B16B16`](#non-widening-brain-16-bit-floating-point-support) | Non-widening brain 16-bit floating-point intrinsics (FEAT_SVE_B16B16) | 1 |
| [`__ARM_FEATURE_SVE_BF16`](#brain-16-bit-floating-point-support) | SVE support for the 16-bit brain floating-point extension (FEAT_BF16) | 1 |
| [`__ARM_FEATURE_SVE_BFSCALE`](#brain-16-bit-floating-point-vector-multiplication-support) | SVE support for the 16-bit brain floating-point vector multiplication extension (FEAT_SVE_BFSCALE) | 1 |
| [`__ARM_FEATURE_SVE_BITS`](#scalable-vector-extension-sve) | The number of bits in an SVE vector, when known in advance | 256 |
| [`__ARM_FEATURE_SVE_MATMUL_FP32`](#multiplication-of-32-bit-floating-point-matrices) | 32-bit floating-point matrix multiply extension (FEAT_F32MM) | 1 |
| [`__ARM_FEATURE_SVE_MATMUL_FP64`](#multiplication-of-64-bit-floating-point-matrices) | 64-bit floating-point matrix multiply extension (FEAT_F64MM) | 1 |
@@ -9374,6 +9391,26 @@ BFloat16 floating-point multiply vectors.
uint64_t imm_idx);
```

### SVE2 BFloat16 floating-point adjust exponent vectors instructions

> Reviewer comment: Can we change this to SVE instead of SVE2?

The specification for SVE2 BFloat16 floating-point adjust exponent vectors instructions is in
[**Alpha** state](#current-status-and-anticipated-changes) and might change or be
extended in the future.

#### BFSCALE

BFloat16 floating-point adjust exponent vectors.

``` c
// Only if __ARM_FEATURE_SVE_BFSCALE != 0
svbfloat16_t svscale[_bf16]_m (svbool_t pg, svbfloat16_t zdn, svbfloat16_t zm);
// Reviewer comment: I think the last argument to all these functions should
// be a svint16_t, since the corresponding instruction scales by an integer,
// not a float.
svbfloat16_t svscale[_bf16]_x (svbool_t pg, svbfloat16_t zdn, svbfloat16_t zm);
svbfloat16_t svscale[_bf16]_z (svbool_t pg, svbfloat16_t zdn, svbfloat16_t zm);
svbfloat16_t svscale[_n_bf16]_m (svbool_t pg, svbfloat16_t zdn, bfloat16_t zm);
svbfloat16_t svscale[_n_bf16]_x (svbool_t pg, svbfloat16_t zdn, bfloat16_t zm);
svbfloat16_t svscale[_n_bf16]_z (svbool_t pg, svbfloat16_t zdn, bfloat16_t zm);
```
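
A short usage sketch of the declarations above (Alpha-state; note the open
review question about whether the second data operand should instead be an
`svint16_t` exponent):

``` c
#include <arm_sve.h>

// Scale every element of 'v' by the same factor, using the zeroing
// _n (scalar-operand) form exactly as declared above.  Proposed API only;
// the operand type and semantics may still change.
svbfloat16_t scale_all(svbfloat16_t v, bfloat16_t factor) {
  svbool_t pg = svptrue_b16();   // predicate with all 16-bit lanes active
  return svscale_n_bf16_z(pg, v, factor);
}
```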

### SVE2.1 instruction intrinsics

The specification for SVE2.1 is in
@@ -11639,7 +11676,7 @@ Multi-vector floating-point fused multiply-add/subtract
__arm_streaming __arm_inout("za");
```

#### BFMLA. BFMLS, FMLA, FMLS (indexed)
#### BFMLA, BFMLS, FMLA, FMLS (indexed)

Multi-vector floating-point fused multiply-add/subtract

@@ -12732,6 +12769,30 @@ element types.
svint8x4_t svuzpq[_s8_x4](svint8x4_t zn) __arm_streaming;
```

#### BFMUL

Multi-vector BFloat16 floating-point multiply

``` c
// Only if __ARM_FEATURE_SVE_BFSCALE != 0
svbfloat16x2_t svmul[_bf16_x2](svbfloat16x2_t zd, svbfloat16x2_t zm) __arm_streaming;
svbfloat16x2_t svmul[_single_bf16_x2](svbfloat16x2_t zd, svbfloat16_t zm) __arm_streaming;
svbfloat16x4_t svmul[_bf16_x4](svbfloat16x4_t zd, svbfloat16x4_t zm) __arm_streaming;
svbfloat16x4_t svmul[_single_bf16_x4](svbfloat16x4_t zd, svbfloat16_t zm) __arm_streaming;
```
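
As a usage sketch (assuming the `_x2` form exactly as declared above,
including its `__arm_streaming` attribute; all of this is Alpha-state and
may change):

``` c
#include <arm_sme.h>

// Sketch: other SME feature macros may also be required; this checks only
// the BFSCALE macro introduced in this patch.
#if defined(__ARM_FEATURE_SVE_BFSCALE) && __ARM_FEATURE_SVE_BFSCALE != 0
// Element-wise multiply of two pairs of BF16 vectors from within a
// streaming function, mirroring the __arm_streaming attribute on the
// declaration above.  Proposed API only; names and attributes may change.
svbfloat16x2_t mul_pairs(svbfloat16x2_t zd, svbfloat16x2_t zm)
    __arm_streaming {
  return svmul_bf16_x2(zd, zm);
}
#endif
```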

#### BFSCALE

BFloat16 floating-point adjust exponent vectors.

``` c
// Only if __ARM_FEATURE_SVE_BFSCALE != 0
svbfloat16x2_t svscale[_bf16_x2](svbfloat16x2_t zdn, svint16x2_t zm);
// Reviewer comment: Are these SME instructions? Should they also be
// followed by __arm_streaming?
svbfloat16x2_t svscale[_single_bf16_x2](svbfloat16x2_t zn, svint16_t zm);
svbfloat16x4_t svscale[_bf16_x4](svbfloat16x4_t zdn, svint16x4_t zm);
svbfloat16x4_t svscale[_single_bf16_x4](svbfloat16x4_t zn, svint16_t zm);
```

### SME2.1 instruction intrinsics

The specification for SME2.1 is in