Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
120 changes: 120 additions & 0 deletions main/acle.md
Original file line number Diff line number Diff line change
Expand Up @@ -467,6 +467,10 @@ Armv8.4-A [[ARMARMv84]](#ARMARMv84). Support is added for the Dot Product intrin
* Added feature test macro for FEAT_CSSC.
* Added support for modal 8-bit floating point matrix multiply-accumulate widening intrinsics.
* Added support for 16-bit floating point matrix multiply-accumulate widening intrinsics.
* Added [**Alpha**](#current-status-and-anticipated-changes)
support for SVE2.2 (FEAT_SVE2p2)
* Added [**Alpha**](#current-status-and-anticipated-changes)
support for SME2.2 (FEAT_SME2p2).

### References

Expand Down Expand Up @@ -1980,6 +1984,10 @@ are available. This implies that `__ARM_FEATURE_SVE` is nonzero.
are available and if the associated [ACLE features]
(#sme-language-extensions-and-intrinsics) are supported.

`__ARM_FEATURE_SVE2p2` is defined to 1 if the FEAT_SVE2p2 instructions
are available and if the associated [ACLE features]
(#sme-language-extensions-and-intrinsics) are supported.

#### NEON-SVE Bridge macro

`__ARM_NEON_SVE_BRIDGE` is defined to 1 if the [`<arm_neon_sve_bridge.h>`](#arm_neon_sve_bridge.h)
Expand All @@ -2002,6 +2010,7 @@ of SME has an associated preprocessor macro, given in the table below:
| FEAT_SME | __ARM_FEATURE_SME |
| FEAT_SME2 | __ARM_FEATURE_SME2 |
| FEAT_SME2p1 | __ARM_FEATURE_SME2p1 |
| FEAT_SME2p2 | __ARM_FEATURE_SME2p2 |

Each macro is defined if there is hardware support for the associated
architecture feature and if all of the [ACLE
Expand Down Expand Up @@ -2674,6 +2683,7 @@ be found in [[BA]](#BA).
| [`__ARM_FEATURE_SVE2_SM3`](#sm3-extension) | SVE2 support for the SM3 cryptographic extension (FEAT_SVE_SM3) | 1 |
| [`__ARM_FEATURE_SVE2_SM4`](#sm4-extension) | SVE2 support for the SM4 cryptographic extension (FEAT_SVE_SM4) | 1 |
| [`__ARM_FEATURE_SVE2p1`](#sve2) | SVE version 2.1 (FEAT_SVE2p1)
| [`__ARM_FEATURE_SVE2p2`](#sve2) | SVE version 2.2 (FEAT_SVE2p2)
| [`__ARM_FEATURE_SYSREG128`](#bit-system-registers) | Support for 128-bit system registers (FEAT_SYSREG128) | 1 |
| [`__ARM_FEATURE_UNALIGNED`](#unaligned-access-supported-in-hardware) | Hardware support for unaligned access | 1 |
| [`__ARM_FP`](#hardware-floating-point) | Hardware floating-point | 1 |
Expand Down Expand Up @@ -12927,6 +12937,41 @@ Zero ZA vector groups
__arm_streaming __arm_inout("za");
```

### SME2.2 instruction intrinsics

The specification for SME2.2 are in
[**Alpha** state](#current-status-and-anticipated-changes) and might change or be
extended in the future.

The intrinsics in this section are defined by the header file
[`<arm_sme.h>`](#arm_sme.h) when `__ARM_FEATURE_SME2p2` is defined.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these instructions do not operate on ZA array, I think they should be in arm_sve.h.


#### FMUL

Multi-vector floating-point multiply

``` c
// Variants are also available for:
// [_single_f32_x2]
Copy link
Contributor

@Lukacma Lukacma Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should split this into x2 and x4 variants. I think this might confuse users as to what type to use for arguments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've split them up. Does this look okay?

// [_single_f64_x2]
svfloat16x2_t svmul[_single_f16_x2](svfloat16x2_t zd, svfloat16_t zm) __arm_streaming;

// Variants are also available for:
// [_single_f32_x4]
// [_single_f64_x4]
svfloat16x4_t svmul[_single_f16_x4](svfloat16x4_t zd, svfloat16_t zm) __arm_streaming;

// Variants are also available for:
// [_f32_x2]
// [_f64_x2]
svfloat16x2_t svmul[_f16_x2](svfloat16x2_t zd, svfloat16x2_t zm) __arm_streaming;

// Variants are also available for:
// [_f32_x4]
// [_f64_x4]
svfloat16x4_t svmul[_f16_x4](svfloat16x4_t zd, svfloat16x4_t zm) __arm_streaming;
```

### Streaming-compatible versions of standard routines

ACLE provides the following streaming-compatible functions,
Expand Down Expand Up @@ -13476,6 +13521,81 @@ While (resulting in predicate tuple)
svboolx2_t svwhilelt_b8[_s64]_x2(int64_t rn, int64_t rm);
```

### SVE2.2 and SME2.2 instruction intrinsics

The specification for SVE2.2 and SME2.2 are in
[**Alpha** state](#current-status-and-anticipated-changes) and might change or be
extended in the future.

The functions in this section are defined by either the header file
[`<arm_sve.h>`](#arm_sve.h) or [`<arm_sme.h>`](#arm_sme.h)
when `__ARM_FEATURE_SVE2p2` or `__ARM_FEATURE_SME2p2` is defined, respectively.

#### FRINT32X, FRINT32Z, FRINT64X, FRINT64Z

Round to integral floating-point values.

```c

//Variant is available for _f64
svfloat32_t frint32x[_f32]_z(svbool_t pg, svfloat32_t zn);
//Variant is available for _f64
svfloat32_t frint32x[_f32]_x(svbool_t pg, svfloat32_t zn);
//Variant is available for _f64
svfloat32_t frint32x[_f32]_m(svfloat32_t inactive, svbool_t pg, svfloat32_t zn);

//Variant is available for _f64
svfloat32_t frint32z[_f32]_z(svbool_t pg, svfloat32_t zn);
//Variant is available for _f64
svfloat32_t frint32z[_f32]_x(svbool_t pg, svfloat32_t zn);
//Variant is available for _f64
svfloat32_t frint32z[_f32]_m(svfloat32_t inactive, svbool_t pg, svfloat32_t zn);

//Variant is available for _f64
svfloat32_t frint64x[_f32]_z(svbool_t pg, svfloat32_t zn);
//Variant is available for _f64
svfloat32_t frint64x[_f32]_x(svbool_t pg, svfloat32_t zn);
//Variant is available for _f64
svfloat32_t frint64x[_f32]_m(svfloat32_t inactive, svbool_t pg, svfloat32_t zn);

//Variant is available for _f64
svfloat32_t frint64z[_f32]_z(svbool_t pg, svfloat32_t zn);
//Variant is available for _f64
svfloat32_t frint64z[_f32]_x(svbool_t pg, svfloat32_t zn);
//Variant is available for _f64
svfloat32_t frint64z[_f32]_m(svfloat32_t inactive, svbool_t pg, svfloat32_t zn);
```

#### COMPACT, EXPAND

Copy active vector elements to/from lower-numbered elements.

``` c
// Variants are available for:
// _s8, _s16, _u16, _mf8, _bf16, _f16
svuint8_t svcompact[_u8](svbool_t pg, svuint8_t zn);

// Variants are available for:
// _s8, _s16, _u16, _s32, _u32, _s64, _u64
// _mf8, _bf16, _f16, _f32, _f64
svuint8_t svexpand[_u8](svbool_t pg, svuint8_t zn);

```

#### FIRSTP, LASTP

Scalar index of first/last true predicate element (predicated).

``` c
// Variants are available for:
// _b16, _b32, _b64
int64_t svfirstp_b8(svbool_t pg, svbool_t pn);

// Variants are available for:
// _b16, _b32, _b64
int64_t svlastp_b8(svbool_t pg, svbool_t pn);

```

### SME2 maximum and minimum absolute value

Expand Down