Commit 7de9269

Add support for FEAT_SVE2p2/FEAT_SME2p2 intrinsics
These instructions are available under features FEAT_SVE2p2 or FEAT_SME2p2.

* COMPACT: Copy Active vector elements to lower-numbered elements (Byte/Halfword variants)
* EXPAND: Copy lower-numbered vector elements to Active elements (Byte/Halfword/Word/Doubleword variants)
* FIRSTP: Scalar index of first true predicate element (predicated) (Byte/Halfword/Word/Doubleword variants)
* LASTP: Scalar index of last true predicate element (predicated) (Byte/Halfword/Word/Doubleword variants)
* FMUL (multiple and single vector): Multi-vector floating-point multiply by vector
* FMUL (multiple vectors): Multi-vector floating-point multiply
1 parent 83378b8 commit 7de9269

1 file changed: +87 -0 lines changed

main/acle.md

Lines changed: 87 additions & 0 deletions
@@ -470,6 +470,10 @@ Armv8.4-A [[ARMARMv84]](#ARMARMv84). Support is added for the Dot Product intrin
 * Added support for 16-bit floating point matrix multiply-accumulate widening intrinsics.
 * Added support for Brain 16-bit floating-point vector multiplication intrinsics.
 * Added support for FEAT_SVE_AES2, FEAT_SSVE_AES intrinsics.
+* Added [**Alpha**](#current-status-and-anticipated-changes)
+support for SVE2.2 (FEAT_SVE2p2)
+* Added [**Alpha**](#current-status-and-anticipated-changes)
+support for SME2.2 (FEAT_SME2p2).
 
 ### References
 
@@ -1983,6 +1987,10 @@ are available. This implies that `__ARM_FEATURE_SVE` is nonzero.
 are available and if the associated [ACLE features]
 (#sme-language-extensions-and-intrinsics) are supported.
 
+`__ARM_FEATURE_SVE2p2` is defined to 1 if the FEAT_SVE2p2 instructions
+are available and if the associated [ACLE features]
+(#sme-language-extensions-and-intrinsics) are supported.
+
 #### NEON-SVE Bridge macro
 
 `__ARM_NEON_SVE_BRIDGE` is defined to 1 if the [`<arm_neon_sve_bridge.h>`](#arm_neon_sve_bridge.h)
@@ -2005,6 +2013,7 @@ of SME has an associated preprocessor macro, given in the table below:
 | FEAT_SME | __ARM_FEATURE_SME |
 | FEAT_SME2 | __ARM_FEATURE_SME2 |
 | FEAT_SME2p1 | __ARM_FEATURE_SME2p1 |
+| FEAT_SME2p2 | __ARM_FEATURE_SME2p2 |
 
 Each macro is defined if there is hardware support for the associated
 architecture feature and if all of the [ACLE
@@ -2707,6 +2716,7 @@ be found in [[BA]](#BA).
 | [`__ARM_FEATURE_SVE2_SM3`](#sm3-extension) | SVE2 support for the SM3 cryptographic extension (FEAT_SVE_SM3) | 1 |
 | [`__ARM_FEATURE_SVE2_SM4`](#sm4-extension) | SVE2 support for the SM4 cryptographic extension (FEAT_SVE_SM4) | 1 |
 | [`__ARM_FEATURE_SVE2p1`](#sve2) | SVE version 2.1 (FEAT_SVE2p1)
+| [`__ARM_FEATURE_SVE2p2`](#sve2) | SVE version 2.2 (FEAT_SVE2p2)
 | [`__ARM_FEATURE_SYSREG128`](#bit-system-registers) | Support for 128-bit system registers (FEAT_SYSREG128) | 1 |
 | [`__ARM_FEATURE_UNALIGNED`](#unaligned-access-supported-in-hardware) | Hardware support for unaligned access | 1 |
 | [`__ARM_FP`](#hardware-floating-point) | Hardware floating-point | 1 |
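
Note on the hunks above: as a rough illustration of how the two new feature macros might be consumed, the sketch below tests `__ARM_FEATURE_SVE2p2` and `__ARM_FEATURE_SME2p2` at compile time. The helper names `have_sve2p2`, `have_sme2p2` and `report_features` are hypothetical and not part of ACLE. On a toolchain that defines neither macro, both helpers simply return 0.

```c
#include <stdio.h>

/* Hypothetical helper (not part of ACLE): 1 if the toolchain defines
   __ARM_FEATURE_SVE2p2, otherwise 0. */
static int have_sve2p2(void) {
#if defined(__ARM_FEATURE_SVE2p2)
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper (not part of ACLE): 1 if the toolchain defines
   __ARM_FEATURE_SME2p2, otherwise 0. */
static int have_sme2p2(void) {
#if defined(__ARM_FEATURE_SME2p2)
    return 1;
#else
    return 0;
#endif
}

/* Print a one-line summary of what the compiler advertised. */
static void report_features(void) {
    printf("SVE2p2: %d, SME2p2: %d\n", have_sve2p2(), have_sme2p2());
}
```

Code that calls the new intrinsics would typically sit behind the same `#if defined(...)` guards, so that it still builds on toolchains without SVE2.2 or SME2.2 support.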
@@ -13007,6 +13017,33 @@ Zero ZA vector groups
 __arm_streaming __arm_inout("za");
 ```
 
+### SME2.2 instruction intrinsics
+
+The intrinsics in this section are defined by the header file
+[`<arm_sme.h>`](#arm_sme.h) when `__ARM_FEATURE_SME2p2` is defined.
+
+#### FMUL
+
+Multi-vector floating-point multiply
+
+``` c
+// Variants are also available for:
+// [_single_f32_x2]
+// [_single_f64_x2]
+// [_single_f16_x4]
+// [_single_f32_x4]
+// [_single_f64_x4]
+svfloat16x2_t svmul[_single_f16_x2](svfloat16x2_t zd, svfloat16_t zm) __arm_streaming;
+
+// Variants are also available for:
+// [_f32_x2]
+// [_f64_x2]
+// [_f16_x4]
+// [_f32_x4]
+// [_f64_x4]
+svfloat16x2_t svmul[_f16_x2](svfloat16x2_t zd, svfloat16x2_t zm) __arm_streaming;
+```
+
 ### Streaming-compatible versions of standard routines
 
 ACLE provides the following streaming-compatible functions,
@@ -13556,6 +13593,56 @@ While (resulting in predicate tuple)
 svboolx2_t svwhilelt_b8[_s64]_x2(int64_t rn, int64_t rm);
 ```
 
+### SVE2.2 and SME2.2 instruction intrinsics
+
+The functions in this section are defined by either the header file
+[`<arm_sve.h>`](#arm_sve.h) or [`<arm_sme.h>`](#arm_sme.h)
+when `__ARM_FEATURE_SVE2p2` or `__ARM_FEATURE_SME2p2` is defined, respectively.
+
+#### COMPACT, EXPAND
+
+Copy active vector elements to/from lower-numbered elements.
+
+These intrinsics can be called from streaming code only if the
+`__ARM_FEATURE_SME2p2` feature macro is defined.
+
+They can be called from non-streaming code if the `__ARM_FEATURE_SVE2p2` feature
+macro is defined or both the `__ARM_FEATURE_SVE` and `__ARM_FEATURE_SME2p2`
+feature macros are defined.
+
+``` c
+// Variants are available for:
+// _s8, _s16, _u16, _mf8, _bf16, _f16
+svuint8_t svcompact[_u8](svbool_t pg, svuint8_t zn);
+
+// Variants are available for:
+// _s8, _s16, _u16, _s32, _u32, _s64, _u64
+// _mf8, _bf16, _f16, _f32, _f64
+svuint8_t svexpand[_u8](svbool_t pg, svuint8_t zn);
+
+```
+
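
Inline note on COMPACT and EXPAND: the one-line description "Copy active vector elements to/from lower-numbered elements" may be easier to follow with a scalar reference model. The sketch below is plain portable C over byte arrays, not the intrinsics themselves. The function names `model_compact_u8` and `model_expand_u8` are hypothetical, the predicate is modelled as one byte per element, and zeroing the elements that receive no data is an assumption of this sketch that should be checked against the architecture documentation.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of COMPACT (byte elements): active elements of zn are
   packed, in order, into the lowest-numbered elements of zd.
   Assumption: the remaining elements of zd are set to zero.
   zd and zn must not overlap. */
static void model_compact_u8(uint8_t *zd, const uint8_t *pg,
                             const uint8_t *zn, size_t n) {
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (pg[i]) {
            zd[out++] = zn[i];
        }
    }
    while (out < n) {
        zd[out++] = 0;
    }
}

/* Scalar model of EXPAND (byte elements): consecutive low-numbered
   elements of zn are copied, in order, to the active positions of zd.
   Assumption: inactive elements of zd are set to zero.
   zd and zn must not overlap. */
static void model_expand_u8(uint8_t *zd, const uint8_t *pg,
                            const uint8_t *zn, size_t n) {
    size_t in = 0;
    for (size_t i = 0; i < n; i++) {
        zd[i] = pg[i] ? zn[in++] : 0;
    }
}
```

Under this model, a predicate of `{0, 1, 0, 1}` applied to the input `{10, 20, 30, 40}` compacts to `{20, 40, 0, 0}` and expands to `{0, 10, 0, 20}`.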
+#### FIRSTP, LASTP
+
+Scalar index of first/last true predicate element (predicated).
+
+These intrinsics can be called from streaming mode if either of the feature
+macros `__ARM_FEATURE_SVE` or `__ARM_FEATURE_SME` are defined.
+
+They can be called from non-streaming code only if the `__ARM_FEATURE_SVE`
+feature macro is defined.
+
+``` c
+// Variants are available for:
+// _b16, _b32, _b64
+int64_t svfirstp_b8(svbool_t pg, svbool_t op);
+
+// Variants are available for:
+// _b16, _b32, _b64
+int64_t svlastp_b8(svbool_t pg, svbool_t op);
+
+```
+
 
 ### SME2 maximum and minimum absolute value
 
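
Note on FIRSTP and LASTP: the description "Scalar index of first/last true predicate element (predicated)" can be read as a search over the elements that are both active in `pg` and true in `op`. The sketch below is a portable scalar model, not the intrinsics themselves. The names `model_firstp` and `model_lastp` are hypothetical, each predicate is modelled as one byte per element, and returning -1 when no element qualifies is an assumption of this sketch, suggested only by the signed `int64_t` return type.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of FIRSTP: index of the lowest-numbered element that is
   active in pg and true in op.
   Assumption: returns -1 when no such element exists. */
static int64_t model_firstp(const uint8_t *pg, const uint8_t *op, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (pg[i] && op[i]) {
            return (int64_t)i;
        }
    }
    return -1;
}

/* Scalar model of LASTP: index of the highest-numbered element that is
   active in pg and true in op.
   Assumption: returns -1 when no such element exists. */
static int64_t model_lastp(const uint8_t *pg, const uint8_t *op, size_t n) {
    for (size_t i = n; i-- > 0; ) {
        if (pg[i] && op[i]) {
            return (int64_t)i;
        }
    }
    return -1;
}
```

Under this model, `pg = {1, 1, 1, 0}` and `op = {0, 1, 1, 1}` give a first index of 1 and a last index of 2, because element 3 is true in `op` but inactive in `pg`.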