Commit 777686d

Add support for FEAT_SVE2p2/FEAT_SME2p2 intrinsics
These instructions are available under features FEAT_SVE2p2 or FEAT_SME2p2.

COMPACT: Copy Active vector elements to lower-numbered elements (Byte/Halfword variants)
EXPAND: Copy lower-numbered vector elements to Active elements (Byte/Halfword/Word/Doubleword variants)
FIRSTP: Scalar index of first true predicate element (predicated) (Byte/Halfword/Word/Doubleword variants)
LASTP: Scalar index of last true predicate element (predicated) (Byte/Halfword/Word/Doubleword variants)
FMUL (multiple and single vector): Multi-vector floating-point multiply by vector
FMUL (multiple vectors): Multi-vector floating-point multiply
1 parent 1db9c69 commit 777686d

1 file changed: +87 -0 lines changed

main/acle.md

@@ -465,6 +465,10 @@ Armv8.4-A [[ARMARMv84]](#ARMARMv84). Support is added for the Dot Product intrin
465 465

466 466
* Added feature test macro for FEAT_SSVE_FEXPA.
467 467
* Added feature test macro for FEAT_CSSC.
468+
* Added [**Alpha**](#current-status-and-anticipated-changes)
469+
support for SVE2.2 (FEAT_SVE2p2)
470+
* Added [**Alpha**](#current-status-and-anticipated-changes)
471+
support for SME2.2 (FEAT_SME2p2).
468 472

469 473
### References
470 474

@@ -1978,6 +1982,10 @@ are available. This implies that `__ARM_FEATURE_SVE` is nonzero.
1978 1982
are available and if the associated [ACLE features]
1979 1983
(#sme-language-extensions-and-intrinsics) are supported.
1980 1984

1985+
`__ARM_FEATURE_SVE2p2` is defined to 1 if the FEAT_SVE2p2 instructions
1986+
are available and if the associated [ACLE features]
1987+
(#sme-language-extensions-and-intrinsics) are supported.
1988+
1981 1989
#### NEON-SVE Bridge macro
1982 1990

1983 1991
`__ARM_NEON_SVE_BRIDGE` is defined to 1 if the [`<arm_neon_sve_bridge.h>`](#arm_neon_sve_bridge.h)
@@ -2000,6 +2008,7 @@ of SME has an associated preprocessor macro, given in the table below:
2000 2008
| FEAT_SME | __ARM_FEATURE_SME |
2001 2009
| FEAT_SME2 | __ARM_FEATURE_SME2 |
2002 2010
| FEAT_SME2p1 | __ARM_FEATURE_SME2p1 |
2011+
| FEAT_SME2p2 | __ARM_FEATURE_SME2p2 |
2003 2012

2004 2013
Each macro is defined if there is hardware support for the associated
2005 2014
architecture feature and if all of the [ACLE
@@ -2649,6 +2658,7 @@ be found in [[BA]](#BA).
2649 2658
| [`__ARM_FEATURE_SVE2_SM3`](#sm3-extension) | SVE2 support for the SM3 cryptographic extension (FEAT_SVE_SM3) | 1 |
2650 2659
| [`__ARM_FEATURE_SVE2_SM4`](#sm4-extension) | SVE2 support for the SM4 cryptographic extension (FEAT_SVE_SM4) | 1 |
2651 2660
| [`__ARM_FEATURE_SVE2p1`](#sve2) | SVE version 2.1 (FEAT_SVE2p1)
2661+
| [`__ARM_FEATURE_SVE2p2`](#sve2) | SVE version 2.2 (FEAT_SVE2p2)
2652 2662
| [`__ARM_FEATURE_SYSREG128`](#bit-system-registers) | Support for 128-bit system registers (FEAT_SYSREG128) | 1 |
2653 2663
| [`__ARM_FEATURE_UNALIGNED`](#unaligned-access-supported-in-hardware) | Hardware support for unaligned access | 1 |
2654 2664
| [`__ARM_FP`](#hardware-floating-point) | Hardware floating-point | 1 |
@@ -12877,6 +12887,33 @@ Zero ZA vector groups
12877 12887
__arm_streaming __arm_inout("za");
12878 12888
```
12879 12889

12890+
### SME2.2 instruction intrinsics
12891+
12892+
The intrinsics in this section are defined by the header file
12893+
[`<arm_sme.h>`](#arm_sme.h) when `__ARM_FEATURE_SME2p2` is defined.
12894+
12895+
#### FMUL
12896+
12897+
Multi-vector floating-point multiply
12898+
12899+
``` c
12900+
// Variants are also available for:
12901+
// [_single_f32_x2]
12902+
// [_single_f64_x2]
12903+
// [_single_f16_x4]
12904+
// [_single_f32_x4]
12905+
// [_single_f64_x4]
12906+
svfloat16x2_t svmul[_single_f16_x2](svfloat16x2_t zd, svfloat16_t zm) __arm_streaming;
12907+
12908+
// Variants are also available for:
12909+
// [_f32_x2]
12910+
// [_f64_x2]
12911+
// [_f16_x4]
12912+
// [_f32_x4]
12913+
// [_f64_x4]
12914+
svfloat16x2_t svmul[_f16_x2](svfloat16x2_t zd, svfloat16x2_t zm) __arm_streaming;
12915+
```
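
As a portable illustration of the two FMUL forms listed above, the following plain-C sketch models them on fixed-length arrays. It does not use ACLE types or intrinsics: `model_vec`, `model_vec_x2`, `MODEL_LANES`, and the `model_*` functions are hypothetical names introduced here, and the lane count of 4 is arbitrary, whereas real SVE vectors are scalable. It assumes the plain element-wise reading of "multiply by vector" and "multiply" and leaves floating-point corner cases to the architecture reference.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_LANES 4 /* arbitrary; real SVE/SME vectors are scalable */

/* Hypothetical stand-ins for svfloat32_t and svfloat32x2_t. */
typedef struct { float lane[MODEL_LANES]; } model_vec;
typedef struct { model_vec v[2]; } model_vec_x2;

/* Models the "multiple and single vector" form: every vector of the
   tuple zd is multiplied lane by lane with the same single vector zm. */
model_vec_x2 model_mul_single_x2(model_vec_x2 zd, model_vec zm) {
    for (size_t i = 0; i < 2; ++i)
        for (size_t l = 0; l < MODEL_LANES; ++l)
            zd.v[i].lane[l] *= zm.lane[l];
    return zd;
}

/* Models the "multiple vectors" form: vector i of zd is multiplied
   lane by lane with vector i of zm. */
model_vec_x2 model_mul_x2(model_vec_x2 zd, model_vec_x2 zm) {
    for (size_t i = 0; i < 2; ++i)
        for (size_t l = 0; l < MODEL_LANES; ++l)
            zd.v[i].lane[l] *= zm.v[i].lane[l];
    return zd;
}

/* Small self-check that doubles as a usage example. */
void model_mul_example(void) {
    model_vec_x2 zd = { { { {1.0f, 2.0f, 3.0f, 4.0f} },
                          { {5.0f, 6.0f, 7.0f, 8.0f} } } };
    model_vec zm = { {2.0f, 2.0f, 2.0f, 2.0f} };

    model_vec_x2 r = model_mul_single_x2(zd, zm);
    assert(r.v[0].lane[0] == 2.0f && r.v[1].lane[3] == 16.0f);

    r = model_mul_x2(zd, zd); /* squares every lane */
    assert(r.v[0].lane[1] == 4.0f && r.v[1].lane[2] == 49.0f);
}
```

The only difference between the two forms in this model is whether the second operand is indexed by tuple position, which mirrors the `svfloat16_t zm` versus `svfloat16x2_t zm` parameter in the prototypes.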
12916+
12880 12917
### Streaming-compatible versions of standard routines
12881 12918

12882 12919
ACLE provides the following streaming-compatible functions,
@@ -13426,6 +13463,56 @@ While (resulting in predicate tuple)
13426 13463
svboolx2_t svwhilelt_b8[_s64]_x2(int64_t rn, int64_t rm);
13427 13464
```
13428 13465

13466+
### SVE2.2 and SME2.2 instruction intrinsics
13467+
13468+
The functions in this section are defined by either the header file
13469+
[`<arm_sve.h>`](#arm_sve.h) or [`<arm_sme.h>`](#arm_sme.h)
13470+
when `__ARM_FEATURE_SVE2p2` or `__ARM_FEATURE_SME2p2` is defined, respectively.
13471+
13472+
#### COMPACT, EXPAND
13473+
13474+
Copy active vector elements to/from lower-numbered elements.
13475+
13476+
These intrinsics can be called from streaming code only if the
13477+
`__ARM_FEATURE_SME2p2` feature macro is defined.
13478+
13479+
They can be called from non-streaming code if the `__ARM_FEATURE_SVE2p2` feature
13480+
macro is defined or both the `__ARM_FEATURE_SVE` and `__ARM_FEATURE_SME2p2`
13481+
feature macros are defined.
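
The calling conditions above can be written as preprocessor tests. The sketch below is one possible encoding, not something ACLE defines: the `MODEL_COMPACT_EXPAND_*` macros and `model_compact_expand_usable` are hypothetical names, and whether the caller is streaming is passed in as a plain flag because it is a property of the calling function rather than of a macro.

```c
#include <assert.h>

/* Streaming code: requires __ARM_FEATURE_SME2p2. */
#if defined(__ARM_FEATURE_SME2p2)
#define MODEL_COMPACT_EXPAND_STREAMING 1
#else
#define MODEL_COMPACT_EXPAND_STREAMING 0
#endif

/* Non-streaming code: requires __ARM_FEATURE_SVE2p2, or both
   __ARM_FEATURE_SVE and __ARM_FEATURE_SME2p2. */
#if defined(__ARM_FEATURE_SVE2p2) || \
    (defined(__ARM_FEATURE_SVE) && defined(__ARM_FEATURE_SME2p2))
#define MODEL_COMPACT_EXPAND_NON_STREAMING 1
#else
#define MODEL_COMPACT_EXPAND_NON_STREAMING 0
#endif

/* Hypothetical helper: returns 1 if the byte/halfword COMPACT and the
   EXPAND intrinsics may be called from the given kind of code. */
int model_compact_expand_usable(int in_streaming_code) {
    assert(in_streaming_code == 0 || in_streaming_code == 1);
    return in_streaming_code ? MODEL_COMPACT_EXPAND_STREAMING
                             : MODEL_COMPACT_EXPAND_NON_STREAMING;
}
```

On a toolchain that defines none of these feature macros, both helper macros evaluate to 0, so the same source still compiles and simply reports that the intrinsics are unavailable.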
13482+
13483+
``` c
13484+
// Variants are available for:
13485+
// _s8, _s16, _u16, _mf8, _bf16, _f16
13486+
svuint8_t svcompact[_u8](svbool_t pg, svuint8_t zn);
13487+
13488+
// Variants are available for:
13489+
// _s8, _s16, _u16, _s32, _u32, _s64, _u64
13490+
// _mf8, _bf16, _f16, _f32, _f64
13491+
svuint8_t svexpand[_u8](svbool_t pg, svuint8_t zn);
13492+
13493+
```
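
To make the data movement concrete, here is a scalar model of the two operations in plain C. It is a sketch, not the ACLE API: `model_compact_u8` and `model_expand_u8` are hypothetical names, the predicate is modeled as one `bool` per element, and the zeroing of the remaining (COMPACT) or inactive (EXPAND) elements reflects our reading of the instruction descriptions and should be checked against the architecture reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* COMPACT model: active elements of zn are packed, in order, into the
   lowest-numbered elements of zd; the remaining elements become zero. */
void model_compact_u8(uint8_t *zd, const bool *pg, const uint8_t *zn, size_t n) {
    assert(zd != zn); /* this model does not support in-place operation */
    size_t out = 0;
    for (size_t i = 0; i < n; ++i)
        if (pg[i])
            zd[out++] = zn[i];
    while (out < n)
        zd[out++] = 0;
}

/* EXPAND model: consecutive lowest-numbered elements of zn are placed,
   in order, into the active elements of zd; inactive elements become zero. */
void model_expand_u8(uint8_t *zd, const bool *pg, const uint8_t *zn, size_t n) {
    assert(zd != zn); /* this model does not support in-place operation */
    size_t in = 0;
    for (size_t i = 0; i < n; ++i)
        zd[i] = pg[i] ? zn[in++] : 0;
}
```

With the same predicate, expanding a compacted vector puts each active element back in its original position, which is why the two operations are documented together.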
13494+
13495+
#### FIRSTP, LASTP
13496+
13497+
Scalar index of first/last true predicate element (predicated).
13498+
13499+
These intrinsics can be called from streaming mode if either of the feature
13500+
macros `__ARM_FEATURE_SVE` or `__ARM_FEATURE_SME` are defined.
13501+
13502+
They can be called from non-streaming code only if the `__ARM_FEATURE_SVE`
13503+
feature macro is defined.
13504+
13505+
``` c
13506+
// Variants are available for:
13507+
// _b16, _b32, _b64
13508+
int64_t svfirstp_b8(svbool_t pg, svbool_t op);
13509+
13510+
// Variants are available for:
13511+
// _b16, _b32, _b64
13512+
int64_t svlastp_b8(svbool_t pg, svbool_t op);
13513+
13514+
```
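
The following scalar model sketches what the two intrinsics compute. The names `model_firstp` and `model_lastp` are hypothetical, predicates are modeled as one `bool` per element, and the result of -1 when no element is both active in `pg` and true in `op` is an assumption based on our reading of the instruction pseudocode; consult the architecture reference before relying on it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* FIRSTP model: index of the lowest-numbered element that is active in
   pg and true in op, or -1 (assumed) if there is none. */
int64_t model_firstp(const bool *pg, const bool *op, size_t n) {
    assert(pg != NULL && op != NULL);
    for (size_t i = 0; i < n; ++i)
        if (pg[i] && op[i])
            return (int64_t)i;
    return -1;
}

/* LASTP model: index of the highest-numbered element that is active in
   pg and true in op, or -1 (assumed) if there is none. */
int64_t model_lastp(const bool *pg, const bool *op, size_t n) {
    assert(pg != NULL && op != NULL);
    for (size_t i = n; i-- > 0;)
        if (pg[i] && op[i])
            return (int64_t)i;
    return -1;
}
```

The signed `int64_t` return type in the prototypes is what makes a negative "not found" result representable in this model.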
13515+
13429 13516

13430 13517
### SME2 maximum and minimum absolute value
13431 13518
