Commit a23e878

Add intrinsics for the FEAT_SVE_AES2 feature introduced by the 2024 dpISA
FEAT_SVE_AES2 adds:

1) SVE multi-vector Advanced Encryption Standard (AES) instructions
   Instructions added: AESE, AESD, AESEMC and AESDIMC
   For each instruction there are two variants:
   a) Two registers variant
   b) Four registers variant
2) SVE multi-vector 128-bit polynomial multiply long instructions
   Instructions added: PMULL and PMLAL

FEAT_SSVE_AES implements the same instructions but when in streaming mode.
1 parent 1032033 commit a23e878

1 file changed, +36 -0 lines changed
main/acle.md

Lines changed: 36 additions & 0 deletions
@@ -469,6 +469,7 @@ Armv8.4-A [[ARMARMv84]](#ARMARMv84). Support is added for the Dot Product intrin
 * Added support for modal 8-bit floating point matrix multiply-add widening intrinsics.
 * Added support for 16-bit floating point matrix multiply-add widening intrinsics.
 * Added support for Brain 16-bit floating-point vector multiplication intrinsics.
+* Added support for FEAT_SVE_AES2, FEAT_SSVE_AES intrinsics.
 
 ### References
 
@@ -2161,6 +2162,15 @@ support for the SVE2 AES (FEAT_SVE_AES) instructions and if the associated
 ACLE intrinsics are available. This implies that `__ARM_FEATURE_AES`
 and `__ARM_FEATURE_SVE2` are both nonzero.
 
+In addition, `__ARM_FEATURE_SVE2_AES2` is defined to `1` if there is hardware
+support for the SVE2 AES2 (FEAT_SVE_AES2) instructions and if the associated
+ACLE intrinsics are available. This implies that `__ARM_FEATURE_AES`
+and `__ARM_FEATURE_SVE2` are both nonzero.
+
+`__ARM_FEATURE_SSVE_AES2` is defined to 1 if there is hardware support for
+SVE2 AES2 (FEAT_SVE_AES2) instructions in Streaming SVE mode (FEAT_SSVE_AES)
+and if the associated ACLE intrinsics are available.
+
 #### SHA2 extension
 
 `__ARM_FEATURE_SHA2` is defined to 1 if the SHA1 & SHA2-256 Crypto
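
The two feature-test macros added in the hunk above can be checked at compile time. The following is a minimal sketch, assuming the macro names exactly as they are spelled in this revision of the diff (`__ARM_FEATURE_SVE2_AES2` and `__ARM_FEATURE_SSVE_AES2`); the helper names `have_sve_aes2` and `have_ssve_aes2` are hypothetical and are not part of ACLE.

```c
/* Sketch only: the macro names are copied from this diff and may differ in
 * other revisions of the specification. The helper names are hypothetical. */

/* Returns 1 if the compiler reports the multi-vector AES and polynomial
 * multiply intrinsics for non-streaming mode, 0 otherwise. */
static int have_sve_aes2(void) {
#if defined(__ARM_FEATURE_SVE2_AES2) && __ARM_FEATURE_SVE2_AES2
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if the compiler reports the same intrinsics for Streaming SVE
 * mode, 0 otherwise. */
static int have_ssve_aes2(void) {
#if defined(__ARM_FEATURE_SSVE_AES2) && __ARM_FEATURE_SSVE_AES2
    return 1;
#else
    return 0;
#endif
}
```

On a toolchain that defines neither macro, both helpers return 0, so calling code can select a fallback path without referencing the new intrinsics.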
@@ -2613,6 +2623,7 @@ be found in [[BA]](#BA).
 | [`__ARM_BF16_FORMAT_ALTERNATIVE`](#brain-16-bit-floating-point-support) | 16-bit brain floating-point, alternative format | 1 |
 | [`__ARM_BIG_ENDIAN`](#endianness) | Memory is big-endian | 1 |
 | [`__ARM_FEATURE_AES`](#aes-extension) | AES Crypto extension (Arm v8-A) | 1 |
+| [`__ARM_FEATURE_AES2`](#aes-extension) | SVE2 Multi-vector AES Crypto extension (Arm v9.6-A) | 1 |
 | [`__ARM_FEATURE_ATOMICS`](#large-system-extensions) | Large System Extensions | 1 |
 | [`__ARM_FEATURE_BF16`](#brain-16-bit-floating-point-support) | 16-bit brain floating-point, vector instruction | 1 |
 | [`__ARM_FEATURE_BTI_DEFAULT`](#branch-target-identification) | Branch Target Identification | 1 |
@@ -2691,6 +2702,7 @@ be found in [[BA]](#BA).
 | [`__ARM_FEATURE_SVE_VECTOR_OPERATORS`](#scalable-vector-extension-sve) | Level of support for C and C++ operators on SVE predicate types | 1 |
 | [`__ARM_FEATURE_SVE2`](#sve2) | SVE version 2 (FEAT_SVE2) | 1 |
 | [`__ARM_FEATURE_SVE2_AES`](#aes-extension) | SVE2 support for the AES cryptographic extension (FEAT_SVE_AES) | 1 |
+| [`__ARM_FEATURE_SVE2_AES2`](#aes-extension) | SVE2 support for the SVE multi-vector AES cryptographic extension (FEAT_SVE_AES2) | 1 |
 | [`__ARM_FEATURE_SVE2_BITPERM`](#bit-permute-extension) | SVE2 bit permute extension | 1 |
 | [`__ARM_FEATURE_SSVE_BITPERM`](#bit-permute-extension) | SVE2 bit permute extension | 1 |
 | [`__ARM_FEATURE_SSVE_FEXPA`](#streaming-sve-fexpa-extension) | Streaming SVE FEXPA extension | 1 |
@@ -9454,6 +9466,30 @@ to work with `svboolx2_t` and `svboolx4_t`. For example:
 svboolx2_t svundef2_b();
 ```
 
+#### AESE, AESD, AESEMC, AESDIMC
+
+Multi-vector Advanced Encryption Standard instructions
+
+svuint8x2_t svaese[_u8_x2] (svuint8x2_t op1, svuint64_t op2, uint64_t index);
+svuint8x4_t svaese[_u8_x4] (svuint8x4_t op1, svuint64_t op2, uint64_t index);
+svuint8x2_t svaesd[_u8_x2] (svuint8x2_t op1, svuint64_t op2, uint64_t index);
+svuint8x4_t svaesd[_u8_x4] (svuint8x4_t op1, svuint64_t op2, uint64_t index);
+svuint8x2_t svaesemc[_u8_x2] (svuint8x2_t op1, svuint64_t op2, uint64_t index);
+svuint8x4_t svaesemc[_u8_x4] (svuint8x4_t op1, svuint64_t op2, uint64_t index);
+svuint8x2_t svaesdimc[_u8_x2] (svuint8x2_t op1, svuint64_t op2, uint64_t index);
+svuint8x4_t svaesdimc[_u8_x4] (svuint8x4_t op1, svuint64_t op2, uint64_t index);
+
+#### PMULL, PMLAL
+
+Multi-vector 128-bit polynomial multiply long instructions
+
+``` c
+// Variants are also available for:
+// _s64x2, _f64x2
+svuint64x2_t svpmull[_u64x2](svuint64_t zn, svuint64_t zm);
+svuint64x2_t svpmlal[_u64x2](svuint64_t zn, svuint64_t zm);
+```
+
 #### ADDQV, FADDQV
 
 Unsigned/FP add reduction of quadword vector segments.
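
For readers unfamiliar with polynomial multiply long, the arithmetic behind the PMULL and PMLAL entries in the last hunk can be illustrated with a scalar reference model. This is a sketch under stated assumptions, not the ACLE intrinsics themselves: it multiplies two 64-bit polynomials over GF(2) into a 128-bit result (carry-less multiplication), and treats accumulation as an XOR of the product into an existing 128-bit value. The type `poly128_ref_t` and the functions `pmull64_ref` and `pmlal64_ref` are hypothetical names, and the way the instructions arrange results across the two output vectors is not stated in the diff and is not modelled here.

```c
#include <stdint.h>

/* Hypothetical helper type: a 128-bit polynomial held as two 64-bit halves. */
typedef struct {
    uint64_t lo; /* coefficients of x^0  .. x^63  */
    uint64_t hi; /* coefficients of x^64 .. x^127 */
} poly128_ref_t;

/* Carry-less multiply of two 64-bit polynomials over GF(2).
 * For every set bit i of b, XOR (a << i) into the 128-bit result. */
static poly128_ref_t pmull64_ref(uint64_t a, uint64_t b) {
    poly128_ref_t r = {0, 0};
    for (int i = 0; i < 64; ++i) {
        if ((b >> i) & 1u) {
            r.lo ^= a << i;
            if (i != 0) {
                /* Bits of a shifted past bit 63 land in the high half. */
                r.hi ^= a >> (64 - i);
            }
        }
    }
    return r;
}

/* Multiply-accumulate: addition of polynomials over GF(2) is XOR. */
static poly128_ref_t pmlal64_ref(poly128_ref_t acc, uint64_t a, uint64_t b) {
    poly128_ref_t p = pmull64_ref(a, b);
    acc.lo ^= p.lo;
    acc.hi ^= p.hi;
    return acc;
}
```

As a quick check of the model, `(x + 1) * (x + 1)` is `x^2 + 1` over GF(2), so `pmull64_ref(3, 3)` yields a low half of 5 and a high half of 0.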