Add intrinsics for the FEAT_SVE_AES2 feature introduced by the 2024 dpISA

amilendra · amilendra · commit e8fb4e7d85d1 · 2025-10-07T14:35:43.000+01:00
FEAT_SVE_AES2 adds:

1) SVE multi-vector Advanced Encryption Standard (AES) instructions
Instructions added: AESE, AESD, AESEMC and AESDIMC
Each instruction has two variants:
  a) two-register variant
  b) four-register variant

2) SVE multi-vector 128-bit polynomial multiply long instructions
Instructions added: PMULL and PMLAL
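The per-block arithmetic behind these instructions can be sketched as a portable scalar reference model in C. The model does not call the SVE intrinsics. All helper names (`gf_mul`, `aes_sbox`, `aese_block`, `clmul128`, `pmull64`, `pmlal64`) are hypothetical.

The sketch rests on these assumptions:
- AESE follows the A64 definition: AddRoundKey, then ShiftRows and SubBytes on one 128-bit block.
- PMLAL accumulates by exclusive-OR, which is addition over GF(2).

The model leaves several things out:
- How the multi-vector intrinsics spread the work over each 128-bit segment of the tuple.
- How `index` selects the round key.
- How the 128-bit products are laid out across the two result vectors.
- The AESD, AESEMC and AESDIMC forms.

```c
#include <stdint.h>

/* ---- AES: one A64-style AESE step on a single 128-bit block ---- */

/* Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1. */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    while (b) {
        if (b & 1) p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
        b >>= 1;
    }
    return p;
}

static uint8_t rotl8(uint8_t v, int n) {
    return (uint8_t)((v << n) | (v >> (8 - n)));
}

/* AES S-box: inverse in GF(2^8), computed as x^254 so that 0 maps to 0,
   followed by the FIPS-197 affine transformation. */
static uint8_t aes_sbox(uint8_t x) {
    uint8_t inv = 1, base = x;
    for (int e = 254; e != 0; e >>= 1) { /* square-and-multiply */
        if (e & 1) inv = gf_mul(inv, base);
        base = gf_mul(base, base);
    }
    return (uint8_t)(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^
                     rotl8(inv, 4) ^ 0x63);
}

/* AddRoundKey, then ShiftRows and SubBytes, on a column-major AES state
   (byte index = row + 4 * column). */
static void aese_block(uint8_t state[16], const uint8_t key[16]) {
    uint8_t t[16];
    for (int i = 0; i < 16; i++)
        t[i] = state[i] ^ key[i];
    for (int c = 0; c < 4; c++)
        for (int r = 0; r < 4; r++) /* row r rotates left by r columns */
            state[r + 4 * c] = aes_sbox(t[r + 4 * ((c + r) % 4)]);
}

/* ---- PMULL / PMLAL: 64 x 64 -> 128-bit polynomial (carry-less) multiply ---- */

typedef struct { uint64_t lo, hi; } clmul128;

static clmul128 pmull64(uint64_t a, uint64_t b) {
    clmul128 r = {0, 0};
    for (int i = 0; i < 64; i++) {
        if ((b >> i) & 1) {
            r.lo ^= a << i;
            if (i != 0) /* skip i == 0: a shift by 64 is undefined */
                r.hi ^= a >> (64 - i);
        }
    }
    return r;
}

/* Accumulate a product into acc; addition over GF(2) is exclusive-OR. */
static clmul128 pmlal64(clmul128 acc, uint64_t a, uint64_t b) {
    clmul128 p = pmull64(a, b);
    acc.lo ^= p.lo;
    acc.hi ^= p.hi;
    return acc;
}
```

For example, `pmull64(3, 3)` yields `lo == 5`, because (x + 1)^2 = x^2 + 1 over GF(2). A model like this can serve as a reference when testing code that uses the intrinsics on hardware.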

FEAT_SSVE_AES makes the same instructions available in Streaming SVE mode.
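Code can choose an AES2 path at compile time using the feature test macros this patch defines:
- `__ARM_FEATURE_SVE2_AES2` for non-streaming code.
- `__ARM_FEATURE_SSVE_AES2` for Streaming SVE mode.

Below is a minimal sketch. The enum constants and the function name are hypothetical. The macros describe what the compiler targets, not what the running CPU supports, so runtime detection is still a separate concern.

```c
/* Hypothetical bit flags describing which AES2 intrinsic forms this
   translation unit was compiled to use. */
enum {
    AES2_NON_STREAMING = 1 << 0, /* __ARM_FEATURE_SVE2_AES2 (FEAT_SVE_AES2) */
    AES2_STREAMING     = 1 << 1  /* __ARM_FEATURE_SSVE_AES2 (FEAT_SSVE_AES) */
};

/* Returns the set of AES2 intrinsic forms enabled at compile time.
   Undefined macros evaluate to 0 in #if, so non-Arm builds return 0. */
static int aes2_compiled_support(void) {
    int flags = 0;
#if defined(__ARM_FEATURE_SVE2_AES2) && __ARM_FEATURE_SVE2_AES2
    flags |= AES2_NON_STREAMING;
#endif
#if defined(__ARM_FEATURE_SSVE_AES2) && __ARM_FEATURE_SSVE_AES2
    flags |= AES2_STREAMING;
#endif
    return flags;
}
```

When a compiler does not target these features, both macros are undefined and the function returns 0. Calls made in Streaming SVE mode would also need to come from a function declared with the `__arm_streaming` keyword attribute.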
diff --git a/main/acle.md b/main/acle.md
@@ -465,6 +465,7 @@ Armv8.4-A [[ARMARMv84]](#ARMARMv84). Support is added for the Dot Product intrin
 
 * Added feature test macro for FEAT_SSVE_FEXPA.
 * Added feature test macro for FEAT_CSSC.
+* Added support for FEAT_SVE_AES2 and FEAT_SSVE_AES intrinsics.
 
 ### References
 
@@ -2147,6 +2148,15 @@ support for the SVE2 AES (FEAT_SVE_AES) instructions and if the associated
 ACLE intrinsics are available. This implies that `__ARM_FEATURE_AES`
 and `__ARM_FEATURE_SVE2` are both nonzero.
 
+In addition, `__ARM_FEATURE_SVE2_AES2` is defined to `1` if there is hardware
+support for the SVE2 AES2 (FEAT_SVE_AES2) instructions and if the associated
+ACLE intrinsics are available. This implies that `__ARM_FEATURE_AES`
+and `__ARM_FEATURE_SVE2` are both nonzero.
+
+`__ARM_FEATURE_SSVE_AES2` is defined to `1` if there is hardware support for
+the SVE2 AES2 (FEAT_SVE_AES2) instructions in Streaming SVE mode (FEAT_SSVE_AES)
+and if the associated ACLE intrinsics are available.
+
 #### SHA2 extension
 
 `__ARM_FEATURE_SHA2` is defined to 1 if the SHA1 & SHA2-256 Crypto
@@ -2642,6 +2652,8 @@ be found in [[BA]](#BA).
 | [`__ARM_FEATURE_SVE_VECTOR_OPERATORS`](#scalable-vector-extension-sve)                                                                                  | Level of support for C and C++ operators on SVE predicate types                                     | 1           |
 | [`__ARM_FEATURE_SVE2`](#sve2)                                                                                                                           | SVE version 2 (FEAT_SVE2)                                                                          | 1           |
 | [`__ARM_FEATURE_SVE2_AES`](#aes-extension)                                                                                                              | SVE2 support for the AES cryptographic extension (FEAT_SVE_AES)                                     | 1           |
+| [`__ARM_FEATURE_SVE2_AES2`](#aes-extension)                                                                                                             | SVE2 support for the SVE multi-vector AES cryptographic extension (FEAT_SVE_AES2)                   | 1           |
+| [`__ARM_FEATURE_SSVE_AES2`](#aes-extension)                                                                                                             | Streaming SVE support for the SVE multi-vector AES cryptographic extension (FEAT_SSVE_AES)          | 1           |
 | [`__ARM_FEATURE_SVE2_BITPERM`](#bit-permute-extension)                                                                                                  | SVE2 bit permute extension                                                    | 1           |
 | [`__ARM_FEATURE_SSVE_BITPERM`](#bit-permute-extension)                                                                                                  | SVE2 bit permute extension                                                    | 1           |
 | [`__ARM_FEATURE_SSVE_FEXPA`](#streaming-sve-fexpa-extension)                                                                                            | Streaming SVE FEXPA extension                                                 | 1           |
@@ -9405,6 +9417,32 @@ to work with `svboolx2_t` and `svboolx4_t`.  For example:
     svboolx2_t svundef2_b();
 ```
 
+#### AESE, AESD, AESEMC, AESDIMC
+
+Multi-vector Advanced Encryption Standard instructions.
+
+``` c
+    svuint8x2_t    svaese[_u8_x2]     (svuint8x2_t op1, svuint64_t op2, uint64_t index);
+    svuint8x4_t    svaese[_u8_x4]     (svuint8x4_t op1, svuint64_t op2, uint64_t index);
+    svuint8x2_t    svaesd[_u8_x2]     (svuint8x2_t op1, svuint64_t op2, uint64_t index);
+    svuint8x4_t    svaesd[_u8_x4]     (svuint8x4_t op1, svuint64_t op2, uint64_t index);
+    svuint8x2_t    svaesemc[_u8_x2]   (svuint8x2_t op1, svuint64_t op2, uint64_t index);
+    svuint8x4_t    svaesemc[_u8_x4]   (svuint8x4_t op1, svuint64_t op2, uint64_t index);
+    svuint8x2_t    svaesdimc[_u8_x2]  (svuint8x2_t op1, svuint64_t op2, uint64_t index);
+    svuint8x4_t    svaesdimc[_u8_x4]  (svuint8x4_t op1, svuint64_t op2, uint64_t index);
+```
+
+#### PMULL, PMLAL
+
+Multi-vector 128-bit polynomial multiply long instructions.
+
+``` c
+    // zn, zm: 64-bit polynomial operands; zda: accumulator pair for PMLAL.
+    // Each 64 x 64-bit polynomial product is 128 bits wide.
+    svuint64x2_t svpmull[_u64_x2](svuint64_t zn, svuint64_t zm);
+    svuint64x2_t svpmlal[_u64_x2](svuint64x2_t zda, svuint64_t zn, svuint64_t zm);
+```
+
 #### ADDQV, FADDQV
 
 Unsigned/FP add reduction of quadword vector segments.