Add intrinsics for the FEAT_SVE_AES2 feature introduced by the 2024 dpISA

amilendra · amilendra · commit a23e8783b68a · 2025-09-15T15:42:47.000+01:00
FEAT_SVE_AES2 adds:

1) SVE multi-vector Advanced Encryption Standard (AES) instructions
   Instructions added: AESE, AESD, AESEMC and AESDIMC
   Each instruction has two variants:
     a) Two-register variant
     b) Four-register variant

2) SVE multi-vector 128-bit polynomial multiply long instructions
   Instructions added: PMULL and PMLAL

FEAT_SSVE_AES makes the same instructions available in Streaming SVE mode.
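
As a rough illustration of how user code might consume the feature macros
documented by this patch, the sketch below folds them into a bitmask. The
macro spellings are taken from the text added in this change; the enum and
the `aes2_support_mask` helper are hypothetical names used only for this
example and are not part of ACLE.

```c
/* Hypothetical flags, not part of ACLE: one bit per execution mode. */
enum aes2_support {
    AES2_NONE          = 0,
    AES2_NON_STREAMING = 1, /* __ARM_FEATURE_SVE2_AES2 is nonzero */
    AES2_STREAMING     = 2  /* __ARM_FEATURE_SSVE_AES2 is nonzero */
};

/* Hypothetical helper: reports which macros the compiler defines.
 * On toolchains that define neither macro this returns AES2_NONE. */
int aes2_support_mask(void) {
    int mask = AES2_NONE;
#if defined(__ARM_FEATURE_SVE2_AES2) && __ARM_FEATURE_SVE2_AES2
    mask |= AES2_NON_STREAMING;
#endif
#if defined(__ARM_FEATURE_SSVE_AES2) && __ARM_FEATURE_SSVE_AES2
    mask |= AES2_STREAMING;
#endif
    return mask;
}
```

Code that calls the new intrinsics would normally sit behind the same
preprocessor checks, so that it still builds on toolchains without the feature.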
diff --git a/main/acle.md b/main/acle.md
@@ -469,6 +469,7 @@ Armv8.4-A [[ARMARMv84]](#ARMARMv84). Support is added for the Dot Product intrin
 * Added support for modal 8-bit floating point matrix multiply-add widening intrinsics.
 * Added support for 16-bit floating point matrix multiply-add widening intrinsics.
 * Added support for Brain 16-bit floating-point vector multiplication intrinsics.
+* Added support for FEAT_SVE_AES2 and FEAT_SSVE_AES intrinsics.
 
 ### References
 
@@ -2161,6 +2162,15 @@ support for the SVE2 AES (FEAT_SVE_AES) instructions and if the associated
 ACLE intrinsics are available. This implies that `__ARM_FEATURE_AES`
 and `__ARM_FEATURE_SVE2` are both nonzero.
 
+In addition, `__ARM_FEATURE_SVE2_AES2` is defined to `1` if there is hardware
+support for the SVE2 AES2 (FEAT_SVE_AES2) instructions and if the associated
+ACLE intrinsics are available. This implies that `__ARM_FEATURE_AES`
+and `__ARM_FEATURE_SVE2` are both nonzero.
+
+`__ARM_FEATURE_SSVE_AES2` is defined to 1 if there is hardware support for
+SVE2 AES2 (FEAT_SVE_AES2) instructions in Streaming SVE mode (FEAT_SSVE_AES)
+and if the associated ACLE intrinsics are available.
+
 #### SHA2 extension
 
 `__ARM_FEATURE_SHA2` is defined to 1 if the SHA1 & SHA2-256 Crypto
@@ -2613,6 +2623,7 @@ be found in [[BA]](#BA).
 | [`__ARM_BF16_FORMAT_ALTERNATIVE`](#brain-16-bit-floating-point-support)                                                                                 | 16-bit brain floating-point, alternative format                                                    | 1           |
 | [`__ARM_BIG_ENDIAN`](#endianness)                                                                                                                       | Memory is big-endian                                                                               | 1           |
 | [`__ARM_FEATURE_AES`](#aes-extension)                                                                                                                   | AES Crypto extension (Arm v8-A)                                                                    | 1           |
+| [`__ARM_FEATURE_AES2`](#aes-extension)                                                                                                                  | SVE2 Multi-vector AES Crypto extension (Arm v9.6-A)                                                | 1           |
 | [`__ARM_FEATURE_ATOMICS`](#large-system-extensions)                                                                                                     | Large System Extensions                                                                            | 1           |
 | [`__ARM_FEATURE_BF16`](#brain-16-bit-floating-point-support)                                                                                            | 16-bit brain floating-point, vector instruction                                                    | 1           |
 | [`__ARM_FEATURE_BTI_DEFAULT`](#branch-target-identification)                                                                                            | Branch Target Identification                                                                       | 1           |
@@ -2691,6 +2702,7 @@ be found in [[BA]](#BA).
 | [`__ARM_FEATURE_SVE_VECTOR_OPERATORS`](#scalable-vector-extension-sve)                                                                                  | Level of support for C and C++ operators on SVE predicate types                                     | 1           |
 | [`__ARM_FEATURE_SVE2`](#sve2)                                                                                                                           | SVE version 2 (FEAT_SVE2)                                                                          | 1           |
 | [`__ARM_FEATURE_SVE2_AES`](#aes-extension)                                                                                                              | SVE2 support for the AES cryptographic extension (FEAT_SVE_AES)                                     | 1           |
+| [`__ARM_FEATURE_SVE2_AES2`](#aes-extension)                                                                                                             | SVE2 support for the SVE multi-vector AES cryptographic extension (FEAT_SVE_AES2)                   | 1           |
 | [`__ARM_FEATURE_SVE2_BITPERM`](#bit-permute-extension)                                                                                                  | SVE2 bit permute extension                                                    | 1           |
 | [`__ARM_FEATURE_SSVE_BITPERM`](#bit-permute-extension)                                                                                                  | SVE2 bit permute extension                                                    | 1           |
 | [`__ARM_FEATURE_SSVE_FEXPA`](#streaming-sve-fexpa-extension)                                                                                            | Streaming SVE FEXPA extension                                                 | 1           |
@@ -9454,6 +9466,30 @@ to work with `svboolx2_t` and `svboolx4_t`.  For example:
     svboolx2_t svundef2_b();
 ```
 
+#### AESE, AESD, AESEMC, AESDIMC
+
+Multi-vector Advanced Encryption Standard instructions.
+
+svuint8x2_t    svaese[_u8_x2]     (svuint8x2_t op1, svuint64_t op2, uint64_t index);
+svuint8x4_t    svaese[_u8_x4]     (svuint8x4_t op1, svuint64_t op2, uint64_t index);
+svuint8x2_t    svaesd[_u8_x2]     (svuint8x2_t op1, svuint64_t op2, uint64_t index);
+svuint8x4_t    svaesd[_u8_x4]     (svuint8x4_t op1, svuint64_t op2, uint64_t index);
+svuint8x2_t    svaesemc[_u8_x2]   (svuint8x2_t op1, svuint64_t op2, uint64_t index);
+svuint8x4_t    svaesemc[_u8_x4]   (svuint8x4_t op1, svuint64_t op2, uint64_t index);
+svuint8x2_t    svaesdimc[_u8_x2]  (svuint8x2_t op1, svuint64_t op2, uint64_t index);
+svuint8x4_t    svaesdimc[_u8_x4]  (svuint8x4_t op1, svuint64_t op2, uint64_t index);
+
+#### PMULL, PMLAL
+
+Multi-vector 128-bit polynomial multiply long instructions.
+
+``` c
+  // Variants are also available for:
+  // _s64x2, _f64x2
+  svuint64x2_t svpmull[_u64x2](svuint64_t zn, svuint64_t zm);
+  svuint64x2_t svpmlal[_u64x2](svuint64_t zn, svuint64_t zm);
+```
+
 #### ADDQV, FADDQV
 
 Unsigned/FP add reduction of quadword vector segments.