@@ -405,6 +405,11 @@ Armv8.4-A [[ARMARMv84]](#ARMARMv84). Support is added for the Dot Product intrin
405405* Added [**Alpha**](#current-status-and-anticipated-changes)
406406 support for SME2.1 (FEAT_SME2p1).
407407
408+ * Added specifications for floating-point absolute minimum
409+ and maximum intrinsics (FEAT_FAMINMAX).
410+
411+ * Added specifications for table lookup intrinsics (FEAT_LUT, FEAT_SME_LUTv2).
412+
408413### References
409414
410415This document refers to the following documents.
@@ -2124,6 +2129,22 @@ support for the SVE2 SM4 (FEAT_SVE_SM4) instructions and if the associated
21242129ACLE intrinsics are available. This implies that `__ARM_FEATURE_SM4` and
21252130`__ARM_FEATURE_SVE2` are both nonzero.
21262131
2132+ ### Floating-point absolute minimum and maximum extension
2133+
2134+ `__ARM_FEATURE_FAMINMAX` is defined to 1 if there is hardware support for
2135+ floating-point absolute minimum and maximum instructions (FEAT_FAMINMAX)
2136+ and if the associated ACLE intrinsics are available.
2137+
2138+ ### Lookup table extensions
2139+
2140+ `__ARM_FEATURE_LUT` is defined to 1 if there is hardware support for
2141+ lookup table instructions with 2-bit and 4-bit indices (FEAT_LUT)
2142+ and if the associated ACLE intrinsics are available.
2143+
2144+ `__ARM_FEATURE_SME_LUTv2` is defined to 1 if there is hardware support for
2145+ lookup table instructions with 4-bit indices and 8-bit elements (FEAT_SME_LUTv2)
2146+ and if the associated ACLE intrinsics are available.
2147+
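As a minimal sketch of how code might test the three macros described above, the helper below folds them into a bit mask at compile time. The function name `acle_new_features` and the `HAS_*` flag names are hypothetical and used only for illustration; only the `__ARM_FEATURE_*` macro names come from this specification.

```c
#include <assert.h>

/* Flag values used only by this sketch (hypothetical, not part of ACLE). */
enum { HAS_FAMINMAX = 1, HAS_LUT = 2, HAS_SME_LUTV2 = 4 };

/* Returns a mask of the feature macros that the compiler defines to a
   nonzero value. On a toolchain that defines none of them, returns 0. */
static unsigned acle_new_features(void) {
    unsigned mask = 0;
#if defined(__ARM_FEATURE_FAMINMAX) && __ARM_FEATURE_FAMINMAX
    mask |= HAS_FAMINMAX;
#endif
#if defined(__ARM_FEATURE_LUT) && __ARM_FEATURE_LUT
    mask |= HAS_LUT;
#endif
#if defined(__ARM_FEATURE_SME_LUTv2) && __ARM_FEATURE_SME_LUTv2
    mask |= HAS_SME_LUTV2;
#endif
    assert(mask <= 7u); /* only three flags are ever set */
    return mask;
}
```

In real code the same `#if` guards would typically surround the calls to the corresponding intrinsics, with a portable fallback in the `#else` branch.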
21272148### Other floating-point and vector extensions
21282149
21292150#### Fused multiply-accumulate (FMA)
@@ -2411,12 +2432,14 @@ be found in [[BA]](#BA).
24112432| [`__ARM_FEATURE_DIRECTED_ROUNDING`](#directed-rounding) | Directed Rounding | 1 |
24122433| [`__ARM_FEATURE_DOTPROD`](#availability-of-dot-product-intrinsics) | Dot product extension (ARM v8.2-A) | 1 |
24132434| [`__ARM_FEATURE_DSP`](#dsp-instructions) | DSP instructions (Arm v5E) (32-bit-only) | 1 |
2435+ | [`__ARM_FEATURE_FAMINMAX`](#floating-point-absolute-minimum-and-maximum-extension) | Floating-point absolute minimum and maximum extension | 1 |
24142436| [`__ARM_FEATURE_FMA`](#fused-multiply-accumulate-fma) | Floating-point fused multiply-accumulate | 1 |
24152437| [`__ARM_FEATURE_FP16_FML`](#fp16-fml-extension) | FP16 FML extension (Arm v8.4-A, optional Armv8.2-A, Armv8.3-A) | 1 |
24162438| [`__ARM_FEATURE_FRINT`](#availability-of-armv8.5-a-floating-point-rounding-intrinsics) | Floating-point rounding extension (Arm v8.5-A) | 1 |
24172439| [`__ARM_FEATURE_IDIV`](#hardware-integer-divide) | Hardware Integer Divide | 1 |
24182440| [`__ARM_FEATURE_JCVT`](#javascript-floating-point-conversion) | Javascript conversion (ARMv8.3-A) | 1 |
24192441| [`__ARM_FEATURE_LDREX`](#ldrexstrex) *(Deprecated)* | Load/store exclusive instructions | 0x0F |
2442+ | [`__ARM_FEATURE_LUT`](#lookup-table-extensions) | Lookup table extensions (FEAT_LUT) | 1 |
24202443| [`__ARM_FEATURE_MATMUL_INT8`](#availability-of-armv8.6-a-integer-matrix-multiply-intrinsics) | Integer Matrix Multiply extension (Armv8.6-A, optional Armv8.2-A, Armv8.3-A, Armv8.4-A, Armv8.5-A) | 1 |
24212444| [`__ARM_FEATURE_MEMORY_TAGGING`](#memory-tagging) | Memory Tagging (Armv8.5-A) | 1 |
24222445| [`__ARM_FEATURE_MOPS`](#memcpy-family-of-memory-operations-standarization-instructions---mops) | `memcpy`, `memset`, and `memmove` family of operations standardization instructions | 1 |
@@ -2443,6 +2466,7 @@ be found in [[BA]](#BA).
24432466| [`__ARM_FEATURE_SME_F64F64`](#double-precision-floating-point-outer-product-intrinsics) | Double precision floating-point outer product intrinsics (FEAT_SME_F64F64) | 1 |
24442467| [`__ARM_FEATURE_SME_I16I64`](#16-bit-to-64-bit-integer-widening-outer-product-intrinsics) | 16-bit to 64-bit integer widening outer product intrinsics (FEAT_SME_I16I64) | 1 |
24452468| [`__ARM_FEATURE_SME_LOCALLY_STREAMING`](#scalable-matrix-extension-sme) | Support for the `arm_locally_streaming` attribute | 1 |
2469+ | [`__ARM_FEATURE_SME_LUTv2`](#lookup-table-extensions) | Lookup table extensions (FEAT_SME_LUTv2) | 1 |
24462470| [`__ARM_FEATURE_SVE`](#scalable-vector-extension-sve) | Scalable Vector Extension (FEAT_SVE) | 1 |
24472471| [`__ARM_FEATURE_SVE_B16B16`](#non-widening-brain-16-bit-floating-point-support) | Non-widening brain 16-bit floating-point intrinsics (FEAT_SVE_B16B16) | 1 |
24482472| [`__ARM_FEATURE_SVE_BF16`](#brain-16-bit-floating-point-support) | SVE support for the 16-bit brain floating-point extension (FEAT_BF16) | 1 |
@@ -9125,6 +9149,73 @@ Interleave elements from halves of each pair of quadword vector segments.
91259149 svuint8_t svzipq2[_u8](svuint8_t zn, svuint8_t zm);
91269150 ```
91279151
9152+ ### SVE2 maximum and minimum absolute value
9153+
9154+ The intrinsics in this section are defined by the header file
9155+ [`<arm_sve.h>`](#arm_sve.h) when either `__ARM_FEATURE_SVE2` or
9156+ `__ARM_FEATURE_SME2` is defined to 1, and `__ARM_FEATURE_FAMINMAX`
9157+ is defined to 1.
9158+
9159+ #### FAMAX
9160+
9161+ Floating-point absolute maximum (predicated).
9162+ ``` c
9163+ // Variants are also available for: _f32 and _f64
9164+ svfloat16_t svamax[_f16]_m(svbool_t pg, svfloat16_t zn, svfloat16_t zm);
9165+ svfloat16_t svamax[_f16]_x(svbool_t pg, svfloat16_t zn, svfloat16_t zm);
9166+ svfloat16_t svamax[_f16]_z(svbool_t pg, svfloat16_t zn, svfloat16_t zm);
9167+
9168+ // Variants are also available for: _f32 and _f64
9169+ svfloat16_t svamax[_n_f16]_m(svbool_t pg, svfloat16_t zn, float16_t zm);
9170+ svfloat16_t svamax[_n_f16]_x(svbool_t pg, svfloat16_t zn, float16_t zm);
9171+ svfloat16_t svamax[_n_f16]_z(svbool_t pg, svfloat16_t zn, float16_t zm);
9172+ ```
9173+
9174+ #### FAMIN
9175+
9176+ Floating-point absolute minimum (predicated).
9177+ ``` c
9178+ // Variants are also available for: _f32 and _f64
9179+ svfloat16_t svamin[_f16]_m(svbool_t pg, svfloat16_t zn, svfloat16_t zm);
9180+ svfloat16_t svamin[_f16]_x(svbool_t pg, svfloat16_t zn, svfloat16_t zm);
9181+ svfloat16_t svamin[_f16]_z(svbool_t pg, svfloat16_t zn, svfloat16_t zm);
9182+
9183+ // Variants are also available for: _f32 and _f64
9184+ svfloat16_t svamin[_n_f16]_m(svbool_t pg, svfloat16_t zn, float16_t zm);
9185+ svfloat16_t svamin[_n_f16]_x(svbool_t pg, svfloat16_t zn, float16_t zm);
9186+ svfloat16_t svamin[_n_f16]_z(svbool_t pg, svfloat16_t zn, float16_t zm);
9187+ ```
9188+
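The following scalar sketch illustrates what the FAMAX and FAMIN intrinsics above compute per element, namely the larger or smaller of the two operands' absolute values, together with the usual ACLE `_m` (merging) and `_z` (zeroing) predication conventions. It is a simplified reference model and not the architectural definition: NaN and signed-zero handling are not modelled, the predicate is assumed to be one byte per lane, and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Magnitude via comparison; NaN and signed-zero behaviour is not modelled. */
static float abs_ref(float x) { return x < 0.0f ? -x : x; }

/* Larger of |a| and |b|. */
static float amax_ref(float a, float b) {
    float x = abs_ref(a), y = abs_ref(b);
    return x > y ? x : y;
}

/* Smaller of |a| and |b|. */
static float amin_ref(float a, float b) {
    float x = abs_ref(a), y = abs_ref(b);
    return x < y ? x : y;
}

/* Merging (_m-style): lanes whose predicate byte is zero keep zn[i]. */
static void amax_ref_m(const unsigned char *pg, const float *zn,
                       const float *zm, float *out, size_t n) {
    assert(pg && zn && zm && out);
    for (size_t i = 0; i < n; ++i)
        out[i] = pg[i] ? amax_ref(zn[i], zm[i]) : zn[i];
}

/* Zeroing (_z-style): lanes whose predicate byte is zero become 0. */
static void amax_ref_z(const unsigned char *pg, const float *zn,
                       const float *zm, float *out, size_t n) {
    assert(pg && zn && zm && out);
    for (size_t i = 0; i < n; ++i)
        out[i] = pg[i] ? amax_ref(zn[i], zm[i]) : 0.0f;
}
```

The `_x` forms leave inactive lanes unspecified, so either of the two array helpers is a valid model for them.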
9189+ ### SVE2 lookup table
9190+
9191+ The intrinsics in this section are defined by the header file
9192+ [`<arm_sve.h>`](#arm_sve.h) when either `__ARM_FEATURE_SVE2` or
9193+ `__ARM_FEATURE_SME2` is defined to 1, and `__ARM_FEATURE_LUT`
9194+ is defined to 1.
9195+
9196+ #### LUTI2
9197+
9198+ Lookup table read with 2-bit indices.
9199+ ```c
9200+ // Variant is also available for: _u8
9201+ svint8_t svluti2_lane[_s8](svint8_t table, svuint8_t indices, uint64_t imm_idx);
9202+
9203+ // Variants are also available for: _u16, _f16 and _bf16
9204+ svint16_t svluti2_lane[_s16](svint16_t table, svuint8_t indices, uint64_t imm_idx);
9205+ ```
9206+
9207+ #### LUTI4
9208+
9209+ Lookup table read with 4-bit indices.
9210+ ```c
9211+ // Variant is also available for: _u8
9212+ svint8_t svluti4_lane[_s8](svint8_t table, svuint8_t indices, uint64_t imm_idx);
9213+
9214+ // Variants are also available for: _u16, _f16 and _bf16
9215+ svint16_t svluti4_lane[_s16](svint16_t table, svuint8_t indices, uint64_t imm_idx);
9216+ svint16_t svluti4_lane[_s16_x2](svint16x2_t table, svuint8_t indices, uint64_t imm_idx);
9217+ ```
9218+
91289219# SME language extensions and intrinsics
91299220
91309221The specification for SME is in
@@ -12714,7 +12805,62 @@ While (resulting in predicate tuple)
1271412805 // _b64[_s64]_x2, _b8[_u64]_x2, _b16[_u64]_x2, _b32[_u64]_x2 and
1271512806 // _b64[_u64]_x2
1271612807 svboolx2_t svwhilelt_b8[_s64]_x2(int64_t rn, int64_t rm);
12717- ```
12808+ ```
12809+
12810+
12811+ ### SME2 maximum and minimum absolute value
12812+
12813+ The intrinsics in this section are defined by the header file
12814+ [`<arm_sme.h>`](#arm_sme.h) when `__ARM_FEATURE_SME2` is defined to 1
12815+ and `__ARM_FEATURE_FAMINMAX` is defined to 1.
12816+
12817+ #### FAMAX
12818+
12819+ Absolute maximum.
12820+ ``` c
12821+ // Variants are also available for:
12822+ // [_f32_x2], [_f64_x2],
12823+ // [_f16_x4], [_f32_x4] and [_f64_x4]
12824+ svfloat16x2_t svamax[_f16_x2](svfloat16x2_t zd, svfloat16x2_t zm) __arm_streaming;
12825+ ```
12826+
12827+ #### FAMIN
12828+
12829+ Absolute minimum.
12830+ ``` c
12831+ // Variants are also available for:
12832+ // [_f32_x2], [_f64_x2],
12833+ // [_f16_x4], [_f32_x4] and [_f64_x4]
12834+ svfloat16x2_t svamin[_f16_x2](svfloat16x2_t zd, svfloat16x2_t zm) __arm_streaming;
12835+ ```
12836+
12837+ ### SME2 lookup table
12838+
12839+ The intrinsics in this section are defined by the header file
12840+ [`<arm_sme.h>`](#arm_sme.h) when `__ARM_FEATURE_SME_LUTv2` is defined to 1.
12841+
12842+ #### MOVT
12843+
12844+ Move vector register to ZT0.
12845+ ``` c
12846+ // Variants are also available for:
12847+ // [_s8], [_u16], [_s16], [_u32], [_s32], [_u64], [_s64],
12848+ // [_bf16], [_f16], [_f32] and [_f64]
12849+ void svwrite_zt[_u8](uint64_t zt0, svuint8_t zt) __arm_streaming __arm_out("zt0");
12850+
12851+ // Variants are also available for:
12852+ // [_s8], [_u16], [_s16], [_u32], [_s32], [_u64], [_s64],
12853+ // [_bf16], [_f16], [_f32] and [_f64]
12854+ void svwrite_lane_zt[_u8](uint64_t zt0, svuint8_t zt, uint64_t idx) __arm_streaming __arm_inout("zt0");
12855+ ```
12856+
12857+ #### LUTI4
12858+
12859+ Lookup table read with 4-bit indices and 8-bit elements.
12860+ ``` c
12861+ // Variant is also available for: _u8
12862+ svint8x4_t svluti4_zt_s8_x4(uint64_t zt0, svuint8x2_t zn) __arm_streaming __arm_in("zt0");
12863+ ```
1271812864
1271912865# M-profile Vector Extension (MVE) intrinsics
1272012866