Commit e938350

Intrinsics for absolute minimum and maximum, and table lookup (ARM-software#324)
* Intrinsics for absolute minimum and maximum, and table lookup
1 parent ddfc048 commit e938350

File tree

5 files changed

+363
-9
lines changed

main/acle.md

Lines changed: 147 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -405,6 +405,11 @@ Armv8.4-A [[ARMARMv84]](#ARMARMv84). Support is added for the Dot Product intrin
405405
* Added [**Alpha**](#current-status-and-anticipated-changes)
406406
support for SME2.1 (FEAT_SME2p1).
407407

408+
* Added specifications for floating-point absolute minimum
409+
and maximum intrinsics (FEAT_FAMINMAX).
410+
411+
* Added specifications for table lookup intrinsics (FEAT_LUT, FEAT_SME_LUTv2).
412+
408413
### References
409414

410415
This document refers to the following documents.
@@ -2124,6 +2129,22 @@ support for the SVE2 SM4 (FEAT_SVE_SM4) instructions and if the associated
21242129
ACLE intrinsics are available. This implies that `__ARM_FEATURE_SM4` and
21252130
`__ARM_FEATURE_SVE2` are both nonzero.
21262131

2132+
### Floating-point absolute minimum and maximum extension
2133+
2134+
`__ARM_FEATURE_FAMINMAX` is defined to 1 if there is hardware support for
2135+
floating-point absolute minimum and maximum instructions (FEAT_FAMINMAX)
2136+
and if the associated ACLE intrinsics are available.
2137+
2138+
### Lookup table extensions
2139+
2140+
`__ARM_FEATURE_LUT` is defined to 1 if there is hardware support for
2141+
lookup table instructions with 2-bit and 4-bit indices (FEAT_LUT)
2142+
and if the associated ACLE intrinsics are available.
2143+
2144+
`__ARM_FEATURE_SME_LUTv2` is defined to 1 if there is hardware support for
2145+
lookup table instructions with 4-bit indices and 8-bit elements (FEAT_SME_LUTv2)
2146+
and if the associated ACLE intrinsics are available.
2147+
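The three feature macros described above can be tested at compile time before any of the associated intrinsics are used. The following is a minimal sketch, not part of the specification: the helper names `acle_has_faminmax`, `acle_has_lut` and `acle_has_sme_lutv2` are hypothetical and exist only for illustration. Only the macro names come from the text.

```c
#include <stdio.h>

/* Hypothetical helpers: each returns 1 if the corresponding ACLE feature
   macro is defined to 1 for the current compilation target, else 0. */
static int acle_has_faminmax(void) {
#if defined(__ARM_FEATURE_FAMINMAX) && (__ARM_FEATURE_FAMINMAX == 1)
    return 1;
#else
    return 0;
#endif
}

static int acle_has_lut(void) {
#if defined(__ARM_FEATURE_LUT) && (__ARM_FEATURE_LUT == 1)
    return 1;
#else
    return 0;
#endif
}

static int acle_has_sme_lutv2(void) {
#if defined(__ARM_FEATURE_SME_LUTv2) && (__ARM_FEATURE_SME_LUTv2 == 1)
    return 1;
#else
    return 0;
#endif
}

/* Prints a one-line summary of the three macros. */
static void acle_print_features(void) {
    printf("FAMINMAX=%d LUT=%d SME_LUTv2=%d\n",
           acle_has_faminmax(), acle_has_lut(), acle_has_sme_lutv2());
}
```

Calling `acle_print_features()` from a program built for a target without these extensions would be expected to report 0 for each macro. Real code would more likely use the same `#if` guards directly around the intrinsic calls.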
21272148
### Other floating-point and vector extensions
21282149

21292150
#### Fused multiply-accumulate (FMA)
@@ -2411,12 +2432,14 @@ be found in [[BA]](#BA).
24112432
| [`__ARM_FEATURE_DIRECTED_ROUNDING`](#directed-rounding) | Directed Rounding | 1 |
24122433
| [`__ARM_FEATURE_DOTPROD`](#availability-of-dot-product-intrinsics) | Dot product extension (ARM v8.2-A) | 1 |
24132434
| [`__ARM_FEATURE_DSP`](#dsp-instructions) | DSP instructions (Arm v5E) (32-bit-only) | 1 |
2435+
| [`__ARM_FEATURE_FAMINMAX`](#floating-point-absolute-minimum-and-maximum-extension) | Floating-point absolute minimum and maximum extension | 1 |
24142436
| [`__ARM_FEATURE_FMA`](#fused-multiply-accumulate-fma) | Floating-point fused multiply-accumulate | 1 |
24152437
| [`__ARM_FEATURE_FP16_FML`](#fp16-fml-extension) | FP16 FML extension (Arm v8.4-A, optional Armv8.2-A, Armv8.3-A) | 1 |
24162438
| [`__ARM_FEATURE_FRINT`](#availability-of-armv8.5-a-floating-point-rounding-intrinsics) | Floating-point rounding extension (Arm v8.5-A) | 1 |
24172439
| [`__ARM_FEATURE_IDIV`](#hardware-integer-divide) | Hardware Integer Divide | 1 |
24182440
| [`__ARM_FEATURE_JCVT`](#javascript-floating-point-conversion) | Javascript conversion (ARMv8.3-A) | 1 |
24192441
| [`__ARM_FEATURE_LDREX`](#ldrexstrex) *(Deprecated)* | Load/store exclusive instructions | 0x0F |
2442+
| [`__ARM_FEATURE_LUT`](#lookup-table-extensions) | Lookup table extensions (FEAT_LUT) | 1 |
24202443
| [`__ARM_FEATURE_MATMUL_INT8`](#availability-of-armv8.6-a-integer-matrix-multiply-intrinsics) | Integer Matrix Multiply extension (Armv8.6-A, optional Armv8.2-A, Armv8.3-A, Armv8.4-A, Armv8.5-A) | 1 |
24212444
| [`__ARM_FEATURE_MEMORY_TAGGING`](#memory-tagging) | Memory Tagging (Armv8.5-A) | 1 |
24222445
| [`__ARM_FEATURE_MOPS`](#memcpy-family-of-memory-operations-standarization-instructions---mops) | `memcpy`, `memset`, and `memmove` family of operations standardization instructions | 1 |
@@ -2443,6 +2466,7 @@ be found in [[BA]](#BA).
24432466
| [`__ARM_FEATURE_SME_F64F64`](#double-precision-floating-point-outer-product-intrinsics) | Double precision floating-point outer product intrinsics (FEAT_SME_F64F64) | 1 |
24442467
| [`__ARM_FEATURE_SME_I16I64`](#16-bit-to-64-bit-integer-widening-outer-product-intrinsics) | 16-bit to 64-bit integer widening outer product intrinsics (FEAT_SME_I16I64) | 1 |
24452468
| [`__ARM_FEATURE_SME_LOCALLY_STREAMING`](#scalable-matrix-extension-sme) | Support for the `arm_locally_streaming` attribute | 1 |
2469+
| [`__ARM_FEATURE_SME_LUTv2`](#lookup-table-extensions) | Lookup table extensions (FEAT_SME_LUTv2) | 1 |
24462470
| [`__ARM_FEATURE_SVE`](#scalable-vector-extension-sve) | Scalable Vector Extension (FEAT_SVE) | 1 |
24472471
| [`__ARM_FEATURE_SVE_B16B16`](#non-widening-brain-16-bit-floating-point-support) | Non-widening brain 16-bit floating-point intrinsics (FEAT_SVE_B16B16) | 1 |
24482472
| [`__ARM_FEATURE_SVE_BF16`](#brain-16-bit-floating-point-support) | SVE support for the 16-bit brain floating-point extension (FEAT_BF16) | 1 |
@@ -9125,6 +9149,73 @@ Interleave elements from halves of each pair of quadword vector segments.
91259149
svuint8_t svzipq2[_u8](svuint8_t zn, svuint8_t zm);
91269150
```
91279151

9152+
### SVE2 maximum and minimum absolute value
9153+
9154+
The intrinsics in this section are defined by the header file
9155+
[`<arm_sve.h>`](#arm_sve.h) when either `__ARM_FEATURE_SVE2` or
9156+
`__ARM_FEATURE_SME2` is defined to 1, and `__ARM_FEATURE_FAMINMAX`
9157+
is defined to 1.
9158+
9159+
#### FAMAX
9160+
9161+
Floating-point absolute maximum (predicated).
9162+
``` c
9163+
// Variants are also available for: _f32 and _f64
9164+
svfloat16_t svamax[_f16]_m(svbool_t pg, svfloat16_t zn, svfloat16_t zm);
9165+
svfloat16_t svamax[_f16]_x(svbool_t pg, svfloat16_t zn, svfloat16_t zm);
9166+
svfloat16_t svamax[_f16]_z(svbool_t pg, svfloat16_t zn, svfloat16_t zm);
9167+
9168+
// Variants are also available for: _f32 and _f64
9169+
svfloat16_t svamax[_n_f16]_m(svbool_t pg, svfloat16_t zn, float16_t zm);
9170+
svfloat16_t svamax[_n_f16]_x(svbool_t pg, svfloat16_t zn, float16_t zm);
9171+
svfloat16_t svamax[_n_f16]_z(svbool_t pg, svfloat16_t zn, float16_t zm);
9172+
```
9173+
9174+
#### FAMIN
9175+
9176+
Floating-point absolute minimum (predicated).
9177+
``` c
9178+
// Variants are also available for: _f32 and _f64
9179+
svfloat16_t svamin[_f16]_m(svbool_t pg, svfloat16_t zn, svfloat16_t zm);
9180+
svfloat16_t svamin[_f16]_x(svbool_t pg, svfloat16_t zn, svfloat16_t zm);
9181+
svfloat16_t svamin[_f16]_z(svbool_t pg, svfloat16_t zn, svfloat16_t zm);
9182+
9183+
// Variants are also available for: _f32 and _f64
9184+
svfloat16_t svamin[_n_f16]_m(svbool_t pg, svfloat16_t zn, float16_t zm);
9185+
svfloat16_t svamin[_n_f16]_x(svbool_t pg, svfloat16_t zn, float16_t zm);
9186+
svfloat16_t svamin[_n_f16]_z(svbool_t pg, svfloat16_t zn, float16_t zm);
9187+
```
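As a reading aid for the `svamax` and `svamin` prototypes, the element-wise behaviour and the `_m` and `_z` predication forms can be sketched in portable scalar C. This is a conceptual model under stated assumptions, not the specified semantics: all function names below are hypothetical, vectors are modelled as plain arrays, and the NaN and signed-zero behaviour of the real instructions is not modelled. The `_x` form is omitted because its inactive elements are unspecified.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: absolute value without relying on <math.h>. */
static float abs_ref(float x) { return x < 0.0f ? -x : x; }

/* Scalar sketch of absolute maximum: the larger of |a| and |b|. */
static float amax_ref(float a, float b) {
    float fa = abs_ref(a), fb = abs_ref(b);
    return fa > fb ? fa : fb;
}

/* Scalar sketch of absolute minimum: the smaller of |a| and |b|. */
static float amin_ref(float a, float b) {
    float fa = abs_ref(a), fb = abs_ref(b);
    return fa < fb ? fa : fb;
}

/* Merging predication (_m): inactive elements keep the value from zn. */
static void amax_m_ref(const bool *pg, const float *zn, const float *zm,
                       float *out, size_t n) {
    for (size_t i = 0; i < n; ++i)
        out[i] = pg[i] ? amax_ref(zn[i], zm[i]) : zn[i];
}

/* Zeroing predication (_z): inactive elements are set to zero. */
static void amax_z_ref(const bool *pg, const float *zn, const float *zm,
                       float *out, size_t n) {
    for (size_t i = 0; i < n; ++i)
        out[i] = pg[i] ? amax_ref(zn[i], zm[i]) : 0.0f;
}
```

The same two wrappers would apply to `amin_ref`. Only the per-element operation changes.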
9188+
9189+
### SVE2 lookup table
9190+
9191+
The intrinsics in this section are defined by the header file
9192+
[`<arm_sve.h>`](#arm_sve.h) when either `__ARM_FEATURE_SVE2` or
9193+
`__ARM_FEATURE_SME2` is defined to 1, and `__ARM_FEATURE_LUT`
9194+
is defined to 1.
9195+
9196+
#### LUTI2
9197+
9198+
Lookup table read with 2-bit indices.
9199+
```c
9200+
// Variant is also available for: _u8
9201+
svint8_t svluti2_lane[_s8](svint8_t table, svuint8_t indices, uint64_t imm_idx);
9202+
9203+
// Variants are also available for: _u16, _f16 and _bf16
9204+
svint16_t svluti2_lane[_s16](svint16_t table, svuint8_t indices, uint64_t imm_idx);
9205+
```
9206+
9207+
#### LUTI4
9208+
9209+
Lookup table read with 4-bit indices.
9210+
```c
9211+
// Variant is also available for: _u8
9212+
svint8_t svluti4_lane[_s8](svint8_t table, svuint8_t indices, uint64_t imm_idx);
9213+
9214+
// Variants are also available for: _u16, _f16 and _bf16
9215+
svint16_t svluti4_lane[_s16](svint16_t table, svuint8_t indices, uint64_t imm_idx);
9216+
svint16_t svluti4_lane[_s16_x2](svint16x2_t table, svuint8_t indices, uint64_t imm_idx);
9217+
```
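The idea behind a lookup with packed 2-bit or 4-bit indices can be illustrated with a small scalar model. This is a conceptual sketch only, with hypothetical function names. It assumes that element `i` takes its index from the low-order bits upward within each byte of the index stream, and it does not model how `imm_idx` selects a segment of the `indices` vector. The architecture documentation remains the authority on the exact behaviour.

```c
#include <stddef.h>
#include <stdint.h>

/* Conceptual model of a 2-bit lookup: each byte of `packed` holds four
   2-bit indices (assumed lowest bits first) into a 4-entry table. */
static void luti2_ref(const int8_t table[4], const uint8_t *packed,
                      int8_t *out, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        unsigned idx = (packed[i / 4] >> (2 * (i % 4))) & 0x3u;
        out[i] = table[idx];
    }
}

/* Conceptual model of a 4-bit lookup: each byte of `packed` holds two
   4-bit indices (assumed low nibble first) into a 16-entry table. */
static void luti4_ref(const int8_t table[16], const uint8_t *packed,
                      int8_t *out, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        unsigned idx = (packed[i / 2] >> (4 * (i % 2))) & 0xFu;
        out[i] = table[idx];
    }
}
```

With a 4-entry table, four results are produced from a single index byte, which is why a small `indices` operand can drive a full-width result vector.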
9218+
91289219
# SME language extensions and intrinsics
91299220

91309221
The specification for SME is in
@@ -12714,7 +12805,62 @@ While (resulting in predicate tuple)
1271412805
// _b64[_s64]_x2, _b8[_u64]_x2, _b16[_u64]_x2, _b32[_u64]_x2 and
1271512806
// _b64[_u64]_x2
1271612807
svboolx2_t svwhilelt_b8[_s64]_x2(int64_t rn, int64_t rm);
12717-
```
12808+
```
12809+
12810+
12811+
### SME2 maximum and minimum absolute value
12812+
12813+
The intrinsics in this section are defined by the header file
12814+
[`<arm_sme.h>`](#arm_sme.h) when `__ARM_FEATURE_SME2` is defined to 1
12815+
and `__ARM_FEATURE_FAMINMAX` is defined to 1.
12816+
12817+
#### FAMAX
12818+
12819+
Absolute maximum.
12820+
``` c
12821+
// Variants are also available for:
12822+
// [_f32_x2], [_f64_x2],
12823+
// [_f16_x4], [_f32_x4] and [_f64_x4]
12824+
svfloat16x2_t svamax[_f16_x2](svfloat16x2_t zd, svfloat16x2_t zm) __arm_streaming;
12825+
```
12826+
12827+
#### FAMIN
12828+
12829+
Absolute minimum.
12830+
``` c
12831+
// Variants are also available for:
12832+
// [_f32_x2], [_f64_x2],
12833+
// [_f16_x4], [_f32_x4] and [_f64_x4]
12834+
svfloat16x2_t svamin[_f16_x2](svfloat16x2_t zd, svfloat16x2_t zm) __arm_streaming;
12835+
```
12836+
12837+
### SME2 lookup table
12838+
12839+
The intrinsics in this section are defined by the header file
12840+
[`<arm_sme.h>`](#arm_sme.h) when `__ARM_FEATURE_SME_LUTv2` is defined to 1.
12841+
12842+
#### MOVT
12843+
12844+
Move vector register to ZT0.
12845+
``` c
12846+
// Variants are also available for:
12847+
// [_s8], [_u16], [_s16], [_u32], [_s32], [_u64], [_s64]
12848+
// [_bf16], [_f16], [_f32], [_f64]
12849+
void svwrite_zt[_u8](uint64_t zt0, svuint8_t zt) __arm_streaming __arm_out("zt0");
12850+
12851+
// Variants are also available for:
12852+
// [_s8], [_u16], [_s16], [_u32], [_s32], [_u64], [_s64]
12853+
// [_bf16], [_f16], [_f32], [_f64]
12854+
void svwrite_lane_zt[_u8](uint64_t zt0, svuint8_t zt, uint64_t idx) __arm_streaming __arm_inout("zt0");
12855+
```
12856+
12857+
#### LUTI4
12858+
12859+
Lookup table read with 4-bit indices and 8-bit elements.
12860+
``` c
12861+
// Variant is also available for: _u8
12862+
svint8x4_t svluti4_zt_s8_x4(uint64_t zt0, svuint8x2_t zn) __arm_streaming __arm_in("zt0");
12863+
```
1271812864

1271912865
# M-profile Vector Extension (MVE) intrinsics
1272012866
