Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 4 additions & 4 deletions neon_intrinsics/advsimd.md
Original file line number Diff line number Diff line change
Expand Up @@ -6182,7 +6182,7 @@ The intrinsics in this section are guarded by the macro ``__ARM_NEON``.

#### Matrix multiply

| Intrinsic | Argument preparation | AArch64 Instruction | Result | Supported architectures |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------|-------------------------------|-------------------|---------------------------|
| <code>float16x8_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vmmlaq_f16_mf8" target="_blank">vmmlaq_f16_mf8</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; float16x8_t r,<br>&nbsp;&nbsp;&nbsp;&nbsp; mfloat8x16_t a,<br>&nbsp;&nbsp;&nbsp;&nbsp; mfloat8x16_t b,<br>&nbsp;&nbsp;&nbsp;&nbsp; fpm_t fpm)</code> | `r -> Vd.4H`<br>`a -> Vn.16B`<br>`b -> Vm.16B` | `FMMLA Vd.4H, Vn.16B, Vm.16B` | `Vd.4H -> result` | `A64` |
| <code>float32x4_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vmmlaq_f32_mf8" target="_blank">vmmlaq_f32_mf8</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; float32x4_t r,<br>&nbsp;&nbsp;&nbsp;&nbsp; mfloat8x16_t a,<br>&nbsp;&nbsp;&nbsp;&nbsp; mfloat8x16_t b,<br>&nbsp;&nbsp;&nbsp;&nbsp; fpm_t fpm)</code> | `r -> Vd.4S`<br>`a -> Vn.16B`<br>`b -> Vm.16B` | `FMMLA Vd.4S, Vn.16B, Vm.16B` | `Vd.4S -> result` | `A64` |
| Intrinsic | Argument preparation | AArch64 Instruction | Result | Supported architectures |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------|-------------------------------|-------------------|---------------------------|
| <code>float16x8_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vmmlaq_f16_mf8_fpm" target="_blank">vmmlaq_f16_mf8_fpm</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; float16x8_t r,<br>&nbsp;&nbsp;&nbsp;&nbsp; mfloat8x16_t a,<br>&nbsp;&nbsp;&nbsp;&nbsp; mfloat8x16_t b,<br>&nbsp;&nbsp;&nbsp;&nbsp; fpm_t fpm)</code> | `r -> Vd.8H`<br>`a -> Vn.16B`<br>`b -> Vm.16B` | `FMMLA Vd.8H, Vn.16B, Vm.16B` | `Vd.8H -> result` | `A64` |
| <code>float32x4_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vmmlaq_f32_mf8_fpm" target="_blank">vmmlaq_f32_mf8_fpm</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; float32x4_t r,<br>&nbsp;&nbsp;&nbsp;&nbsp; mfloat8x16_t a,<br>&nbsp;&nbsp;&nbsp;&nbsp; mfloat8x16_t b,<br>&nbsp;&nbsp;&nbsp;&nbsp; fpm_t fpm)</code> | `r -> Vd.4S`<br>`a -> Vn.16B`<br>`b -> Vm.16B` | `FMMLA Vd.4S, Vn.16B, Vm.16B` | `Vd.4S -> result` | `A64` |
4 changes: 2 additions & 2 deletions tools/intrinsic_db/advsimd.csv
Original file line number Diff line number Diff line change
Expand Up @@ -4812,5 +4812,5 @@ float32x4_t vmlallttq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x
float32x4_t vmlallttq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm) vd -> Vd.4S;vm -> Vn.16B; vm -> Vm.B; 0 <= lane <= 15 FMLALLBB Vd.4S, Vn.16B, Vm.B[lane] Vd.4S -> result A64

<SECTION> Matrix multiplication intrinsics from Armv9.6-A
float16x8_t vmmlaq_f16_mf8(float16x8_t r, mfloat8x16_t a, mfloat8x16_t b, fpm_t fpm) r -> Vd.4H;a -> Vn.16B;b -> Vm.16B FMMLA Vd.4H, Vn.16B, Vm.16B Vd.4H -> result A64
float32x4_t vmmlaq_f32_mf8(float32x4_t r, mfloat8x16_t a, mfloat8x16_t b, fpm_t fpm) r -> Vd.4S;a -> Vn.16B;b -> Vm.16B FMMLA Vd.4S, Vn.16B, Vm.16B Vd.4S -> result A64
float16x8_t vmmlaq_f16_mf8_fpm(float16x8_t r, mfloat8x16_t a, mfloat8x16_t b, fpm_t fpm) r -> Vd.8H;a -> Vn.16B;b -> Vm.16B FMMLA Vd.8H, Vn.16B, Vm.16B Vd.8H -> result A64
float32x4_t vmmlaq_f32_mf8_fpm(float32x4_t r, mfloat8x16_t a, mfloat8x16_t b, fpm_t fpm) r -> Vd.4S;a -> Vn.16B;b -> Vm.16B FMMLA Vd.4S, Vn.16B, Vm.16B Vd.4S -> result A64
4 changes: 2 additions & 2 deletions tools/intrinsic_db/advsimd_classification.csv
Original file line number Diff line number Diff line change
Expand Up @@ -4697,5 +4697,5 @@ vmlalltbq_lane_f32_mf8_fpm Vector arithmetic|Multiply|Multiply-accumulate and wi
vmlalltbq_laneq_f32_mf8_fpm Vector arithmetic|Multiply|Multiply-accumulate and widen
vmlallttq_lane_f32_mf8_fpm Vector arithmetic|Multiply|Multiply-accumulate and widen
vmlallttq_laneq_f32_mf8_fpm Vector arithmetic|Multiply|Multiply-accumulate and widen
vmmlaq_f16_mf8 Vector arithmetic|Matrix multiply
vmmlaq_f32_mf8 Vector arithmetic|Matrix multiply
vmmlaq_f16_mf8_fpm Vector arithmetic|Matrix multiply
vmmlaq_f32_mf8_fpm Vector arithmetic|Matrix multiply