Add support for the Brain 16-bit floating-point vector multiplication (FEAT_SVE_BFSCALE) intrinsics #410
base: main

@@ -465,6 +465,8 @@ Armv8.4-A [[ARMARMv84]](#ARMARMv84). Support is added for the Dot Product intrin

* Added feature test macro for FEAT_SSVE_FEXPA.
* Added feature test macro for FEAT_CSSC.
* Added [**Alpha**](#current-status-and-anticipated-changes) support
  for Brain 16-bit floating-point vector multiplication intrinsics.

### References

@@ -2122,6 +2124,20 @@ are available. Specifically, if this macro is defined to `1`, then:

for the FEAT_SME_B16B16 instructions and if their associated intrinsics
are available.

#### Brain 16-bit floating-point vector multiplication support

This section is in
[**Alpha** state](#current-status-and-anticipated-changes) and might change or be
extended in the future.

`__ARM_FEATURE_SVE_BFSCALE` is defined to `1` if there is hardware
support for the SVE BF16 vector multiplication extension (FEAT_SVE_BFSCALE)
and if the associated ACLE intrinsics are available.

See [Half-precision brain
floating-point](#half-precision-brain-floating-point) for details
of half-precision brain floating-point types.
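
For illustration, a minimal sketch of gating code on this macro, assuming a
target where the SVE BF16 types are available; the helper and its fallback are
hypothetical, and the intrinsic signature is the one proposed by this patch:

``` c
// Minimal sketch: gate use of the proposed BFSCALE intrinsic on the
// feature test macro. `scale_bf16` and its fallback are hypothetical.
#include <arm_sve.h>

svbfloat16_t scale_bf16(svbool_t pg, svbfloat16_t vals, svbfloat16_t exps) {
#if defined(__ARM_FEATURE_SVE_BFSCALE) && __ARM_FEATURE_SVE_BFSCALE != 0
  return svscale_bf16_m(pg, vals, exps);  // merging form proposed by this patch
#else
  (void)pg;
  return vals;  // placeholder; real code would emulate the operation here
#endif
}
```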

### Cryptographic extensions

#### “Crypto” extension

@@ -2634,6 +2650,7 @@ be found in [[BA]](#BA).

| [`__ARM_FEATURE_SVE`](#scalable-vector-extension-sve) | Scalable Vector Extension (FEAT_SVE) | 1 |
| [`__ARM_FEATURE_SVE_B16B16`](#non-widening-brain-16-bit-floating-point-support) | Non-widening brain 16-bit floating-point intrinsics (FEAT_SVE_B16B16) | 1 |
| [`__ARM_FEATURE_SVE_BF16`](#brain-16-bit-floating-point-support) | SVE support for the 16-bit brain floating-point extension (FEAT_BF16) | 1 |
| [`__ARM_FEATURE_SVE_BFSCALE`](#brain-16-bit-floating-point-vector-multiplication-support) | SVE support for the 16-bit brain floating-point vector multiplication extension (FEAT_SVE_BFSCALE) | 1 |
| [`__ARM_FEATURE_SVE_BITS`](#scalable-vector-extension-sve) | The number of bits in an SVE vector, when known in advance | 256 |
| [`__ARM_FEATURE_SVE_MATMUL_FP32`](#multiplication-of-32-bit-floating-point-matrices) | 32-bit floating-point matrix multiply extension (FEAT_F32MM) | 1 |
| [`__ARM_FEATURE_SVE_MATMUL_FP64`](#multiplication-of-64-bit-floating-point-matrices) | 64-bit floating-point matrix multiply extension (FEAT_F64MM) | 1 |

@@ -9374,6 +9391,26 @@ BFloat16 floating-point multiply vectors.

                     uint64_t imm_idx);
```

### SVE2 BFloat16 floating-point adjust exponent vectors instructions

The specification for SVE2 BFloat16 floating-point adjust exponent vectors instructions is in
[**Alpha** state](#current-status-and-anticipated-changes) and might change or be
extended in the future.

#### BFSCALE

BFloat16 floating-point adjust exponent vectors.

``` c
// Only if __ARM_FEATURE_SVE_BFSCALE != 0
svbfloat16_t svscale[_bf16]_m(svbool_t pg, svbfloat16_t zdn, svbfloat16_t zm);
svbfloat16_t svscale[_bf16]_x(svbool_t pg, svbfloat16_t zdn, svbfloat16_t zm);
svbfloat16_t svscale[_bf16]_z(svbool_t pg, svbfloat16_t zdn, svbfloat16_t zm);
svbfloat16_t svscale[_n_bf16]_m(svbool_t pg, svbfloat16_t zdn, bfloat16_t zm);
svbfloat16_t svscale[_n_bf16]_x(svbool_t pg, svbfloat16_t zdn, bfloat16_t zm);
svbfloat16_t svscale[_n_bf16]_z(svbool_t pg, svbfloat16_t zdn, bfloat16_t zm);
```

Review comment: I think the last argument to all these functions should be a `svint16_t`.
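
For illustration, a hedged sketch of the scalar-operand (`_n`) and don't-care
(`_x`) forms; the helper name is invented here, and the operand types follow
the patch as posted, which the review comment above suggests may change:

``` c
// Hypothetical helper: scale every element of `vals` by the same amount.
// Uses the `_n` form (scalar second operand broadcast to all lanes) with
// `_x` predication, which leaves inactive lanes unspecified and is the
// cheapest choice under an all-true predicate.
#include <arm_sve.h>

#if defined(__ARM_FEATURE_SVE_BFSCALE) && __ARM_FEATURE_SVE_BFSCALE != 0
svbfloat16_t scale_all(svbfloat16_t vals, bfloat16_t amount) {
  svbool_t all_lanes = svptrue_b16();  // every 16-bit lane active
  return svscale_n_bf16_x(all_lanes, vals, amount);
}
#endif
```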

### SVE2.1 instruction intrinsics

The specification for SVE2.1 is in

@@ -11639,7 +11676,7 @@ Multi-vector floating-point fused multiply-add/subtract

    __arm_streaming __arm_inout("za");
```

-#### BFMLA. BFMLS, FMLA, FMLS (indexed)
+#### BFMLA, BFMLS, FMLA, FMLS (indexed)

Multi-vector floating-point fused multiply-add/subtract

@@ -12732,6 +12769,30 @@ element types.

svint8x4_t svuzpq[_s8_x4](svint8x4_t zn) __arm_streaming;
```

#### BFMUL

Multi-vector BFloat16 floating-point multiply.

``` c
// Only if __ARM_FEATURE_SVE_BFSCALE != 0
svbfloat16x2_t svmul[_bf16_x2](svbfloat16x2_t zd, svbfloat16x2_t zm) __arm_streaming;
svbfloat16x2_t svmul[_single_bf16_x2](svbfloat16x2_t zd, svbfloat16_t zm) __arm_streaming;
svbfloat16x4_t svmul[_bf16_x4](svbfloat16x4_t zd, svbfloat16x4_t zm) __arm_streaming;
svbfloat16x4_t svmul[_single_bf16_x4](svbfloat16x4_t zd, svbfloat16_t zm) __arm_streaming;
```
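
For illustration, a hedged streaming-mode sketch; the wrapper name is
invented, and an SME2 target providing this extension is assumed:

``` c
// Hypothetical wrapper: multiply two tuples of BF16 vectors elementwise.
// The x2 form multiplies corresponding vectors of the two tuples; the
// _single variants instead broadcast one vector across the whole tuple.
// Callable only from streaming mode, as __arm_streaming requires.
#include <arm_sme.h>

#if defined(__ARM_FEATURE_SVE_BFSCALE) && __ARM_FEATURE_SVE_BFSCALE != 0
svbfloat16x2_t mul_pairs(svbfloat16x2_t zd, svbfloat16x2_t zm) __arm_streaming {
  return svmul_bf16_x2(zd, zm);
}
#endif
```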

#### BFSCALE

BFloat16 floating-point adjust exponent vectors.

``` c
// Only if __ARM_FEATURE_SVE_BFSCALE != 0
svbfloat16x2_t svscale[_bf16_x2](svbfloat16x2_t zdn, svint16x2_t zm);
svbfloat16x2_t svscale[_single_bf16_x2](svbfloat16x2_t zn, svint16_t zm);
svbfloat16x4_t svscale[_bf16_x4](svbfloat16x4_t zdn, svint16x4_t zm);
svbfloat16x4_t svscale[_single_bf16_x4](svbfloat16x4_t zn, svint16_t zm);
```

Review comment (Contributor): Are these SME instructions? Should they also be followed by `__arm_streaming`?
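
For illustration, a hedged sketch assuming FSCALE-style semantics (each
element of the result is the BF16 value multiplied by two raised to the
corresponding signed integer amount); the wrapper name is invented, and the
unresolved streaming question above means the attribute is omitted here,
matching the patch as posted:

``` c
// Hypothetical wrapper: per-element exponent adjustment of two BF16 tuples,
// assuming FSCALE-style semantics (result = value * 2^amount per lane).
// Whether these intrinsics should carry __arm_streaming is an open review
// question; this sketch follows the signatures as posted.
#include <arm_sme.h>

#if defined(__ARM_FEATURE_SVE_BFSCALE) && __ARM_FEATURE_SVE_BFSCALE != 0
svbfloat16x2_t scale_pairs(svbfloat16x2_t vals, svint16x2_t amounts) {
  return svscale_bf16_x2(vals, amounts);
}
#endif
```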

### SME2.1 instruction intrinsics

The specification for SME2.1 is in

Review comment: Can we change this to SVE instead of SVE2?