arm64 improvements for x86 clmul & fma #1350
4eb59d6 to df7679f
b64f00a to 449290c
I tried using this PR on my MacBook. The FMA code works perfectly now. I'm not sure whether I can really update my compiler, and I wouldn't expect others to do so either.
$ gcc --version

I suppose I could `brew install gcc`. I think I've done this in the past on another machine, but I always get a bit nervous doing that. Or do you think I'm generally better off this way?

More feedback: I installed gcc-15 using Homebrew. My PCLMUL code now fails to compile:
Note that the calling code is just this:
Not sure what to make of this. I'm not in a rush, though, and I will be happy to help in any way I can.
449290c to b047a45
b047a45 to d8970fa
|
https://arm-software.github.io/acle/main/acle.html#aes-extension confirms that |
9db4455 to 3f42074
|
We can't test using |
3f42074 to 2605853
`vmull_p64` requires `FEAT_PMULL`/`__ARM_FEATURE_AES`

https://developer.arm.com/architectures/instruction-sets/intrinsics/#q=vmull_p64
uses the `1Q` destination register which requires the ARM NEON crypto extension
/`FEAT_PMULL` as per:
https://developer.arm.com/documentation/ddi0602/2025-06/SIMD-FP-Instructions/PMULL--PMULL2--Polynomial-multiply-long-
https://arm-software.github.io/acle/main/acle.html#aes-extension confirms that
`FEAT_PMULL` is signaled by `__ARM_FEATURE_AES`
Co-authored-by: Victor Shoup <[email protected]>
2605853 to 243490f
Confirmed: `vmull{,_high}_p64` requires `FEAT_PMULL`/`__ARM_FEATURE_AES`
https://developer.arm.com/architectures/instruction-sets/intrinsics/#q=vmull_p64
uses the `1Q` destination register which requires the ARM NEON crypto extension
/`FEAT_PMULL` as per:
https://developer.arm.com/documentation/ddi0602/2025-06/SIMD-FP-Instructions/PMULL--PMULL2--Polynomial-multiply-long-
https://arm-software.github.io/acle/main/acle.html#aes-extension confirms that
`FEAT_PMULL` is signaled by `__ARM_FEATURE_AES`
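
The feature gate described in the commit message above can be sketched as follows. This is a minimal illustration only, not the code from this PR: the names `CLMUL_HAVE_PMULL`, `clmul64`, `clmul64_portable`, and `clmul64_selfcheck` are hypothetical, and the bit-by-bit fallback is an assumed stand-in for whatever the project actually does when `__ARM_FEATURE_AES` is not defined.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: take the PMULL path only when the compiler defines
 * __ARM_FEATURE_AES, which ACLE documents as the signal for FEAT_PMULL. */
#if defined(__aarch64__) && defined(__ARM_FEATURE_AES)
#include <arm_neon.h>
#define CLMUL_HAVE_PMULL 1
#else
#define CLMUL_HAVE_PMULL 0
#endif

/* Hypothetical portable fallback: 64x64 -> 128-bit carry-less multiply,
 * one bit of b at a time (slow, but works on any target). */
static void clmul64_portable(uint64_t a, uint64_t b, uint64_t *lo, uint64_t *hi)
{
    uint64_t l = 0, h = 0;
    for (int i = 0; i < 64; i++) {
        if ((b >> i) & 1u) {
            l ^= a << i;
            if (i != 0)
                h ^= a >> (64 - i); /* bits of a shifted past bit 63 */
        }
    }
    *lo = l;
    *hi = h;
}

/* Hypothetical wrapper: uses vmull_p64 when available, else the fallback. */
static void clmul64(uint64_t a, uint64_t b, uint64_t *lo, uint64_t *hi)
{
#if CLMUL_HAVE_PMULL
    poly128_t r = vmull_p64((poly64_t)a, (poly64_t)b);
    uint64x2_t v = vreinterpretq_u64_p128(r);
    *lo = vgetq_lane_u64(v, 0);
    *hi = vgetq_lane_u64(v, 1);
#else
    clmul64_portable(a, b, lo, hi);
#endif
}

/* Checks that the selected path agrees with the portable reference. */
static void clmul64_selfcheck(void)
{
    uint64_t lo, hi, ref_lo, ref_hi;
    const uint64_t a = UINT64_C(0x0123456789abcdef);
    const uint64_t b = UINT64_C(0xfedcba9876543210);
    clmul64(a, b, &lo, &hi);
    clmul64_portable(a, b, &ref_lo, &ref_hi);
    assert(lo == ref_lo && hi == ref_hi);
}
```

Calling `clmul64_selfcheck()` once at startup is one way to check that the intrinsic path matches the reference on a given compiler. On a toolchain that does not define `__ARM_FEATURE_AES`, the sketch silently uses the fallback rather than failing to compile.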
243490f to 1739ab4
Inspired by @victorshoup in #1349 and #1348.