@mr-c
Collaborator

@mr-c mr-c commented Oct 17, 2025

Inspired by @victorshoup in #1349 and #1348.

@mr-c mr-c force-pushed the clmul_arm64_clang branch from 4eb59d6 to df7679f Compare October 17, 2025 13:17
@mr-c mr-c force-pushed the clmul_arm64_clang branch 3 times, most recently from b64f00a to 449290c Compare October 17, 2025 15:02
@victorshoup

I tried using this PR on my macbook. The FMA code works perfectly now.
But I'm not getting the PCLMUL code to work: it seems SIMDE_DETECT_CLANG_VERSION_NOT(22,0,0) is true on my platform, so the accelerated path is not enabled...

I'm not sure if it's possible to update my compiler really, and I wouldn't expect others to do so either.
I've just recently updated developer tools on my macbook, so I don't think there is much to do.
Any thoughts?

$ gcc --version
Apple clang version 17.0.0 (clang-1700.3.19.1)
Target: arm64-apple-darwin24.6.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
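
For readers following along: Apple's clang uses its own version numbering, which tracks Xcode releases rather than upstream LLVM, so "Apple clang version 17.0.0" is not the same thing as upstream clang 17. It still predates upstream clang 22, so a `SIMDE_DETECT_CLANG_VERSION_NOT(22,0,0)` guard would be expected to evaluate true here. The sketch below only prints the compiler's own version macros; `describe_compiler` is a hypothetical helper, not part of SIMDe, and it does not reproduce SIMDe's detection logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of SIMDe): write a one-line description
 * of the compiler's built-in version macros into buf. */
static void describe_compiler(char *buf, size_t n) {
  assert(buf != NULL && n > 0);
#if defined(__apple_build_version__)
  /* Apple clang: __clang_major__ follows Apple's numbering, not upstream LLVM's. */
  snprintf(buf, n, "Apple clang %d.%d (build %ld)",
           __clang_major__, __clang_minor__, (long)__apple_build_version__);
#elif defined(__clang__)
  snprintf(buf, n, "upstream clang %d.%d", __clang_major__, __clang_minor__);
#elif defined(__GNUC__)
  snprintf(buf, n, "GCC %d.%d", __GNUC__, __GNUC_MINOR__);
#else
  snprintf(buf, n, "unknown compiler");
#endif
}
```

Calling `describe_compiler(buf, sizeof buf)` and printing `buf` shows which branch a given toolchain takes, which can help when a version-gated code path is unexpectedly skipped.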

@victorshoup

I suppose I could "brew install gcc"...I think I've done this in the past on another machine...I always get a bit nervous doing that...or do you think I'm generally better off this way?

@victorshoup

More feedback... I installed gcc-15 using homebrew. My PCLMUL code now fails to compile:

/opt/homebrew/Cellar/gcc/15.2.0/lib/gcc/current/gcc/aarch64-apple-darwin24/15/include/arm_neon.h: In function 'void pclmul_mul1(long unsigned int*, long unsigned int, long unsigned int)':
/opt/homebrew/Cellar/gcc/15.2.0/lib/gcc/current/gcc/aarch64-apple-darwin24/15/include/arm_neon.h:7297:1: error: inlining failed in call to 'always_inline' 'poly128_t vmull_p64(poly64_t, poly64_t)': target specific option mismatch
 7297 | vmull_p64 (poly64_t __a, poly64_t __b)
      | ^~~~~~~~~
In file included from /Users/shoup/repos/simde/x86/avx512/../sse3.h:30,
                 from /Users/shoup/repos/simde/x86/avx512/../ssse3.h:30,
                 from /Users/shoup/repos/simde/x86/avx512/../sse4.1.h:31,
                 from /Users/shoup/repos/simde/x86/avx512/../sse4.2.h:31,
                 from /Users/shoup/repos/simde/x86/avx512/../avx.h:32:
/Users/shoup/repos/simde/x86/clmul.h:210:18: note: called from here
  210 | vmull_p64(
      | ~~~~~~~~~^~~
  211 | vgetq_lane_p64(vreinterpretq_p64_u64(simde__m128i_to_neon_u64(a)), (imm8 ) & 1),
      | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  212 | vgetq_lane_p64(vreinterpretq_p64_u64(simde__m128i_to_neon_u64(b)), (imm8 >> 4) & 1)
      | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  213 | )
      | ~

Note that the calling code is just this:

void
pclmul_mul1 (unsigned long *c, unsigned long a, unsigned long b)
{
   __m128i aa = _mm_setr_epi64( _mm_cvtsi64_m64(a), _mm_cvtsi64_m64(0));
   __m128i bb = _mm_setr_epi64( _mm_cvtsi64_m64(b), _mm_cvtsi64_m64(0));
   _mm_storeu_si128((__m128i*)c, _mm_clmulepi64_si128(aa, bb, 0));
}

Not sure what to make of this...
So for now, my asm hack is still the only working solution for me...

I'm not in a rush, though...I will be happy to help in any way I can...

@mr-c mr-c force-pushed the clmul_arm64_clang branch from 449290c to b047a45 Compare January 3, 2026 12:08
@mr-c mr-c force-pushed the clmul_arm64_clang branch from b047a45 to d8970fa Compare January 12, 2026 11:38
@mr-c
Collaborator Author

mr-c commented Jan 12, 2026

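vmull_p64 maps to PMULL with the 1Q destination arrangement, which requires the ARM NEON crypto extension / FEAT_PMULL as per https://developer.arm.com/documentation/ddi0602/2025-06/SIMD-FP-Instructions/PMULL--PMULL2--Polynomial-multiply-long- (this would explain GCC's "target specific option mismatch" when that feature is not enabled for the calling function).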

https://arm-software.github.io/acle/main/acle.html#aes-extension confirms that FEAT_PMULL is signaled by __ARM_FEATURE_AES
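
A minimal sketch of what gating on that macro can look like, assuming a little-endian AArch64 target: the intrinsic is used only when `__ARM_FEATURE_AES` is defined at compile time, and a portable shift-and-XOR loop is used otherwise. `clmul64_sketch` and `clmul64_selfcheck` are hypothetical names for illustration and are not SIMDe's actual implementation.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__ARM_FEATURE_AES)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not part of SIMDe): 64x64 -> 128-bit carry-less
 * multiply. out[0] receives the low 64 bits, out[1] the high 64 bits. */
static void clmul64_sketch(uint64_t out[2], uint64_t a, uint64_t b) {
#if defined(__aarch64__) && defined(__ARM_FEATURE_AES)
  /* FEAT_PMULL is enabled for this translation unit, so vmull_p64 can inline. */
  poly128_t r = vmull_p64((poly64_t)a, (poly64_t)b);
  uint64x2_t v = vreinterpretq_u64_p128(r);
  out[0] = vgetq_lane_u64(v, 0);
  out[1] = vgetq_lane_u64(v, 1);
#else
  /* Portable fallback: XOR in a shifted copy of a for every set bit of b. */
  uint64_t lo = 0, hi = 0;
  for (int i = 0; i < 64; i++) {
    if ((b >> i) & 1) {
      lo ^= a << i;
      if (i != 0) hi ^= a >> (64 - i); /* bits shifted out of the low word */
    }
  }
  out[0] = lo;
  out[1] = hi;
#endif
}

/* Tiny self-check: over GF(2), (x + 1)^2 = x^2 + 1, so 3 times 3 gives 5. */
static void clmul64_selfcheck(void) {
  uint64_t out[2];
  clmul64_sketch(out, 3, 3);
  assert(out[0] == 5 && out[1] == 0);
}
```

Whether the macro is defined depends on the target options in effect. With GCC that typically means a flag along the lines of `-march=armv8-a+aes`, though the exact spelling for a given toolchain is an assumption to verify against its documentation.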

@mr-c mr-c force-pushed the clmul_arm64_clang branch from 9db4455 to 3f42074 Compare January 12, 2026 13:51
@mr-c
Collaborator Author

mr-c commented Jan 12, 2026

We can't test using -mcpu=native on GitHub Actions hosted runners, because those virtual machines block some processor features yet still advertise them to the operating system, as seen in /proc/cpuinfo and the machine registers.
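
To illustrate what "advertised to the operating system" means on Linux/AArch64, the sketch below reads the kernel's hardware-capability bits through `getauxval(AT_HWCAP)` and tests `HWCAP_PMULL`. `os_advertises_pmull` and `pmull_status_string` are hypothetical helpers, not part of SIMDe or its CI. A positive answer only says the feature is advertised; on a virtual machine that blocks the instruction it cannot prove the instruction will actually execute.

```c
#include <assert.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>
#include <asm/hwcap.h>
#endif

/* Hypothetical helper: 1 if the kernel advertises PMULL, 0 if it does not,
 * -1 if this platform offers no such query in this sketch. */
static int os_advertises_pmull(void) {
#if defined(__linux__) && defined(__aarch64__) && defined(HWCAP_PMULL)
  return (getauxval(AT_HWCAP) & HWCAP_PMULL) ? 1 : 0;
#else
  return -1;
#endif
}

/* Human-readable form of the result above. */
static const char *pmull_status_string(void) {
  int s = os_advertises_pmull();
  assert(s >= -1 && s <= 1);
  return s == 1 ? "advertised" : s == 0 ? "not advertised" : "unknown";
}
```

On other platforms (macOS, x86) the helper reports "unknown" rather than guessing.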

@mr-c mr-c force-pushed the clmul_arm64_clang branch from 3f42074 to 2605853 Compare January 12, 2026 19:04
@mr-c mr-c marked this pull request as ready for review January 12, 2026 19:06
mr-c and others added 2 commits January 12, 2026 20:14
@mr-c mr-c force-pushed the clmul_arm64_clang branch from 2605853 to 243490f Compare January 12, 2026 20:33
@mr-c mr-c force-pushed the clmul_arm64_clang branch from 243490f to 1739ab4 Compare January 13, 2026 07:41
@mr-c mr-c enabled auto-merge (rebase) January 13, 2026 09:12
@mr-c mr-c merged commit 4500802 into simd-everywhere:master Jan 13, 2026
122 checks passed
@mr-c mr-c deleted the clmul_arm64_clang branch January 13, 2026 11:13