@Raimo33 Raimo33 commented Jul 18, 2025

This adds SSE2 and AVX2 support across the library, as discussed in #1700, wherever the benchmarks show an improvement.

ARM has a different SIMD instruction set; it would be nice to have a separate PR implementing that as well, perhaps after this one is merged.

Tasks:

  • Add 2 CI/CD flows with permutations of -msse2, -mno-sse2 when building for 32-bit x86
  • Add 4 CI/CD flows with permutations of -mavx2, -msse2, -mno-avx2, -mno-sse2 when building for amd64
  • Precompute vectors at startup (the ones marked with TODO: precompute)
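
As a rough illustration of why the flag permutations above matter: GCC and Clang define __SSE2__ and __AVX2__ when the matching -m flag is in effect, so each CI flow compiles a different code path. The sketch below is only an assumption about how such a check could look; simd_path_name and simd_path_is are hypothetical helpers, not part of this PR.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: reports which code path the compiler flags selected.
 * __AVX2__ and __SSE2__ are predefined by GCC/Clang when -mavx2 / -msse2
 * (or a target that implies them) is in effect. */
static const char *simd_path_name(void)
{
#if defined(__AVX2__)
    return "avx2";
#elif defined(__SSE2__)
    return "sse2";
#else
    return "scalar";
#endif
}

/* Hypothetical helper: returns 1 if `name` matches the compiled-in path. */
static int simd_path_is(const char *name)
{
    assert(name != NULL);
    return strcmp(simd_path_name(), name) == 0;
}
```

A CI job built with -mno-avx2 -msse2 could, for example, check simd_path_is("sse2") to confirm that the intended path was actually compiled.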

Test & Benchmark

To reproduce the following results, I temporarily added three scripts (for building, testing, and benchmarking) as well as a Jupyter notebook to visualize the results.
You can verify them yourself by running ./simd-build.sh && ./simd-test.sh && ./simd-bench.sh and then executing the notebook as is.

Results

plot

@Raimo33 Raimo33 changed the title Add simd Add intel simd Jul 18, 2025

Raimo33 commented Jul 18, 2025

To precompute SIMD constants at startup, the best solution I found is something like this:

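#ifdef __SSE2__
  #include <emmintrin.h>  /* SSE2 intrinsics: __m128i, _mm_set1_epi8 */
  static __m128i _128_vec_ones;
#endif

/* Runs once before main() and fills in the precomputed vectors. */
CONSTRUCTOR void simd_init(void)
{
#ifdef __SSE2__
  _128_vec_ones = _mm_set1_epi8('1');
#endif
}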

where CONSTRUCTOR is defined as __attribute__((constructor)), so that simd_init runs before main.
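
A slightly fuller sketch of the same idea, for compilers that may not support the constructor attribute. Everything here beyond the snippet above is an assumption for illustration: the scalar fallback, the simd_initialized flag, the renamed vec_ones_128, and the simd_get_ones accessor are hypothetical names, not code from this PR.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* GCC and Clang support the constructor attribute; elsewhere the macro
 * expands to nothing and the lazy check in simd_get_ones() takes over. */
#if defined(__GNUC__) || defined(__clang__)
#define CONSTRUCTOR __attribute__((constructor))
#else
#define CONSTRUCTOR
#endif

#ifdef __SSE2__
static __m128i vec_ones_128;            /* 16 lanes holding the byte '1' */
#else
static unsigned char vec_ones_128[16];  /* scalar stand-in, same layout  */
#endif
static int simd_initialized = 0;

/* Fills in the precomputed constant; with CONSTRUCTOR it runs before main(). */
CONSTRUCTOR static void simd_init(void)
{
#ifdef __SSE2__
    vec_ones_128 = _mm_set1_epi8('1');
#else
    memset(vec_ones_128, '1', sizeof vec_ones_128);
#endif
    simd_initialized = 1;
}

/* Hypothetical accessor: copies the 16-byte constant into `out`. */
static void simd_get_ones(unsigned char out[16])
{
    assert(out != NULL);
    if (!simd_initialized) {
        simd_init();  /* fallback when no constructor ran */
    }
    memcpy(out, &vec_ones_128, 16);
}
```

The lazy check costs one branch per call, so in a hot path it may be preferable to rely on the constructor alone where the toolchain guarantees it.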


Raimo33 commented Jul 18, 2025

I keep getting these warnings. They appear to be harmless, since I always use the unaligned loadu and storeu intrinsics, but the compiler still flags the pointer cast.

warning: cast increases required alignment of target type [-Wcast-align]
  653 |         _mm256_storeu_si256((__m256i *)r->v, out);

The only fixes I found are:

  1. aligning everything to 64 bytes (not feasible; it even breaks some of my AVX logic)
  2. suppressing the warning globally
  3. suppressing the warning inline at each occurrence
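
One way to keep option 3 manageable is to confine the casts to a small helper, so the suppression appears once rather than at every call site. The following is a minimal sketch under that assumption; copy_32_bytes is a hypothetical helper, not a function from this PR, and the memcpy branch exists only so the example also builds without AVX2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#ifdef __AVX2__
#include <immintrin.h>
#endif

/* Hypothetical helper: copies 32 bytes (four 64-bit limbs) from src to dst.
 * Neither pointer needs 32-byte alignment, because only the unaligned
 * loadu/storeu intrinsics are used. */
static void copy_32_bytes(uint64_t *dst, const uint64_t *src)
{
    assert(dst != NULL && src != NULL);
#ifdef __AVX2__
    /* The casts below are what trigger -Wcast-align; silence it only here. */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wcast-align"
    __m256i v = _mm256_loadu_si256((const __m256i *)src);
    _mm256_storeu_si256((__m256i *)dst, v);
#pragma GCC diagnostic pop
#else
    memcpy(dst, src, 32);  /* scalar fallback when AVX2 is not enabled */
#endif
}
```

Clang also honors the GCC diagnostic pragmas, so the same helper should cover both compilers.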

@Raimo33 Raimo33 changed the title Add intel simd [WIP] Add intel simd Aug 23, 2025
@Raimo33 Raimo33 force-pushed the simd branch 9 times, most recently from 2fe10e8 to d409172 Compare August 31, 2025 21:21