Description
The _mm256_alignr_epi8 implementation and its test do not match the Intel specification. The instruction is, annoyingly, split into two separate 128-bit lanes. In other words, it behaves like _mm_alignr_epi8 applied independently to the lower and upper 128-bit lanes, with the same shift count for both.
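To make the lane split concrete, here is a minimal scalar sketch in Rust of the semantics as I read Intel's pseudocode. It models byte arrays rather than calling real intrinsics, and the helper names (alignr_epi8_128, alignr_epi8_256, alignr_full_width_256) are hypothetical. The full-width function is only a naive reading included for contrast; it is not necessarily what the current implementation computes.

```rust
/// Hypothetical scalar model of _mm_alignr_epi8: concatenate `a` (high 16 bytes)
/// and `b` (low 16 bytes), shift right by `imm8` bytes, keep the low 16 bytes.
fn alignr_epi8_128(a: [u8; 16], b: [u8; 16], imm8: u8) -> [u8; 16] {
    let mut concat = [0u8; 32];
    concat[..16].copy_from_slice(&b);
    concat[16..].copy_from_slice(&a);
    let shift = imm8 as usize;
    let mut dst = [0u8; 16];
    for i in 0..16 {
        // Bytes shifted in from past the end of the concatenation are zero.
        dst[i] = if i + shift < 32 { concat[i + shift] } else { 0 };
    }
    dst
}

/// Hypothetical model of _mm256_alignr_epi8: the 128-bit operation is applied
/// independently to the lower and upper lanes, with the same imm8.
fn alignr_epi8_256(a: [u8; 32], b: [u8; 32], imm8: u8) -> [u8; 32] {
    let mut dst = [0u8; 32];
    for lane in 0..2 {
        let s = lane * 16;
        let mut a_lane = [0u8; 16];
        let mut b_lane = [0u8; 16];
        a_lane.copy_from_slice(&a[s..s + 16]);
        b_lane.copy_from_slice(&b[s..s + 16]);
        dst[s..s + 16].copy_from_slice(&alignr_epi8_128(a_lane, b_lane, imm8));
    }
    dst
}

/// Naive full-width reading, for contrast only: one 64-byte concatenation a:b
/// shifted right by `imm8` bytes, ignoring the 128-bit lanes.
fn alignr_full_width_256(a: [u8; 32], b: [u8; 32], imm8: u8) -> [u8; 32] {
    let mut concat = [0u8; 64];
    concat[..32].copy_from_slice(&b);
    concat[32..].copy_from_slice(&a);
    let shift = imm8 as usize;
    let mut dst = [0u8; 32];
    for i in 0..32 {
        dst[i] = if i + shift < 64 { concat[i + shift] } else { 0 };
    }
    dst
}

fn main() {
    // Each byte holds its position in the concatenation a:b (b = 0..32, a = 32..64).
    let b: [u8; 32] = std::array::from_fn(|i| i as u8);
    let a: [u8; 32] = std::array::from_fn(|i| (i + 32) as u8);
    let per_lane = alignr_epi8_256(a, b, 4);
    let full = alignr_full_width_256(a, b, 4);
    println!("per-lane:   {:?}", per_lane);
    println!("full-width: {:?}", full);
}
```

With imm8 = 4, byte 12 of the per-lane result is 32 (the first byte of a's lower lane), whereas the full-width reading gives 16. The two agree on bytes 0-11 and 16-27 and differ only where bytes would have to cross the 128-bit lane boundary.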
See
https://software.intel.com/en-us/blogs/2015/01/13/programming-using-avx2-permutations
for an explanation of how it's implemented, and for workarounds for the split lanes in some AVX2 instructions. The other instructions covered there are probably worth double-checking as well.
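One common workaround for the split lanes, sketched here as a scalar model, is to first build a vector whose lower lane is b's upper lane and whose upper lane is a's lower lane (what _mm256_permute2x128_si256(b, a, 0x21) produces), then feed it to the lane-split alignr. This is a sketch under my own assumptions, not necessarily the exact approach in the linked post; all function names are hypothetical and operate on byte arrays.

```rust
/// Hypothetical model of one 128-bit lane of alignr: concatenate hi:lo,
/// shift right by n bytes, keep the low 16 bytes (zero-filled past the end).
fn alignr_lane(hi: &[u8], lo: &[u8], n: usize) -> [u8; 16] {
    let mut concat = [0u8; 32];
    concat[..16].copy_from_slice(lo);
    concat[16..].copy_from_slice(hi);
    let mut dst = [0u8; 16];
    for i in 0..16 {
        dst[i] = if i + n < 32 { concat[i + n] } else { 0 };
    }
    dst
}

/// Model of _mm256_alignr_epi8: the lane operation applied to each 128-bit half.
fn alignr_epi8_256(a: &[u8; 32], b: &[u8; 32], n: usize) -> [u8; 32] {
    let mut dst = [0u8; 32];
    dst[..16].copy_from_slice(&alignr_lane(&a[..16], &b[..16], n));
    dst[16..].copy_from_slice(&alignr_lane(&a[16..], &b[16..], n));
    dst
}

/// Model of _mm256_permute2x128_si256(x, y, 0x21):
/// lower lane = x's upper lane, upper lane = y's lower lane.
fn permute2x128_0x21(x: &[u8; 32], y: &[u8; 32]) -> [u8; 32] {
    let mut dst = [0u8; 32];
    dst[..16].copy_from_slice(&x[16..]);
    dst[16..].copy_from_slice(&y[..16]);
    dst
}

/// Full-width alignment of a:b by n bytes (0 <= n < 16), built from the two
/// lane-split operations above.
fn alignr_cross_lane(a: &[u8; 32], b: &[u8; 32], n: usize) -> [u8; 32] {
    let mid = permute2x128_0x21(b, a); // [b's upper lane | a's lower lane]
    alignr_epi8_256(&mid, b, n)
}

fn main() {
    // b = 0..32, a = 32..64, so a correct full-width shift by 4 yields 4..36.
    let b: [u8; 32] = std::array::from_fn(|i| i as u8);
    let a: [u8; 32] = std::array::from_fn(|i| (i + 32) as u8);
    let r = alignr_cross_lane(&a, &b, 4);
    let expected: Vec<u8> = (4..36).collect();
    assert_eq!(r.to_vec(), expected);
    println!("{:?}", r);
}
```

The lower result lane then draws from b's two lanes and the upper result lane from b's upper lane followed by a's lower lane, which is exactly the window a single full-width shift would select.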
To make it more confusing, in AVX-512 the *_epi32 and *_epi64 variants of alignr do not split lanes (they shift across the whole vector), while the *_epi8 variants still do.
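For comparison, here is a minimal scalar sketch of the non-split behaviour, assuming the AVX-512VL form _mm256_alignr_epi32 treats a:b as one 16-element concatenation and, as I read Intel's pseudocode, uses only imm8[2:0] as the element count. The helper name alignr_epi32_256 is hypothetical.

```rust
/// Hypothetical scalar model of _mm256_alignr_epi32 (AVX-512F + AVX-512VL).
/// Unlike the epi8 form, a:b is one 16-element concatenation with no lane split.
fn alignr_epi32_256(a: [u32; 8], b: [u32; 8], imm8: u8) -> [u32; 8] {
    let mut concat = [0u32; 16];
    concat[..8].copy_from_slice(&b);
    concat[8..].copy_from_slice(&a);
    // Assumption: only imm8[2:0] is used for the 256-bit form.
    let shift = (imm8 & 0b111) as usize;
    let mut dst = [0u32; 8];
    for i in 0..8 {
        dst[i] = concat[i + shift]; // i + shift <= 14, always in bounds
    }
    dst
}

fn main() {
    let b: [u32; 8] = [0, 1, 2, 3, 4, 5, 6, 7];
    let a: [u32; 8] = [8, 9, 10, 11, 12, 13, 14, 15];
    // Shifting by one element moves b[4] (upper 128-bit lane) into dst[3]
    // (lower lane), which the lane-split epi8 form never does.
    println!("{:?}", alignr_epi32_256(a, b, 1)); // [1, 2, 3, 4, 5, 6, 7, 8]
}
```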
I may be mistaken, but I thought there were tests to make sure the correct instruction was generated. It seems unlikely that the implementation would have passed that test by accident.