@jiepan-intel
Contributor

This is the successor of PR #22430. A 256-bit AVX2 intrinsic can likewise be emulated with two 128-bit intrinsics.
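
As a rough illustration of the "two 128-bit halves" idea, here is a minimal portable sketch. The type and function names (`demo_m128i`, `demo_m256i`, `demo_add_epi32_128`, `demo_add_epi32_256`) are hypothetical stand-ins and are not taken from this PR; the actual headers use the Wasm SIMD and SSE intrinsics rather than plain loops.

```c
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the real __m128i/__m256i. */
typedef struct { int32_t lane[4]; } demo_m128i;
typedef struct { demo_m128i v0, v1; } demo_m256i; /* low and high halves */

/* 128-bit lane-wise add with wraparound.
 * Unsigned arithmetic is used to avoid signed-overflow undefined behavior. */
static demo_m128i demo_add_epi32_128(demo_m128i a, demo_m128i b) {
  demo_m128i r;
  for (int i = 0; i < 4; i++) {
    r.lane[i] = (int32_t)((uint32_t)a.lane[i] + (uint32_t)b.lane[i]);
  }
  return r;
}

/* 256-bit add expressed as two independent 128-bit adds. */
static demo_m256i demo_add_epi32_256(demo_m256i a, demo_m256i b) {
  demo_m256i r;
  r.v0 = demo_add_epi32_128(a.v0, b.v0);
  r.v1 = demo_add_epi32_128(a.v1, b.v1);
  return r;
}
```

This split only works directly for lane-wise operations; intrinsics that move data across the 128-bit boundary would need extra handling.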

* - _mm_i64gather_epi64
- ❌ scalarized

All 128-bit wide instructions from the AVX2 instruction set are listed. Only a small part of the 256-bit AVX2 instruction set is listed; most of the 256-bit wide AVX2 instructions are emulated by two 128-bit wide instructions.
Collaborator

Can you wrap the text at 80 columns here?

Contributor Author

Done

@sbc100 sbc100 requested a review from tlively December 3, 2024 02:31
* found in the LICENSE file.
*/

#ifndef __emscripten_immintrin_h__
Member

Let's use a different macro that gets undefined after all the relevant includes in immintrin.h. Otherwise, we won't emit this error if the user does something like this:

#include <immintrin.h>
#include <avxintrin.h>

Contributor Author

Yes, in this case it won't emit the error message, but I think that is acceptable.

Including avxintrin.h directly misses its prerequisites and causes a compile error; including immintrin.h avoids this.
immintrin.h includes the necessary prerequisites and chooses the *intrin.h headers corresponding to the options (-mavx, -mavx2, ...).
So in the following code, if the -mavx option is specified, #include <avxintrin.h> does nothing because of the header guards. It is valid code, and the behavior is consistent with LLVM.

#include <immintrin.h>
#include <avxintrin.h>
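
For readers following the exchange above, the suggested alternative (a macro that is defined only while immintrin.h is pulling in its sub-headers) can be sketched in a single translation unit. The macro and constant names below are hypothetical and do not appear in the PR. The enum constants stand in for the places where a real sub-header would use `#error`, and the sketch assumes that check sits outside the sub-header's own include guard.

```c
/* --- stand-in for the start of the umbrella header (immintrin.h) --- */
#define DEMO_INCLUDING_FROM_UMBRELLA /* hypothetical transient macro */

/* --- stand-in for a sub-header pulled in by the umbrella header --- */
#ifdef DEMO_INCLUDING_FROM_UMBRELLA
enum { demo_via_umbrella_allowed = 1 };
#else
enum { demo_via_umbrella_allowed = 0 }; /* a real header would #error here */
#endif

/* --- stand-in for the end of the umbrella header --- */
#undef DEMO_INCLUDING_FROM_UMBRELLA

/* --- stand-in for a later, direct include of the same sub-header --- */
#ifdef DEMO_INCLUDING_FROM_UMBRELLA
enum { demo_direct_allowed = 1 };
#else
enum { demo_direct_allowed = 0 }; /* a real header would #error here */
#endif
```

With a persistent macro such as `__emscripten_immintrin_h__`, the second check would pass instead, which is the behavior the reply above describes as acceptable.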

Comment on lines 1134 to 1138
// This may cause an out-of-bounds memory load since we first load and
// then mask, but since there are no segmentation faults in Wasm memory
// accesses, that is ok (as long as we are within the heap bounds -
// a negligible limitation in practice)
// TODO, loadu or load, 128-bit align?
Member

Could this cause ASan failures?

Contributor Author

The snippet is ported from the old code in avxintrin.h. I am not sure; I will check.

Contributor Author

The implementation has changed: a scalarized version is used now, which is more compliant with the specification.
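
To make the difference concrete, here is a minimal sketch of a scalarized masked load. It is not the code from this PR: `demo_maskload_epi32` is a hypothetical helper written in plain C, modeled on the documented behavior of masked loads such as `_mm_maskload_epi32`, where a lane is loaded only if the highest bit of its mask element is set and is zeroed otherwise.

```c
#include <stdint.h>

/* Hypothetical helper for illustration; not the Emscripten implementation. */
static void demo_maskload_epi32(int32_t out[4], const int32_t *src,
                                const int32_t mask[4]) {
  for (int i = 0; i < 4; i++) {
    /* A lane is selected when the sign (highest) bit of its mask element
     * is set. Unselected lanes are zeroed and src[i] is never read. */
    out[i] = (mask[i] < 0) ? src[i] : 0;
  }
}
```

Because unselected lanes are never dereferenced, a call with a two-element source buffer and a mask of `{-1, -1, 0, 0}` stays within bounds, whereas a load-then-mask approach would read all four lanes first.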

Member

@tlively tlively left a comment

I'm fine with this landing once @sbc100's documentation comments are addressed and the tests are passing, etc.

@kripken
Member

kripken commented Jan 23, 2025

Looks like this was lgtm'd pending tests passing, and they just passed, landing.

@kripken kripken merged commit ee32d3a into emscripten-core:main Jan 23, 2025
29 checks passed