-
Notifications
You must be signed in to change notification settings - Fork 15.1k
[X86] Allow PSHUFD/PSHUFLW/PSHUFW intrinsics to be used in constexpr (#156611) #161094
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Summary: Implements InterpBuiltin support and adds CodeGen tests. Issue: llvm#156611
|
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
|
@llvm/pr-subscribers-clang @llvm/pr-subscribers-backend-x86 Author: Nagraj Gaonkar (NagrajMG) Changes[Headers][X86] Allow PSHUFD/PSHUFLW/PSHUFW shuffle intrinsics to be used in
|
| Intrinsic | X86 Builtin | CPUID Flags | Header |
|---|---|---|---|
_mm_shuffle_pi16 |
__builtin_ia32_pshufw |
MMX | mmintrin.h |
PSHUFLW — shuffle low 4×i16 per 128-bit lane
| Intrinsics | X86 Builtins | CPUID Flags | Header |
|---|---|---|---|
_mm_shufflelo_epi16 |
__builtin_ia32_pshuflw |
SSE2 | emmintrin.h |
_mm256_shufflelo_epi16 |
__builtin_ia32_pshuflw256 |
AVX2 | avx2intrin.h |
_mm512_shufflelo_epi16 |
__builtin_ia32_pshuflw512 |
AVX-512BW | avx512bwintrin.h |
_mm_mask_shufflelo_epi16 |
__builtin_ia32_pshuflw128_mask |
AVX-512VL+BW | avx512vlbwintrin.h |
_mm256_mask_shufflelo_epi16 |
__builtin_ia32_pshuflw256_mask |
AVX-512VL+BW | avx512vlbwintrin.h |
_mm512_mask_shufflelo_epi16 |
__builtin_ia32_pshuflw512_mask |
AVX-512BW | avx512bwintrin.h |
_mm_maskz_shufflelo_epi16 |
__builtin_ia32_pshuflw128_maskz |
AVX-512VL+BW | avx512vlbwintrin.h |
_mm256_maskz_shufflelo_epi16 |
__builtin_ia32_pshuflw256_maskz |
AVX-512VL+BW | avx512vlbwintrin.h |
_mm512_maskz_shufflelo_epi16 |
__builtin_ia32_pshuflw512_maskz |
AVX-512BW | avx512bwintrin.h |
PSHUFHW — shuffle high 4×i16 per 128-bit lane
| Intrinsics | X86 Builtins | CPUID Flags | Header |
|---|---|---|---|
_mm_shufflehi_epi16 |
__builtin_ia32_pshufhw |
SSE2 | emmintrin.h |
_mm256_shufflehi_epi16 |
__builtin_ia32_pshufhw256 |
AVX2 | avx2intrin.h |
_mm512_shufflehi_epi16 |
__builtin_ia32_pshufhw512 |
AVX-512BW | avx512bwintrin.h |
_mm_mask_shufflehi_epi16 |
__builtin_ia32_pshufhw128_mask |
AVX-512VL+BW | avx512vlbwintrin.h |
_mm256_mask_shufflehi_epi16 |
__builtin_ia32_pshufhw256_mask |
AVX-512VL+BW | avx512vlbwintrin.h |
_mm512_mask_shufflehi_epi16 |
__builtin_ia32_pshufhw512_mask |
AVX-512BW | avx512bwintrin.h |
_mm_maskz_shufflehi_epi16 |
__builtin_ia32_pshufhw128_maskz |
AVX-512VL+BW | avx512vlbwintrin.h |
_mm256_maskz_shufflehi_epi16 |
__builtin_ia32_pshufhw256_maskz |
AVX-512VL+BW | avx512vlbwintrin.h |
_mm512_maskz_shufflehi_epi16 |
__builtin_ia32_pshufhw512_maskz |
AVX-512BW | avx512bwintrin.h |
PSHUFD — shuffle 4×i32 per 128-bit lane
| Intrinsics | X86 Builtins | CPUID Flags | Header |
|---|---|---|---|
_mm_shuffle_epi32 |
__builtin_ia32_pshufd |
SSE2 | emmintrin.h |
_mm256_shuffle_epi32 |
__builtin_ia32_pshufd256 |
AVX2 | avx2intrin.h |
_mm512_shuffle_epi32 |
__builtin_ia32_pshufd512 |
AVX-512F | avx512fintrin.h |
_mm_mask_shuffle_epi32 |
__builtin_ia32_pshufd128_mask |
AVX-512VL | avx512vlintrin.h |
_mm256_mask_shuffle_epi32 |
__builtin_ia32_pshufd256_mask |
AVX-512VL | avx512vlintrin.h |
_mm512_mask_shuffle_epi32 |
__builtin_ia32_pshufd512_mask |
AVX-512F | avx512fintrin.h |
_mm_maskz_shuffle_epi32 |
__builtin_ia32_pshufd128_maskz |
AVX-512VL | avx512vlintrin.h |
_mm256_maskz_shuffle_epi32 |
__builtin_ia32_pshufd256_maskz |
AVX-512VL | avx512vlintrin.h |
_mm512_maskz_shuffle_epi32 |
__builtin_ia32_pshufd512_maskz |
AVX-512F | avx512fintrin.h |
Fixes #156611
Adds constexpr evaluation to these intrinsics in both the ExprConstant evaluator and the Bytecode Interpreter, with tests for all unmasked, masked, and mask-zero variants across MMX, 128-bit, 256-bit, and 512-bit widths.
Patch is 49.10 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/161094.diff
11 Files Affected:
- (modified) clang/include/clang/Basic/BuiltinsX86.td (+57-7)
- (modified) clang/lib/AST/ByteCode/InterpBuiltin.cpp (+246)
- (modified) clang/lib/AST/ExprConstant.cpp (+287)
- (modified) clang/lib/Headers/mmintrin.h (+6)
- (modified) clang/test/CodeGen/X86/avx2-builtins.c (+5)
- (modified) clang/test/CodeGen/X86/avx512bw-builtins.c (+11-1)
- (modified) clang/test/CodeGen/X86/avx512f-builtins.c (+14)
- (modified) clang/test/CodeGen/X86/avx512vl-builtins.c (+20)
- (modified) clang/test/CodeGen/X86/avx512vlbw-builtins.c (+60-1)
- (modified) clang/test/CodeGen/X86/mmx-builtins.c (+2-1)
- (modified) clang/test/CodeGen/X86/sse2-builtins.c (+10-6)
diff --git a/clang/include/clang/Basic/BuiltinsX86.td b/clang/include/clang/Basic/BuiltinsX86.td
index 77e599587edc3..08b82b03b7865 100644
--- a/clang/include/clang/Basic/BuiltinsX86.td
+++ b/clang/include/clang/Basic/BuiltinsX86.td
@@ -145,6 +145,10 @@ let Features = "mmx", Header = "mmintrin.h", Attributes = [NoThrow, Const] in {
def _m_prefetch : X86LibBuiltin<"void(void *)">;
}
+let Features = "mmx", Attributes = [NoThrow, Const, Constexpr] in {
+ def pshufw : X86Builtin<"_Vector<4, short>(_Vector<4, short>, _Constant int)">;
+}
+
// PRFCHW
let Features = "prfchw", Header = "intrin.h", Attributes = [NoThrow, Const] in {
def _m_prefetchw : X86LibBuiltin<"void(void volatile const *)">;
@@ -217,10 +221,13 @@ let Features = "sse2", Attributes = [NoThrow] in {
def movnti : X86Builtin<"void(int *, int)">;
}
-let Features = "sse2", Attributes = [NoThrow, Const, RequiredVectorWidth<128>] in {
- def pshufd : X86Builtin<"_Vector<4, int>(_Vector<4, int>, _Constant int)">;
+let Features = "sse2", Attributes = [NoThrow, Const, Constexpr, RequiredVectorWidth<128>] in {
def pshuflw : X86Builtin<"_Vector<8, short>(_Vector<8, short>, _Constant int)">;
+ def pshufd : X86Builtin<"_Vector<4, int>(_Vector<4, int>, _Constant int)">;
def pshufhw : X86Builtin<"_Vector<8, short>(_Vector<8, short>, _Constant int)">;
+}
+
+let Features = "sse2", Attributes = [NoThrow, Const, RequiredVectorWidth<128>] in {
def psadbw128 : X86Builtin<"_Vector<2, long long int>(_Vector<16, char>, _Vector<16, char>)">;
def sqrtpd : X86Builtin<"_Vector<2, double>(_Vector<2, double>)">;
def sqrtsd : X86Builtin<"_Vector<2, double>(_Vector<2, double>)">;
@@ -569,6 +576,12 @@ let Features = "avx", Attributes = [NoThrow, Const, RequiredVectorWidth<256>] in
def vec_set_v8si : X86Builtin<"_Vector<8, int>(_Vector<8, int>, int, _Constant int)">;
}
+let Features = "avx2", Attributes = [NoThrow, Const, Constexpr, RequiredVectorWidth<256>] in {
+ def pshuflw256 : X86Builtin<"_Vector<16, short>(_Vector<16, short>, _Constant int)">;
+ def pshufhw256 : X86Builtin<"_Vector<16, short>(_Vector<16, short>, _Constant int)">;
+ def pshufd256 : X86Builtin<"_Vector<8, int>(_Vector<8, int>, _Constant int)">;
+}
+
let Features = "avx2", Attributes = [NoThrow, Const, RequiredVectorWidth<256>] in {
def mpsadbw256 : X86Builtin<"_Vector<32, char>(_Vector<32, char>, _Vector<32, char>, _Constant char)">;
def palignr256 : X86Builtin<"_Vector<32, char>(_Vector<32, char>, _Vector<32, char>, _Constant int)">;
@@ -584,9 +597,6 @@ let Features = "avx2", Attributes = [NoThrow, Const, RequiredVectorWidth<256>] i
def pmulhrsw256 : X86Builtin<"_Vector<16, short>(_Vector<16, short>, _Vector<16, short>)">;
def psadbw256 : X86Builtin<"_Vector<4, long long int>(_Vector<32, char>, _Vector<32, char>)">;
def pshufb256 : X86Builtin<"_Vector<32, char>(_Vector<32, char>, _Vector<32, char>)">;
- def pshufd256 : X86Builtin<"_Vector<8, int>(_Vector<8, int>, _Constant int)">;
- def pshuflw256 : X86Builtin<"_Vector<16, short>(_Vector<16, short>, _Constant int)">;
- def pshufhw256 : X86Builtin<"_Vector<16, short>(_Vector<16, short>, _Constant int)">;
def psignb256 : X86Builtin<"_Vector<32, char>(_Vector<32, char>, _Vector<32, char>)">;
def psignw256 : X86Builtin<"_Vector<16, short>(_Vector<16, short>, _Vector<16, short>)">;
def psignd256 : X86Builtin<"_Vector<8, int>(_Vector<8, int>, _Vector<8, int>)">;
@@ -1989,9 +1999,28 @@ let Features = "avx512vl", Attributes = [NoThrow, Const, Constexpr, RequiredVect
def prorq256 : X86Builtin<"_Vector<4, long long int>(_Vector<4, long long int>, _Constant int)">;
}
-let Features = "avx512bw", Attributes = [NoThrow, Const, RequiredVectorWidth<512>] in {
+let Features = "avx512bw", Attributes = [NoThrow, Const, Constexpr, RequiredVectorWidth<512>] in {
def pshufhw512 : X86Builtin<"_Vector<32, short>(_Vector<32, short>, _Constant int)">;
def pshuflw512 : X86Builtin<"_Vector<32, short>(_Vector<32, short>, _Constant int)">;
+}
+
+let Features = "avx512f", Attributes = [NoThrow, Const, Constexpr, RequiredVectorWidth<512>] in {
+ def pshufd512 : X86Builtin<"_Vector<16, int>(_Vector<16, int>, _Constant int)">;
+ def pshufd512_mask : X86Builtin<"_Vector<16, int>(_Vector<16, int>, _Constant int, _Vector<16, int>, unsigned short)">;
+ def pshufd512_maskz : X86Builtin<"_Vector<16, int>(_Vector<16, int>, _Constant int, unsigned short)">;
+}
+
+let Features = "avx512vl", Attributes = [NoThrow, Const, Constexpr, RequiredVectorWidth<256>] in {
+ def pshufd256_mask : X86Builtin<"_Vector<8, int>(_Vector<8, int>, _Constant int, _Vector<8, int>, unsigned char)">;
+ def pshufd256_maskz : X86Builtin<"_Vector<8, int>(_Vector<8, int>, _Constant int, unsigned char)">;
+}
+
+let Features = "avx512vl", Attributes = [NoThrow, Const, Constexpr, RequiredVectorWidth<128>] in {
+ def pshufd128_mask : X86Builtin<"_Vector<4, int>(_Vector<4, int>, _Constant int, _Vector<4, int>, unsigned char)">;
+ def pshufd128_maskz : X86Builtin<"_Vector<4, int>(_Vector<4, int>, _Constant int, unsigned char)">;
+}
+
+let Features = "avx512bw", Attributes = [NoThrow, Const, RequiredVectorWidth<512>] in {
def psllw512 : X86Builtin<"_Vector<32, short>(_Vector<32, short>, _Vector<8, short>)">;
}
@@ -3266,7 +3295,6 @@ let Features = "avx512f", Attributes = [NoThrow, Const, RequiredVectorWidth<128>
}
let Features = "avx512f", Attributes = [NoThrow, Const, RequiredVectorWidth<512>] in {
- def pshufd512 : X86Builtin<"_Vector<16, int>(_Vector<16, int>, _Constant int)">;
def expanddf512_mask : X86Builtin<"_Vector<8, double>(_Vector<8, double>, _Vector<8, double>, unsigned char)">;
def expanddi512_mask : X86Builtin<"_Vector<8, long long int>(_Vector<8, long long int>, _Vector<8, long long int>, unsigned char)">;
}
@@ -5114,3 +5142,25 @@ let Features = "avx10.2", Attributes = [NoThrow, Const, RequiredVectorWidth<256>
let Features = "avx10.2", Attributes = [NoThrow, Const, RequiredVectorWidth<512>] in {
def vsqrtbf16512 : X86Builtin<"_Vector<32, __bf16>(_Vector<32, __bf16>)">;
}
+
+let Features = "avx512bw", Attributes = [NoThrow, Const, Constexpr, RequiredVectorWidth<512>] in {
+ def pshuflw512_mask : X86Builtin<"_Vector<32, short>(_Vector<32, short>, _Constant int, _Vector<32, short>, unsigned int)">;
+ def pshuflw512_maskz : X86Builtin<"_Vector<32, short>(_Vector<32, short>, _Constant int, unsigned int)">;
+ def pshufhw512_mask : X86Builtin<"_Vector<32, short>(_Vector<32, short>, _Constant int, _Vector<32, short>, unsigned int)">;
+ def pshufhw512_maskz : X86Builtin<"_Vector<32, short>(_Vector<32, short>, _Constant int, unsigned int)">;
+}
+
+
+let Features = "avx512vl,avx512bw", Attributes = [NoThrow, Const, Constexpr, RequiredVectorWidth<256>] in {
+ def pshuflw256_mask : X86Builtin<"_Vector<16, short>(_Vector<16, short>, _Constant int, _Vector<16, short>, unsigned short)">;
+ def pshuflw256_maskz : X86Builtin<"_Vector<16, short>(_Vector<16, short>, _Constant int, unsigned short)">;
+ def pshufhw256_mask : X86Builtin<"_Vector<16, short>(_Vector<16, short>, _Constant int, _Vector<16, short>, unsigned short)">;
+ def pshufhw256_maskz : X86Builtin<"_Vector<16, short>(_Vector<16, short>, _Constant int, unsigned short)">;
+}
+
+let Features = "avx512vl,avx512bw", Attributes = [NoThrow, Const, Constexpr, RequiredVectorWidth<128>] in {
+ def pshuflw128_mask : X86Builtin<"_Vector<8, short>(_Vector<8, short>, _Constant int, _Vector<8, short>, unsigned char)">;
+ def pshuflw128_maskz : X86Builtin<"_Vector<8, short>(_Vector<8, short>, _Constant int, unsigned char)">;
+ def pshufhw128_mask : X86Builtin<"_Vector<8, short>(_Vector<8, short>, _Constant int, _Vector<8, short>, unsigned char)">;
+ def pshufhw128_maskz : X86Builtin<"_Vector<8, short>(_Vector<8, short>, _Constant int, unsigned char)">;
+}
diff --git a/clang/lib/AST/ByteCode/InterpBuiltin.cpp b/clang/lib/AST/ByteCode/InterpBuiltin.cpp
index 891344d4e6ed0..1156626a30c8a 100644
--- a/clang/lib/AST/ByteCode/InterpBuiltin.cpp
+++ b/clang/lib/AST/ByteCode/InterpBuiltin.cpp
@@ -2862,6 +2862,218 @@ static bool interp__builtin_blend(InterpState &S, CodePtr OpPC,
return true;
}
+static bool interp__builtin_ia32_pshuflw_common(InterpState &S, CodePtr OpPC,
+ const CallExpr *Call) {
+ const unsigned NumArgs = Call->getNumArgs();
+ assert(NumArgs == 2 || NumArgs == 3 || NumArgs == 4);
+ APSInt K;
+ Pointer SrcPT;
+ const bool HasMask = (NumArgs == 3) || (NumArgs == 4);
+ const bool IsMaskZ = (NumArgs == 3);
+ if (NumArgs == 4) {
+ K = popToAPSInt(S, Call->getArg(3));
+ SrcPT = S.Stk.pop<Pointer>();
+ } else if (NumArgs == 3) {
+ K = popToAPSInt(S, Call->getArg(2));
+ }
+
+ APSInt Imm = popToAPSInt(S, Call->getArg(1));
+ const Pointer &Src = S.Stk.pop<Pointer>();
+ const Pointer &Dst = S.Stk.peek<Pointer>();
+ const unsigned NumElems = Dst.getNumElems();
+ const PrimType ElemT = Dst.getFieldDesc()->getPrimType();
+ const unsigned ElemBits = 16;
+ const unsigned LaneElems = 128u / ElemBits;
+ const unsigned Half = 4;
+ assert(NumElems % LaneElems == 0 && "pshuflw expects 128-bit lanes");
+ const uint8_t Ctl = static_cast<uint8_t>(Imm.getZExtValue());
+
+ for (unsigned i = 0; i != NumElems; ++i) {
+ const unsigned laneBase = (i / LaneElems) * LaneElems;
+ const unsigned inLane = i % LaneElems;
+
+ unsigned srcIdx;
+ if (inLane < Half) {
+ const unsigned pos = inLane;
+ const unsigned sel = (Ctl >> (2 * pos)) & 0x3;
+ srcIdx = laneBase + sel;
+ } else {
+ srcIdx = i;
+ }
+
+ APSInt Chosen;
+ INT_TYPE_SWITCH(ElemT, { Chosen = Src.elem<T>(srcIdx).toAPSInt(); });
+
+ if (!HasMask) {
+ INT_TYPE_SWITCH_NO_BOOL(ElemT,
+ { Dst.elem<T>(i) = static_cast<T>(Chosen); });
+ continue;
+ }
+
+ const bool Keep =
+ (i < static_cast<unsigned>(K.getBitWidth())) ? K[i] : false;
+
+ if (Keep) {
+ INT_TYPE_SWITCH_NO_BOOL(ElemT,
+ { Dst.elem<T>(i) = static_cast<T>(Chosen); });
+ } else if (IsMaskZ) {
+ APSInt Zero(APInt(Chosen.getBitWidth(), 0));
+ Zero.setIsSigned(Chosen.isSigned());
+ INT_TYPE_SWITCH_NO_BOOL(ElemT,
+ { Dst.elem<T>(i) = static_cast<T>(Zero); });
+ } else {
+ APSInt PT;
+ INT_TYPE_SWITCH(ElemT, { PT = SrcPT.elem<T>(i).toAPSInt(); });
+ INT_TYPE_SWITCH_NO_BOOL(ElemT, { Dst.elem<T>(i) = static_cast<T>(PT); });
+ }
+ }
+
+ Dst.initializeAllElements();
+ return true;
+}
+
+static bool interp__builtin_ia32_pshufhw_common(InterpState &S, CodePtr OpPC,
+ const CallExpr *Call) {
+ (void)OpPC;
+ const unsigned NumArgs = Call->getNumArgs();
+ assert(NumArgs == 2 || NumArgs == 3 || NumArgs == 4);
+
+ APSInt K;
+ Pointer SrcPT;
+ const bool HasMask = (NumArgs == 3) || (NumArgs == 4);
+ const bool IsMaskZ = (NumArgs == 3);
+
+ if (NumArgs == 4) {
+ K = popToAPSInt(S, Call->getArg(3));
+ SrcPT = S.Stk.pop<Pointer>();
+ } else if (NumArgs == 3) {
+ K = popToAPSInt(S, Call->getArg(2));
+ }
+
+ APSInt Imm = popToAPSInt(S, Call->getArg(1));
+ const Pointer &Src = S.Stk.pop<Pointer>();
+ const Pointer &Dst = S.Stk.peek<Pointer>();
+
+ const unsigned NumElems = Dst.getNumElems();
+ const PrimType ElemT = Dst.getFieldDesc()->getPrimType();
+
+ const unsigned ElemBits = 16;
+ const unsigned LaneElems = 128u / ElemBits;
+ const unsigned HalfBase = 4;
+ assert(NumElems % LaneElems == 0);
+
+ const uint8_t Ctl = static_cast<uint8_t>(Imm.getZExtValue());
+
+ for (unsigned i = 0; i != NumElems; ++i) {
+ const unsigned laneBase = (i / LaneElems) * LaneElems;
+ const unsigned inLane = i % LaneElems;
+
+ unsigned srcIdx;
+ if (inLane >= HalfBase) {
+ const unsigned pos = inLane - HalfBase;
+ const unsigned sel = (Ctl >> (2 * pos)) & 0x3;
+ srcIdx = laneBase + HalfBase + sel;
+ } else {
+ srcIdx = i;
+ }
+
+ APSInt Chosen;
+ INT_TYPE_SWITCH(ElemT, { Chosen = Src.elem<T>(srcIdx).toAPSInt(); });
+
+ if (!HasMask) {
+ INT_TYPE_SWITCH_NO_BOOL(ElemT,
+ { Dst.elem<T>(i) = static_cast<T>(Chosen); });
+ continue;
+ }
+
+ const bool Keep =
+ (i < static_cast<unsigned>(K.getBitWidth())) ? K[i] : false;
+ if (Keep) {
+ INT_TYPE_SWITCH_NO_BOOL(ElemT,
+ { Dst.elem<T>(i) = static_cast<T>(Chosen); });
+ } else if (IsMaskZ) {
+ APSInt Zero(APInt(Chosen.getBitWidth(), 0));
+ Zero.setIsSigned(Chosen.isSigned());
+ INT_TYPE_SWITCH_NO_BOOL(ElemT,
+ { Dst.elem<T>(i) = static_cast<T>(Zero); });
+ } else {
+ APSInt PT;
+ INT_TYPE_SWITCH(ElemT, { PT = SrcPT.elem<T>(i).toAPSInt(); });
+ INT_TYPE_SWITCH_NO_BOOL(ElemT, { Dst.elem<T>(i) = static_cast<T>(PT); });
+ }
+ }
+
+ Dst.initializeAllElements();
+ return true;
+}
+
+static bool interp__builtin_ia32_pshufd_common(InterpState &S, CodePtr OpPC,
+ const CallExpr *Call) {
+ (void)OpPC;
+ const unsigned NumArgs = Call->getNumArgs();
+ assert(NumArgs == 2 || NumArgs == 3 || NumArgs == 4);
+
+ APSInt K;
+ Pointer SrcPT;
+ const bool HasMask = (NumArgs == 3) || (NumArgs == 4);
+ const bool IsMaskZ = (NumArgs == 3);
+
+ if (NumArgs == 4) {
+ K = popToAPSInt(S, Call->getArg(3));
+ SrcPT = S.Stk.pop<Pointer>();
+ } else if (NumArgs == 3) {
+ K = popToAPSInt(S, Call->getArg(2));
+ }
+
+ APSInt Imm = popToAPSInt(S, Call->getArg(1));
+ const Pointer &Src = S.Stk.pop<Pointer>();
+ const Pointer &Dst = S.Stk.peek<Pointer>();
+
+ const unsigned NumElems = Dst.getNumElems();
+ const PrimType ElemT = Dst.getFieldDesc()->getPrimType();
+
+ const unsigned ElemBits = 32;
+ const unsigned LaneElems = 128u / ElemBits;
+ assert(NumElems % LaneElems == 0);
+
+ const uint8_t Ctl = static_cast<uint8_t>(Imm.getZExtValue());
+
+ for (unsigned i = 0; i != NumElems; ++i) {
+ const unsigned laneBase = (i / LaneElems) * LaneElems;
+ const unsigned inLane = i % LaneElems;
+ const unsigned sel = (Ctl >> (2 * inLane)) & 0x3;
+ const unsigned srcIdx = laneBase + sel;
+
+ APSInt Chosen;
+ INT_TYPE_SWITCH(ElemT, { Chosen = Src.elem<T>(srcIdx).toAPSInt(); });
+
+ if (!HasMask) {
+ INT_TYPE_SWITCH_NO_BOOL(ElemT,
+ { Dst.elem<T>(i) = static_cast<T>(Chosen); });
+ continue;
+ }
+
+ const bool Keep =
+ (i < static_cast<unsigned>(K.getBitWidth())) ? K[i] : false;
+ if (Keep) {
+ INT_TYPE_SWITCH_NO_BOOL(ElemT,
+ { Dst.elem<T>(i) = static_cast<T>(Chosen); });
+ } else if (IsMaskZ) {
+ APSInt Zero(APInt(Chosen.getBitWidth(), 0));
+ Zero.setIsSigned(Chosen.isSigned());
+ INT_TYPE_SWITCH_NO_BOOL(ElemT,
+ { Dst.elem<T>(i) = static_cast<T>(Zero); });
+ } else {
+ APSInt PT;
+ INT_TYPE_SWITCH(ElemT, { PT = SrcPT.elem<T>(i).toAPSInt(); });
+ INT_TYPE_SWITCH_NO_BOOL(ElemT, { Dst.elem<T>(i) = static_cast<T>(PT); });
+ }
+ }
+
+ Dst.initializeAllElements();
+ return true;
+}
+
static bool interp__builtin_elementwise_triop(
InterpState &S, CodePtr OpPC, const CallExpr *Call,
llvm::function_ref<APInt(const APSInt &, const APSInt &, const APSInt &)>
@@ -2967,6 +3179,7 @@ static bool interp__builtin_x86_insert_subvector(InterpState &S, CodePtr OpPC,
return true;
}
+
bool InterpretBuiltin(InterpState &S, CodePtr OpPC, const CallExpr *Call,
uint32_t BuiltinID) {
if (!S.getASTContext().BuiltinInfo.isConstantEvaluated(BuiltinID))
@@ -3417,6 +3630,39 @@ bool InterpretBuiltin(InterpState &S, CodePtr OpPC, const CallExpr *Call,
return interp__builtin_elementwise_int_binop(S, OpPC, Call,
llvm::APIntOps::mulhs);
+ case clang::X86::BI__builtin_ia32_pshuflw:
+ case clang::X86::BI__builtin_ia32_pshuflw256:
+ case clang::X86::BI__builtin_ia32_pshuflw512:
+ case clang::X86::BI__builtin_ia32_pshuflw128_mask:
+ case clang::X86::BI__builtin_ia32_pshuflw256_mask:
+ case clang::X86::BI__builtin_ia32_pshuflw512_mask:
+ case clang::X86::BI__builtin_ia32_pshuflw128_maskz:
+ case clang::X86::BI__builtin_ia32_pshuflw256_maskz:
+ case clang::X86::BI__builtin_ia32_pshuflw512_maskz:
+ return interp__builtin_ia32_pshuflw_common(S, OpPC, Call);
+
+ case clang::X86::BI__builtin_ia32_pshufhw:
+ case clang::X86::BI__builtin_ia32_pshufhw256:
+ case clang::X86::BI__builtin_ia32_pshufhw512:
+ case clang::X86::BI__builtin_ia32_pshufhw128_mask:
+ case clang::X86::BI__builtin_ia32_pshufhw256_mask:
+ case clang::X86::BI__builtin_ia32_pshufhw512_mask:
+ case clang::X86::BI__builtin_ia32_pshufhw128_maskz:
+ case clang::X86::BI__builtin_ia32_pshufhw256_maskz:
+ case clang::X86::BI__builtin_ia32_pshufhw512_maskz:
+ return interp__builtin_ia32_pshufhw_common(S, OpPC, Call);
+
+ case clang::X86::BI__builtin_ia32_pshufd:
+ case clang::X86::BI__builtin_ia32_pshufd256:
+ case clang::X86::BI__builtin_ia32_pshufd512:
+ case clang::X86::BI__builtin_ia32_pshufd128_mask:
+ case clang::X86::BI__builtin_ia32_pshufd256_mask:
+ case clang::X86::BI__builtin_ia32_pshufd512_mask:
+ case clang::X86::BI__builtin_ia32_pshufd128_maskz:
+ case clang::X86::BI__builtin_ia32_pshufd256_maskz:
+ case clang::X86::BI__builtin_ia32_pshufd512_maskz:
+ return interp__builtin_ia32_pshufd_common(S, OpPC, Call);
+
case clang::X86::BI__builtin_ia32_psllv2di:
case clang::X86::BI__builtin_ia32_psllv4di:
case clang::X86::BI__builtin_ia32_psllv4si:
diff --git a/clang/lib/AST/ExprConstant.cpp b/clang/lib/AST/ExprConstant.cpp
index b706b14945b6d..3fee702120abc 100644
--- a/clang/lib/AST/ExprConstant.cpp
+++ b/clang/lib/AST/ExprConstant.cpp
@@ -11868,6 +11868,292 @@ bool VectorExprEvaluator::VisitCallExpr(const CallExpr *E) {
return Success(APValue(ResultElements.data(), ResultElements.size()), E);
}
+case X86::BI__builtin_ia32_pshufw: {
+ APValue Src;
+ APSInt Imm;
+ if (!EvaluateAsRValue(Info, E->getArg(0), Src)) return false;
+ if (!EvaluateInteger(E->getArg(1), Imm, Info)) return false;
+
+ unsigned N = Src.getVectorLength();
+ SmallVector<APValue, 4> ResultElements;
+ ResultElements.reserve(N);
+
+ uint8_t C = static_cast<uint8_t>(Imm.getZExtValue());
+ for (unsigned i = 0; i != N; ++i) {
+ unsigned sel = (C >> (2 * i)) & 0x3;
+ ResultElements.push_back(Src.getVectorElt(sel));
+ }
+ return Success(APValue(ResultElements.data(), ResultElements.size()), E);
+}
+
+case clang::X86::BI__builtin_ia32_pshuflw:
+case clang::X86::BI__builtin_ia32_pshuflw256:
+case clang::X86::BI__builtin_ia32_pshuflw512:
+case clang::X86::BI__builtin_ia32_pshuflw128_mask:
+case clang::X86::BI__builtin_ia32_pshuflw256_mask:
+case clang::X86::BI__builtin_ia32_pshuflw512_mask:
+case clang::X86::BI__builtin_ia32_pshuflw128_maskz:
+case clang::X86::BI__builtin_ia32_pshuflw256_maskz:
+case clang::X86::BI__builtin_ia32_pshuflw512_maskz: {
+ const unsigned BID = E->getBuiltinCallee();
+
+ const bool IsMask =
+ BID == clang::X86::BI__builtin_ia32_pshuflw128_mask ||
+ BID == clang::X86::BI__builtin_ia32_pshuflw256_mask ||
+ BID == clang::X86::BI__builtin_ia32_pshuflw512_mask;
+
+ const bool IsMaskZ =
+ BID == clang::X86::BI__builtin_ia32_pshuflw128_maskz ||
+ BID == clang::X86::BI__builtin_ia32_pshuflw256_maskz ||
+ BID == clang::X86::BI__builtin_ia32_pshuflw512_maskz;
+
+ const unsigned AIdx = 0, ImmIdx = 1;
+ const unsigned SrcIdx = 2;
+ const unsigned KIdx = IsMaskZ ? 2 : 3;
+
+ APValue AVal, SrcVal;
+ APSInt Imm, K;
+ if (!EvaluateAsRValue(Info, E->getArg(AIdx), AVal)) return false;
+ if (!EvaluateInteger(E->getArg(ImmIdx), Imm, Info)) return false;
+
+ const APSInt *KPtr = nullptr;
+ const APValue *PassThru = nullptr;
+ bool ZeroInactive = false;
+
+ if (IsMask) {
+ if (!EvaluateAsRValue(Info, E->getArg(SrcIdx), SrcVal)) return false;
+ if (!EvaluateInteger(E->getArg(KIdx), K, Info)) return false;
+ KPtr = &K; PassThru = &SrcVal; ZeroInactive = false;
+ } else if (IsMaskZ) {
+ if (!EvaluateInteger(E->getArg(KIdx), K, Info)) return false;
+ KPtr = &K; PassThru = nullptr; ZeroInactive = true;
+ }
+
+ const auto *VT = E->getType()->getAs<VectorType>();
+...
[truncated]
You can test this locally with the following command:git-clang-format --diff origin/main HEAD --extensions cpp,h,c -- clang/lib/AST/ByteCode/InterpBuiltin.cpp clang/lib/AST/ExprConstant.cpp clang/lib/Headers/mmintrin.h clang/test/CodeGen/X86/avx2-builtins.c clang/test/CodeGen/X86/avx512bw-builtins.c clang/test/CodeGen/X86/avx512f-builtins.c clang/test/CodeGen/X86/avx512vl-builtins.c clang/test/CodeGen/X86/avx512vlbw-builtins.c clang/test/CodeGen/X86/mmx-builtins.c clang/test/CodeGen/X86/sse2-builtins.c
View the diff from clang-format here.diff --git a/clang/lib/AST/ByteCode/InterpBuiltin.cpp b/clang/lib/AST/ByteCode/InterpBuiltin.cpp
index 1156626a3..e3b9acac6 100644
--- a/clang/lib/AST/ByteCode/InterpBuiltin.cpp
+++ b/clang/lib/AST/ByteCode/InterpBuiltin.cpp
@@ -3179,7 +3179,6 @@ static bool interp__builtin_x86_insert_subvector(InterpState &S, CodePtr OpPC,
return true;
}
-
bool InterpretBuiltin(InterpState &S, CodePtr OpPC, const CallExpr *Call,
uint32_t BuiltinID) {
if (!S.getASTContext().BuiltinInfo.isConstantEvaluated(BuiltinID))
diff --git a/clang/lib/AST/ExprConstant.cpp b/clang/lib/AST/ExprConstant.cpp
index 3fee70212..a5e9bd69d 100644
--- a/clang/lib/AST/ExprConstant.cpp
+++ b/clang/lib/AST/ExprConstant.cpp
@@ -11868,292 +11868,318 @@ bool VectorExprEvaluator::VisitCallExpr(const CallExpr *E) {
return Success(APValue(ResultElements.data(), ResultElements.size()), E);
}
-case X86::BI__builtin_ia32_pshufw: {
- APValue Src;
- APSInt Imm;
- if (!EvaluateAsRValue(Info, E->getArg(0), Src)) return false;
- if (!EvaluateInteger(E->getArg(1), Imm, Info)) return false;
+ case X86::BI__builtin_ia32_pshufw: {
+ APValue Src;
+ APSInt Imm;
+ if (!EvaluateAsRValue(Info, E->getArg(0), Src))
+ return false;
+ if (!EvaluateInteger(E->getArg(1), Imm, Info))
+ return false;
- unsigned N = Src.getVectorLength();
- SmallVector<APValue, 4> ResultElements;
- ResultElements.reserve(N);
+ unsigned N = Src.getVectorLength();
+ SmallVector<APValue, 4> ResultElements;
+ ResultElements.reserve(N);
- uint8_t C = static_cast<uint8_t>(Imm.getZExtValue());
- for (unsigned i = 0; i != N; ++i) {
- unsigned sel = (C >> (2 * i)) & 0x3;
- ResultElements.push_back(Src.getVectorElt(sel));
+ uint8_t C = static_cast<uint8_t>(Imm.getZExtValue());
+ for (unsigned i = 0; i != N; ++i) {
+ unsigned sel = (C >> (2 * i)) & 0x3;
+ ResultElements.push_back(Src.getVectorElt(sel));
+ }
+ return Success(APValue(ResultElements.data(), ResultElements.size()), E);
}
- return Success(APValue(ResultElements.data(), ResultElements.size()), E);
-}
-case clang::X86::BI__builtin_ia32_pshuflw:
-case clang::X86::BI__builtin_ia32_pshuflw256:
-case clang::X86::BI__builtin_ia32_pshuflw512:
-case clang::X86::BI__builtin_ia32_pshuflw128_mask:
-case clang::X86::BI__builtin_ia32_pshuflw256_mask:
-case clang::X86::BI__builtin_ia32_pshuflw512_mask:
-case clang::X86::BI__builtin_ia32_pshuflw128_maskz:
-case clang::X86::BI__builtin_ia32_pshuflw256_maskz:
-case clang::X86::BI__builtin_ia32_pshuflw512_maskz: {
- const unsigned BID = E->getBuiltinCallee();
-
- const bool IsMask =
- BID == clang::X86::BI__builtin_ia32_pshuflw128_mask ||
- BID == clang::X86::BI__builtin_ia32_pshuflw256_mask ||
- BID == clang::X86::BI__builtin_ia32_pshuflw512_mask;
-
- const bool IsMaskZ =
- BID == clang::X86::BI__builtin_ia32_pshuflw128_maskz ||
- BID == clang::X86::BI__builtin_ia32_pshuflw256_maskz ||
- BID == clang::X86::BI__builtin_ia32_pshuflw512_maskz;
-
- const unsigned AIdx = 0, ImmIdx = 1;
- const unsigned SrcIdx = 2;
- const unsigned KIdx = IsMaskZ ? 2 : 3;
-
- APValue AVal, SrcVal;
- APSInt Imm, K;
- if (!EvaluateAsRValue(Info, E->getArg(AIdx), AVal)) return false;
- if (!EvaluateInteger(E->getArg(ImmIdx), Imm, Info)) return false;
-
- const APSInt *KPtr = nullptr;
- const APValue *PassThru = nullptr;
- bool ZeroInactive = false;
-
- if (IsMask) {
- if (!EvaluateAsRValue(Info, E->getArg(SrcIdx), SrcVal)) return false;
- if (!EvaluateInteger(E->getArg(KIdx), K, Info)) return false;
- KPtr = &K; PassThru = &SrcVal; ZeroInactive = false;
- } else if (IsMaskZ) {
- if (!EvaluateInteger(E->getArg(KIdx), K, Info)) return false;
- KPtr = &K; PassThru = nullptr; ZeroInactive = true;
- }
-
- const auto *VT = E->getType()->getAs<VectorType>();
- if (!VT) return false;
- const unsigned NumElts = VT->getNumElements();
-
- const unsigned ElemBits = 16;
- const unsigned LaneElems = std::min(NumElts, 128u / ElemBits);
- const unsigned Half = 4;
- const uint8_t Ctl = static_cast<uint8_t>(Imm.getZExtValue());
- const bool DestUnsigned =
- VT->getElementType()->isUnsignedIntegerOrEnumerationType();
-
- auto MakeZero = [&]() -> APValue {
- return APValue(APSInt(APInt(ElemBits, 0), DestUnsigned));
- };
+ case clang::X86::BI__builtin_ia32_pshuflw:
+ case clang::X86::BI__builtin_ia32_pshuflw256:
+ case clang::X86::BI__builtin_ia32_pshuflw512:
+ case clang::X86::BI__builtin_ia32_pshuflw128_mask:
+ case clang::X86::BI__builtin_ia32_pshuflw256_mask:
+ case clang::X86::BI__builtin_ia32_pshuflw512_mask:
+ case clang::X86::BI__builtin_ia32_pshuflw128_maskz:
+ case clang::X86::BI__builtin_ia32_pshuflw256_maskz:
+ case clang::X86::BI__builtin_ia32_pshuflw512_maskz: {
+ const unsigned BID = E->getBuiltinCallee();
- SmallVector<APValue, 32> ResultElements;
- ResultElements.reserve(NumElts);
+ const bool IsMask = BID == clang::X86::BI__builtin_ia32_pshuflw128_mask ||
+ BID == clang::X86::BI__builtin_ia32_pshuflw256_mask ||
+ BID == clang::X86::BI__builtin_ia32_pshuflw512_mask;
- for (unsigned i = 0; i < NumElts; ++i) {
- const unsigned laneBase = (i / LaneElems) * LaneElems;
- const unsigned inLane = i % LaneElems;
+ const bool IsMaskZ = BID == clang::X86::BI__builtin_ia32_pshuflw128_maskz ||
+ BID == clang::X86::BI__builtin_ia32_pshuflw256_maskz ||
+ BID == clang::X86::BI__builtin_ia32_pshuflw512_maskz;
- APValue Chosen;
- if (inLane < Half) {
- const unsigned pos = inLane;
- const unsigned sel = (Ctl >> (2 * pos)) & 0x3;
- const unsigned srcIdx = laneBase + sel;
- Chosen = AVal.getVectorElt(srcIdx);
- } else {
- Chosen = AVal.getVectorElt(i);
+ const unsigned AIdx = 0, ImmIdx = 1;
+ const unsigned SrcIdx = 2;
+ const unsigned KIdx = IsMaskZ ? 2 : 3;
+
+ APValue AVal, SrcVal;
+ APSInt Imm, K;
+ if (!EvaluateAsRValue(Info, E->getArg(AIdx), AVal))
+ return false;
+ if (!EvaluateInteger(E->getArg(ImmIdx), Imm, Info))
+ return false;
+
+ const APSInt *KPtr = nullptr;
+ const APValue *PassThru = nullptr;
+ bool ZeroInactive = false;
+
+ if (IsMask) {
+ if (!EvaluateAsRValue(Info, E->getArg(SrcIdx), SrcVal))
+ return false;
+ if (!EvaluateInteger(E->getArg(KIdx), K, Info))
+ return false;
+ KPtr = &K;
+ PassThru = &SrcVal;
+ ZeroInactive = false;
+ } else if (IsMaskZ) {
+ if (!EvaluateInteger(E->getArg(KIdx), K, Info))
+ return false;
+ KPtr = &K;
+ PassThru = nullptr;
+ ZeroInactive = true;
}
- if (KPtr) {
- const bool Keep = (i < KPtr->getBitWidth()) ? (*KPtr)[i] : false;
- if (Keep) {
- ResultElements.push_back(Chosen);
- } else if (ZeroInactive) {
- ResultElements.push_back(MakeZero());
+ const auto *VT = E->getType()->getAs<VectorType>();
+ if (!VT)
+ return false;
+ const unsigned NumElts = VT->getNumElements();
+
+ const unsigned ElemBits = 16;
+ const unsigned LaneElems = std::min(NumElts, 128u / ElemBits);
+ const unsigned Half = 4;
+ const uint8_t Ctl = static_cast<uint8_t>(Imm.getZExtValue());
+ const bool DestUnsigned =
+ VT->getElementType()->isUnsignedIntegerOrEnumerationType();
+
+ auto MakeZero = [&]() -> APValue {
+ return APValue(APSInt(APInt(ElemBits, 0), DestUnsigned));
+ };
+
+ SmallVector<APValue, 32> ResultElements;
+ ResultElements.reserve(NumElts);
+
+ for (unsigned i = 0; i < NumElts; ++i) {
+ const unsigned laneBase = (i / LaneElems) * LaneElems;
+ const unsigned inLane = i % LaneElems;
+
+ APValue Chosen;
+ if (inLane < Half) {
+ const unsigned pos = inLane;
+ const unsigned sel = (Ctl >> (2 * pos)) & 0x3;
+ const unsigned srcIdx = laneBase + sel;
+ Chosen = AVal.getVectorElt(srcIdx);
} else {
- const APValue &PT = PassThru ? PassThru->getVectorElt(i)
- : AVal.getVectorElt(i);
- ResultElements.push_back(PT);
+ Chosen = AVal.getVectorElt(i);
+ }
+
+ if (KPtr) {
+ const bool Keep = (i < KPtr->getBitWidth()) ? (*KPtr)[i] : false;
+ if (Keep) {
+ ResultElements.push_back(Chosen);
+ } else if (ZeroInactive) {
+ ResultElements.push_back(MakeZero());
+ } else {
+ const APValue &PT =
+ PassThru ? PassThru->getVectorElt(i) : AVal.getVectorElt(i);
+ ResultElements.push_back(PT);
+ }
+ } else {
+ ResultElements.push_back(Chosen);
}
- } else {
- ResultElements.push_back(Chosen);
}
+ return Success(APValue(ResultElements.data(), ResultElements.size()), E);
}
- return Success(APValue(ResultElements.data(), ResultElements.size()), E);
-}
-case clang::X86::BI__builtin_ia32_pshufhw:
-case clang::X86::BI__builtin_ia32_pshufhw256:
-case clang::X86::BI__builtin_ia32_pshufhw512:
-case clang::X86::BI__builtin_ia32_pshufhw128_mask:
-case clang::X86::BI__builtin_ia32_pshufhw256_mask:
-case clang::X86::BI__builtin_ia32_pshufhw512_mask:
-case clang::X86::BI__builtin_ia32_pshufhw128_maskz:
-case clang::X86::BI__builtin_ia32_pshufhw256_maskz:
-case clang::X86::BI__builtin_ia32_pshufhw512_maskz: {
- const unsigned BID = E->getBuiltinCallee();
-
- const bool IsMask =
- BID == clang::X86::BI__builtin_ia32_pshufhw128_mask ||
- BID == clang::X86::BI__builtin_ia32_pshufhw256_mask ||
- BID == clang::X86::BI__builtin_ia32_pshufhw512_mask;
-
- const bool IsMaskZ =
- BID == clang::X86::BI__builtin_ia32_pshufhw128_maskz ||
- BID == clang::X86::BI__builtin_ia32_pshufhw256_maskz ||
- BID == clang::X86::BI__builtin_ia32_pshufhw512_maskz;
-
- const unsigned AIdx = 0, ImmIdx = 1;
- const unsigned SrcIdx = 2;
- const unsigned KIdx = IsMaskZ ? 2 : 3;
-
- APValue AVal, SrcVal;
- APSInt Imm, K;
- if (!EvaluateAsRValue(Info, E->getArg(AIdx), AVal)) return false;
- if (!EvaluateInteger(E->getArg(ImmIdx), Imm, Info)) return false;
-
- const APSInt *KPtr = nullptr;
- const APValue *PassThru = nullptr;
- bool ZeroInactive = false;
- if (IsMask) {
- if (!EvaluateAsRValue(Info, E->getArg(SrcIdx), SrcVal)) return false;
- if (!EvaluateInteger(E->getArg(KIdx), K, Info)) return false;
- KPtr = &K; PassThru = &SrcVal; ZeroInactive = false;
- } else if (IsMaskZ) {
- if (!EvaluateInteger(E->getArg(KIdx), K, Info)) return false;
- KPtr = &K; PassThru = nullptr; ZeroInactive = true;
- }
-
- const auto *VT = E->getType()->getAs<VectorType>();
- if (!VT) return false;
- const unsigned NumElts = VT->getNumElements();
- const unsigned ElemBits = 16;
- const unsigned LaneElems = std::min(NumElts, 128u / ElemBits);
- const unsigned Half = 4;
- const uint8_t Ctl = static_cast<uint8_t>(Imm.getZExtValue());
- const bool DestUnsigned =
- VT->getElementType()->isUnsignedIntegerOrEnumerationType();
-
- auto MakeZero = [&]() -> APValue {
- return APValue(APSInt(APInt(ElemBits, 0), DestUnsigned));
- };
+ case clang::X86::BI__builtin_ia32_pshufhw:
+ case clang::X86::BI__builtin_ia32_pshufhw256:
+ case clang::X86::BI__builtin_ia32_pshufhw512:
+ case clang::X86::BI__builtin_ia32_pshufhw128_mask:
+ case clang::X86::BI__builtin_ia32_pshufhw256_mask:
+ case clang::X86::BI__builtin_ia32_pshufhw512_mask:
+ case clang::X86::BI__builtin_ia32_pshufhw128_maskz:
+ case clang::X86::BI__builtin_ia32_pshufhw256_maskz:
+ case clang::X86::BI__builtin_ia32_pshufhw512_maskz: {
+ const unsigned BID = E->getBuiltinCallee();
- SmallVector<APValue, 32> Out;
- Out.reserve(NumElts);
+ const bool IsMask = BID == clang::X86::BI__builtin_ia32_pshufhw128_mask ||
+ BID == clang::X86::BI__builtin_ia32_pshufhw256_mask ||
+ BID == clang::X86::BI__builtin_ia32_pshufhw512_mask;
- for (unsigned i = 0; i < NumElts; ++i) {
- const unsigned laneBase = (i / LaneElems) * LaneElems;
- const unsigned inLane = i % LaneElems;
+ const bool IsMaskZ = BID == clang::X86::BI__builtin_ia32_pshufhw128_maskz ||
+ BID == clang::X86::BI__builtin_ia32_pshufhw256_maskz ||
+ BID == clang::X86::BI__builtin_ia32_pshufhw512_maskz;
- APValue Chosen;
- if (inLane >= Half) {
- const unsigned pos = inLane - Half;
- const unsigned sel = (Ctl >> (2 * pos)) & 0x3;
- const unsigned srcIdx = laneBase + Half + sel;
- Chosen = AVal.getVectorElt(srcIdx);
- } else {
- Chosen = AVal.getVectorElt(i);
+ const unsigned AIdx = 0, ImmIdx = 1;
+ const unsigned SrcIdx = 2;
+ const unsigned KIdx = IsMaskZ ? 2 : 3;
+
+ APValue AVal, SrcVal;
+ APSInt Imm, K;
+ if (!EvaluateAsRValue(Info, E->getArg(AIdx), AVal))
+ return false;
+ if (!EvaluateInteger(E->getArg(ImmIdx), Imm, Info))
+ return false;
+
+ const APSInt *KPtr = nullptr;
+ const APValue *PassThru = nullptr;
+ bool ZeroInactive = false;
+ if (IsMask) {
+ if (!EvaluateAsRValue(Info, E->getArg(SrcIdx), SrcVal))
+ return false;
+ if (!EvaluateInteger(E->getArg(KIdx), K, Info))
+ return false;
+ KPtr = &K;
+ PassThru = &SrcVal;
+ ZeroInactive = false;
+ } else if (IsMaskZ) {
+ if (!EvaluateInteger(E->getArg(KIdx), K, Info))
+ return false;
+ KPtr = &K;
+ PassThru = nullptr;
+ ZeroInactive = true;
}
- if (KPtr) {
- const bool Keep = (i < KPtr->getBitWidth()) ? (*KPtr)[i] : false;
- if (Keep) {
- Out.push_back(Chosen);
- } else if (ZeroInactive) {
- Out.push_back(MakeZero());
+ const auto *VT = E->getType()->getAs<VectorType>();
+ if (!VT)
+ return false;
+ const unsigned NumElts = VT->getNumElements();
+ const unsigned ElemBits = 16;
+ const unsigned LaneElems = std::min(NumElts, 128u / ElemBits);
+ const unsigned Half = 4;
+ const uint8_t Ctl = static_cast<uint8_t>(Imm.getZExtValue());
+ const bool DestUnsigned =
+ VT->getElementType()->isUnsignedIntegerOrEnumerationType();
+
+ auto MakeZero = [&]() -> APValue {
+ return APValue(APSInt(APInt(ElemBits, 0), DestUnsigned));
+ };
+
+ SmallVector<APValue, 32> Out;
+ Out.reserve(NumElts);
+
+ for (unsigned i = 0; i < NumElts; ++i) {
+ const unsigned laneBase = (i / LaneElems) * LaneElems;
+ const unsigned inLane = i % LaneElems;
+
+ APValue Chosen;
+ if (inLane >= Half) {
+ const unsigned pos = inLane - Half;
+ const unsigned sel = (Ctl >> (2 * pos)) & 0x3;
+ const unsigned srcIdx = laneBase + Half + sel;
+ Chosen = AVal.getVectorElt(srcIdx);
} else {
- const APValue &PT = PassThru ? PassThru->getVectorElt(i)
- : AVal.getVectorElt(i);
- Out.push_back(PT);
+ Chosen = AVal.getVectorElt(i);
}
- } else {
- Out.push_back(Chosen);
- }
- }
- return Success(APValue(Out.data(), Out.size()), E);
-}
-
-case clang::X86::BI__builtin_ia32_pshufd:
-case clang::X86::BI__builtin_ia32_pshufd256:
-case clang::X86::BI__builtin_ia32_pshufd512:
-case clang::X86::BI__builtin_ia32_pshufd128_mask:
-case clang::X86::BI__builtin_ia32_pshufd256_mask:
-case clang::X86::BI__builtin_ia32_pshufd512_mask:
-case clang::X86::BI__builtin_ia32_pshufd128_maskz:
-case clang::X86::BI__builtin_ia32_pshufd256_maskz:
-case clang::X86::BI__builtin_ia32_pshufd512_maskz: {
- const unsigned BID = E->getBuiltinCallee();
-
- const bool IsMask =
- BID == clang::X86::BI__builtin_ia32_pshufd512_mask ||
- BID == clang::X86::BI__builtin_ia32_pshufd128_mask ||
- BID == clang::X86::BI__builtin_ia32_pshufd256_mask;
-
- const bool IsMaskZ =
- BID == clang::X86::BI__builtin_ia32_pshufd512_maskz ||
- BID == clang::X86::BI__builtin_ia32_pshufd128_maskz ||
- BID == clang::X86::BI__builtin_ia32_pshufd256_maskz;
-
- const unsigned AIdx = 0, ImmIdx = 1;
- const unsigned SrcIdx = 2;
- const unsigned KIdx = IsMaskZ ? 2 : 3;
-
- APValue AVal, SrcVal;
- APSInt Imm, K;
- if (!EvaluateAsRValue(Info, E->getArg(AIdx), AVal)) return false;
- if (!EvaluateInteger(E->getArg(ImmIdx), Imm, Info)) return false;
-
- const APSInt *KPtr = nullptr;
- const APValue *PassThru = nullptr;
- bool ZeroInactive = false;
- if (IsMask) {
- if (!EvaluateAsRValue(Info, E->getArg(SrcIdx), SrcVal)) return false;
- if (!EvaluateInteger(E->getArg(KIdx), K, Info)) return false;
- KPtr = &K; PassThru = &SrcVal; ZeroInactive = false;
- } else if (IsMaskZ) {
- if (!EvaluateInteger(E->getArg(KIdx), K, Info)) return false;
- KPtr = &K; PassThru = nullptr; ZeroInactive = true;
- }
-
- const auto *VT = E->getType()->getAs<VectorType>();
- if (!VT) return false;
- const unsigned NumElts = VT->getNumElements();
- const unsigned ElemBits = 32;
- const unsigned LaneElems = std::min(NumElts, 128u / ElemBits);
- const uint8_t Ctl = static_cast<uint8_t>(Imm.getZExtValue());
- const bool DestUnsigned =
- VT->getElementType()->isUnsignedIntegerOrEnumerationType();
-
- auto MakeZero = [&]() -> APValue {
- return APValue(APSInt(APInt(ElemBits, 0), DestUnsigned));
- };
- SmallVector<APValue, 32> Out;
- Out.reserve(NumElts);
+ if (KPtr) {
+ const bool Keep = (i < KPtr->getBitWidth()) ? (*KPtr)[i] : false;
+ if (Keep) {
+ Out.push_back(Chosen);
+ } else if (ZeroInactive) {
+ Out.push_back(MakeZero());
+ } else {
+ const APValue &PT =
+ PassThru ? PassThru->getVectorElt(i) : AVal.getVectorElt(i);
+ Out.push_back(PT);
+ }
+ } else {
+ Out.push_back(Chosen);
+ }
+ }
+ return Success(APValue(Out.data(), Out.size()), E);
+ }
- for (unsigned i = 0; i < NumElts; ++i) {
- const unsigned laneBase = (i / LaneElems) * LaneElems;
- const unsigned inLane = i % LaneElems;
+ case clang::X86::BI__builtin_ia32_pshufd:
+ case clang::X86::BI__builtin_ia32_pshufd256:
+ case clang::X86::BI__builtin_ia32_pshufd512:
+ case clang::X86::BI__builtin_ia32_pshufd128_mask:
+ case clang::X86::BI__builtin_ia32_pshufd256_mask:
+ case clang::X86::BI__builtin_ia32_pshufd512_mask:
+ case clang::X86::BI__builtin_ia32_pshufd128_maskz:
+ case clang::X86::BI__builtin_ia32_pshufd256_maskz:
+ case clang::X86::BI__builtin_ia32_pshufd512_maskz: {
+ const unsigned BID = E->getBuiltinCallee();
- const unsigned pos = inLane & 3;
- const unsigned sel = (Ctl >> (2 * pos)) & 0x3;
- const unsigned srcIdx = laneBase + sel;
- APValue Chosen = AVal.getVectorElt(srcIdx);
+ const bool IsMask = BID == clang::X86::BI__builtin_ia32_pshufd512_mask ||
+ BID == clang::X86::BI__builtin_ia32_pshufd128_mask ||
+ BID == clang::X86::BI__builtin_ia32_pshufd256_mask;
- if (KPtr) {
- const bool Keep = (i < KPtr->getBitWidth()) ? (*KPtr)[i] : false;
- if (Keep) {
- Out.push_back(Chosen);
- } else if (ZeroInactive) {
- Out.push_back(MakeZero());
+ const bool IsMaskZ = BID == clang::X86::BI__builtin_ia32_pshufd512_maskz ||
+ BID == clang::X86::BI__builtin_ia32_pshufd128_maskz ||
+ BID == clang::X86::BI__builtin_ia32_pshufd256_maskz;
+
+ const unsigned AIdx = 0, ImmIdx = 1;
+ const unsigned SrcIdx = 2;
+ const unsigned KIdx = IsMaskZ ? 2 : 3;
+
+ APValue AVal, SrcVal;
+ APSInt Imm, K;
+ if (!EvaluateAsRValue(Info, E->getArg(AIdx), AVal))
+ return false;
+ if (!EvaluateInteger(E->getArg(ImmIdx), Imm, Info))
+ return false;
+
+ const APSInt *KPtr = nullptr;
+ const APValue *PassThru = nullptr;
+ bool ZeroInactive = false;
+ if (IsMask) {
+ if (!EvaluateAsRValue(Info, E->getArg(SrcIdx), SrcVal))
+ return false;
+ if (!EvaluateInteger(E->getArg(KIdx), K, Info))
+ return false;
+ KPtr = &K;
+ PassThru = &SrcVal;
+ ZeroInactive = false;
+ } else if (IsMaskZ) {
+ if (!EvaluateInteger(E->getArg(KIdx), K, Info))
+ return false;
+ KPtr = &K;
+ PassThru = nullptr;
+ ZeroInactive = true;
+ }
+
+ const auto *VT = E->getType()->getAs<VectorType>();
+ if (!VT)
+ return false;
+ const unsigned NumElts = VT->getNumElements();
+ const unsigned ElemBits = 32;
+ const unsigned LaneElems = std::min(NumElts, 128u / ElemBits);
+ const uint8_t Ctl = static_cast<uint8_t>(Imm.getZExtValue());
+ const bool DestUnsigned =
+ VT->getElementType()->isUnsignedIntegerOrEnumerationType();
+
+ auto MakeZero = [&]() -> APValue {
+ return APValue(APSInt(APInt(ElemBits, 0), DestUnsigned));
+ };
+
+ SmallVector<APValue, 32> Out;
+ Out.reserve(NumElts);
+
+ for (unsigned i = 0; i < NumElts; ++i) {
+ const unsigned laneBase = (i / LaneElems) * LaneElems;
+ const unsigned inLane = i % LaneElems;
+
+ const unsigned pos = inLane & 3;
+ const unsigned sel = (Ctl >> (2 * pos)) & 0x3;
+ const unsigned srcIdx = laneBase + sel;
+ APValue Chosen = AVal.getVectorElt(srcIdx);
+
+ if (KPtr) {
+ const bool Keep = (i < KPtr->getBitWidth()) ? (*KPtr)[i] : false;
+ if (Keep) {
+ Out.push_back(Chosen);
+ } else if (ZeroInactive) {
+ Out.push_back(MakeZero());
+ } else {
+ const APValue &PT =
+ PassThru ? PassThru->getVectorElt(i) : AVal.getVectorElt(i);
+ Out.push_back(PT);
+ }
} else {
- const APValue &PT = PassThru ? PassThru->getVectorElt(i)
- : AVal.getVectorElt(i);
- Out.push_back(PT);
+ Out.push_back(Chosen);
}
- } else {
- Out.push_back(Chosen);
}
+ return Success(APValue(Out.data(), Out.size()), E);
}
- return Success(APValue(Out.data(), Out.size()), E);
-}
case clang::X86::BI__builtin_ia32_vprotbi:
case clang::X86::BI__builtin_ia32_vprotdi:
@@ -12477,7 +12503,6 @@ case clang::X86::BI__builtin_ia32_pshufd512_maskz: {
return Success(APValue(ResultElements.data(), ResultElements.size()), E);
}
-
case X86::BI__builtin_ia32_insertf32x4_256:
case X86::BI__builtin_ia32_inserti32x4_256:
case X86::BI__builtin_ia32_insertf64x2_256:
diff --git a/clang/lib/Headers/mmintrin.h b/clang/lib/Headers/mmintrin.h
index 039f8c5ca..371a8c5ac 100644
--- a/clang/lib/Headers/mmintrin.h
+++ b/clang/lib/Headers/mmintrin.h
@@ -43,7 +43,7 @@ typedef char __v16qi __attribute__((__vector_size__(16)));
__attribute__((__always_inline__, __nodebug__, __target__("sse2"), \
__min_vector_width__(128)))
-#define __DEFAULT_FN_ATTRS_MMX \
+#define __DEFAULT_FN_ATTRS_MMX \
__attribute__((__always_inline__, __nodebug__, __target__("mmx")))
#if defined(__cplusplus) && (__cplusplus >= 201103L)
@@ -51,7 +51,7 @@ typedef char __v16qi __attribute__((__vector_size__(16)));
#define __DEFAULT_FN_ATTRS_MMX_CONSTEXPR __DEFAULT_FN_ATTRS_MMX constexpr
#else
#define __DEFAULT_FN_ATTRS_SSE2_CONSTEXPR __DEFAULT_FN_ATTRS_SSE2
-#define __DEFAULT_FN_ATTRS_MMX_CONSTEXPR __DEFAULT_FN_ATTRS_MMX
+#define __DEFAULT_FN_ATTRS_MMX_CONSTEXPR __DEFAULT_FN_ATTRS_MMX
#endif
#define __trunc64(x) \
@@ -192,7 +192,6 @@ _mm_packs_pi32(__m64 __m1, __m64 __m2) {
(__v4si)__builtin_shufflevector(__m1, __m2, 0, 1), (__v4si){}));
}
-
/// Converts, with saturation, 16-bit signed integers from both 64-bit integer
/// vector parameters of [4 x i16] into 8-bit unsigned integer values, and
/// constructs a 64-bit integer vector of [8 x i8] as the result.
|
[Headers][X86] Allow PSHUFD/PSHUFLW/PSHUFW shuffle intrinsics to be used in
constexprPSHUFW — shuffle 4×i16 in MMX (64-bit)
_mm_shuffle_pi16__builtin_ia32_pshufwmmintrin.hPSHUFLW — shuffle low 4×i16 per 128-bit lane
_mm_shufflelo_epi16__builtin_ia32_pshuflwemmintrin.h_mm256_shufflelo_epi16__builtin_ia32_pshuflw256avx2intrin.h_mm512_shufflelo_epi16__builtin_ia32_pshuflw512avx512bwintrin.h_mm_mask_shufflelo_epi16__builtin_ia32_pshuflw128_maskavx512vlbwintrin.h_mm256_mask_shufflelo_epi16__builtin_ia32_pshuflw256_maskavx512vlbwintrin.h_mm512_mask_shufflelo_epi16__builtin_ia32_pshuflw512_maskavx512bwintrin.h_mm_maskz_shufflelo_epi16__builtin_ia32_pshuflw128_maskzavx512vlbwintrin.h_mm256_maskz_shufflelo_epi16__builtin_ia32_pshuflw256_maskzavx512vlbwintrin.h_mm512_maskz_shufflelo_epi16__builtin_ia32_pshuflw512_maskzavx512bwintrin.hPSHUFHW — shuffle high 4×i16 per 128-bit lane
_mm_shufflehi_epi16__builtin_ia32_pshufhwemmintrin.h_mm256_shufflehi_epi16__builtin_ia32_pshufhw256avx2intrin.h_mm512_shufflehi_epi16__builtin_ia32_pshufhw512avx512bwintrin.h_mm_mask_shufflehi_epi16__builtin_ia32_pshufhw128_maskavx512vlbwintrin.h_mm256_mask_shufflehi_epi16__builtin_ia32_pshufhw256_maskavx512vlbwintrin.h_mm512_mask_shufflehi_epi16__builtin_ia32_pshufhw512_maskavx512bwintrin.h_mm_maskz_shufflehi_epi16__builtin_ia32_pshufhw128_maskzavx512vlbwintrin.h_mm256_maskz_shufflehi_epi16__builtin_ia32_pshufhw256_maskzavx512vlbwintrin.h_mm512_maskz_shufflehi_epi16__builtin_ia32_pshufhw512_maskzavx512bwintrin.hPSHUFD — shuffle 4×i32 per 128-bit lane
_mm_shuffle_epi32__builtin_ia32_pshufdemmintrin.h_mm256_shuffle_epi32__builtin_ia32_pshufd256avx2intrin.h_mm512_shuffle_epi32__builtin_ia32_pshufd512avx512fintrin.h_mm_mask_shuffle_epi32__builtin_ia32_pshufd128_maskavx512vlintrin.h_mm256_mask_shuffle_epi32__builtin_ia32_pshufd256_maskavx512vlintrin.h_mm512_mask_shuffle_epi32__builtin_ia32_pshufd512_maskavx512fintrin.h_mm_maskz_shuffle_epi32__builtin_ia32_pshufd128_maskzavx512vlintrin.h_mm256_maskz_shuffle_epi32__builtin_ia32_pshufd256_maskzavx512vlintrin.h_mm512_maskz_shuffle_epi32__builtin_ia32_pshufd512_maskzavx512fintrin.hFixes #156611
Adds constexpr evaluation to these intrinsics in both the ExprConstant evaluator and the Bytecode Interpreter, with tests for all unmasked, masked, and mask-zero variants across MMX, 128-bit, 256-bit, and 512-bit widths.