[X86] Allow PSHUFD/PSHUFLW/PSHUFW intrinsics in constexpr. #161210

NagrajMG · 2025-09-29T14:44:59Z

[Headers][X86] Allow PSHUFD/PSHUFLW/PSHUFW shuffle intrinsics to be used in `constexpr`

The i16/i32 shuffle intrinsics (pshufw, pshuflw, pshufhw, pshufd) currently cannot be used in constant expressions. This patch adds support for them in both the bytecode interpreter (InterpBuiltin.cpp) and the constant evaluator (ExprConstant.cpp), enabling their use in constant expressions.

Intrinsics covered

_mm_shuffle_pi16 (MMX pshufw)
_mm_shufflelo_epi16 / _mm_shufflehi_epi16
_mm_shuffle_epi32
Their AVX2/AVX512 vector-width variants
Masked and maskz forms (handled indirectly via __builtin_ia32_select*)
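
For readers unfamiliar with the immediate operand these intrinsics take, here is a minimal sketch of how the 8-bit control value is packed and decoded. The helper names `shuffle_imm` and `selector` are hypothetical and exist only for illustration; `shuffle_imm` follows the same packing as the `_MM_SHUFFLE(z, y, x, w)` macro. This is a portable model, not code from the patch.

```cpp
#include <cstdint>

// Hypothetical helper: packs four 2-bit selectors the same way the
// _MM_SHUFFLE(z, y, x, w) macro does. `w` controls result element 0.
constexpr std::uint8_t shuffle_imm(unsigned z, unsigned y, unsigned x,
                                   unsigned w) {
  return static_cast<std::uint8_t>((z << 6) | (y << 4) | (x << 2) | w);
}

// Hypothetical helper: the 2-bit source selector for result position pos (0..3).
constexpr unsigned selector(std::uint8_t imm, unsigned pos) {
  return (imm >> (2 * pos)) & 0x3u;
}

static_assert(shuffle_imm(0, 1, 2, 3) == 0x1B, "reverse-order immediate");
static_assert(selector(0x1B, 0) == 3, "result element 0 reads source element 3");
static_assert(selector(0x1B, 3) == 0, "result element 3 reads source element 0");
```

Every shuffle in this PR decodes its immediate in this 2-bits-per-position fashion; the variants differ only in which elements of each 128-bit lane are permuted.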

Fixes #156611

github-actions · 2025-09-29T14:45:21Z

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

llvmbot · 2025-09-29T14:45:53Z

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-backend-x86

Author: Nagraj Gaonkar (NagrajMG)

Changes

[Headers][X86] Allow PSHUFD/PSHUFLW/PSHUFW shuffle intrinsics to be used in `constexpr`

PSHUFW — shuffle 4×i16 in MMX (64-bit)

Intrinsic	X86 Builtin	CPUID Flags	Header
`_mm_shuffle_pi16`	`__builtin_ia32_pshufw`	MMX	`mmintrin.h`
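
As a rough illustration of the PSHUFW semantics, the sketch below models the operation on a plain `std::array` instead of an MMX register. `V4i16` and `model_pshufw` are hypothetical names, and C++17 is assumed so that `std::array` element writes are usable in constant expressions.

```cpp
#include <array>
#include <cstdint>

using V4i16 = std::array<std::int16_t, 4>;

// Portable model of PSHUFW: result[i] = src[(imm >> 2*i) & 3].
constexpr V4i16 model_pshufw(const V4i16 &src, std::uint8_t imm) {
  V4i16 out{};
  for (unsigned i = 0; i != 4; ++i)
    out[i] = src[(imm >> (2 * i)) & 0x3u];
  return out;
}

constexpr V4i16 kIn{10, 20, 30, 40};
constexpr V4i16 kRev = model_pshufw(kIn, 0x1B);
static_assert(kRev[0] == 40 && kRev[1] == 30 && kRev[2] == 20 && kRev[3] == 10,
              "0x1B reverses the four elements");
```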

PSHUFLW — shuffle low 4×i16 per 128-bit lane

Intrinsics	X86 Builtins	CPUID Flags	Header
`_mm_shufflelo_epi16`	`__builtin_ia32_pshuflw`	SSE2	`emmintrin.h`
`_mm256_shufflelo_epi16`	`__builtin_ia32_pshuflw256`	AVX2	`avx2intrin.h`
`_mm512_shufflelo_epi16`	`__builtin_ia32_pshuflw512`	AVX-512BW	`avx512bwintrin.h`
`_mm_mask_shufflelo_epi16`	`__builtin_ia32_pshuflw128_mask`	AVX-512VL+BW	`avx512vlbwintrin.h`
`_mm256_mask_shufflelo_epi16`	`__builtin_ia32_pshuflw256_mask`	AVX-512VL+BW	`avx512vlbwintrin.h`
`_mm512_mask_shufflelo_epi16`	`__builtin_ia32_pshuflw512_mask`	AVX-512BW	`avx512bwintrin.h`
`_mm_maskz_shufflelo_epi16`	`__builtin_ia32_pshuflw128_maskz`	AVX-512VL+BW	`avx512vlbwintrin.h`
`_mm256_maskz_shufflelo_epi16`	`__builtin_ia32_pshuflw256_maskz`	AVX-512VL+BW	`avx512vlbwintrin.h`
`_mm512_maskz_shufflelo_epi16`	`__builtin_ia32_pshuflw512_maskz`	AVX-512BW	`avx512bwintrin.h`
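
The per-lane behaviour shared by the 128-, 256- and 512-bit PSHUFLW forms can be sketched with a width-generic model. `model_pshuflw` is a hypothetical name, the element count `N` stands in for the vector width (8, 16 or 32), and C++17 is assumed.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Portable model of PSHUFLW over N i16 elements (N a multiple of 8).
// In each 8-element lane the low four elements are permuted by imm and
// the high four are copied unchanged.
template <std::size_t N>
constexpr std::array<std::int16_t, N>
model_pshuflw(const std::array<std::int16_t, N> &src, std::uint8_t imm) {
  static_assert(N % 8 == 0, "expects whole 128-bit lanes");
  std::array<std::int16_t, N> out{};
  for (std::size_t i = 0; i != N; ++i) {
    const std::size_t lane_base = (i / 8) * 8;
    const std::size_t in_lane = i % 8;
    out[i] = in_lane < 4 ? src[lane_base + ((imm >> (2 * in_lane)) & 0x3u)]
                         : src[i];
  }
  return out;
}

constexpr std::array<std::int16_t, 8> kLane{0, 1, 2, 3, 4, 5, 6, 7};
constexpr auto kLow = model_pshuflw(kLane, 0x1B);
static_assert(kLow[0] == 3 && kLow[1] == 2 && kLow[2] == 1 && kLow[3] == 0,
              "low half reversed");
static_assert(kLow[4] == 4 && kLow[7] == 7, "high half copied");
```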

PSHUFHW — shuffle high 4×i16 per 128-bit lane

Intrinsics	X86 Builtins	CPUID Flags	Header
`_mm_shufflehi_epi16`	`__builtin_ia32_pshufhw`	SSE2	`emmintrin.h`
`_mm256_shufflehi_epi16`	`__builtin_ia32_pshufhw256`	AVX2	`avx2intrin.h`
`_mm512_shufflehi_epi16`	`__builtin_ia32_pshufhw512`	AVX-512BW	`avx512bwintrin.h`
`_mm_mask_shufflehi_epi16`	`__builtin_ia32_pshufhw128_mask`	AVX-512VL+BW	`avx512vlbwintrin.h`
`_mm256_mask_shufflehi_epi16`	`__builtin_ia32_pshufhw256_mask`	AVX-512VL+BW	`avx512vlbwintrin.h`
`_mm512_mask_shufflehi_epi16`	`__builtin_ia32_pshufhw512_mask`	AVX-512BW	`avx512bwintrin.h`
`_mm_maskz_shufflehi_epi16`	`__builtin_ia32_pshufhw128_maskz`	AVX-512VL+BW	`avx512vlbwintrin.h`
`_mm256_maskz_shufflehi_epi16`	`__builtin_ia32_pshufhw256_maskz`	AVX-512VL+BW	`avx512vlbwintrin.h`
`_mm512_maskz_shufflehi_epi16`	`__builtin_ia32_pshufhw512_maskz`	AVX-512BW	`avx512bwintrin.h`
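
PSHUFHW is the mirror image of PSHUFLW: the selectors index into the upper four elements of the lane. A single-lane sketch, with `V8i16` and `model_pshufhw` as hypothetical names and C++17 assumed:

```cpp
#include <array>
#include <cstdint>

using V8i16 = std::array<std::int16_t, 8>;

// Portable model of PSHUFHW on one 128-bit lane: elements 0..3 are copied,
// elements 4..7 are picked from src[4..7] by successive 2-bit fields of imm.
constexpr V8i16 model_pshufhw(const V8i16 &src, std::uint8_t imm) {
  V8i16 out{};
  for (unsigned i = 0; i != 8; ++i)
    out[i] = i < 4 ? src[i] : src[4 + ((imm >> (2 * (i - 4))) & 0x3u)];
  return out;
}

constexpr V8i16 kSrc{0, 1, 2, 3, 4, 5, 6, 7};
constexpr V8i16 kHi = model_pshufhw(kSrc, 0x1B);
static_assert(kHi[0] == 0 && kHi[3] == 3, "low half copied");
static_assert(kHi[4] == 7 && kHi[5] == 6 && kHi[6] == 5 && kHi[7] == 4,
              "high half reversed");
```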

PSHUFD — shuffle 4×i32 per 128-bit lane

Intrinsics	X86 Builtins	CPUID Flags	Header
`_mm_shuffle_epi32`	`__builtin_ia32_pshufd`	SSE2	`emmintrin.h`
`_mm256_shuffle_epi32`	`__builtin_ia32_pshufd256`	AVX2	`avx2intrin.h`
`_mm512_shuffle_epi32`	`__builtin_ia32_pshufd512`	AVX-512F	`avx512fintrin.h`
`_mm_mask_shuffle_epi32`	`__builtin_ia32_pshufd128_mask`	AVX-512VL	`avx512vlintrin.h`
`_mm256_mask_shuffle_epi32`	`__builtin_ia32_pshufd256_mask`	AVX-512VL	`avx512vlintrin.h`
`_mm512_mask_shuffle_epi32`	`__builtin_ia32_pshufd512_mask`	AVX-512F	`avx512fintrin.h`
`_mm_maskz_shuffle_epi32`	`__builtin_ia32_pshufd128_maskz`	AVX-512VL	`avx512vlintrin.h`
`_mm256_maskz_shuffle_epi32`	`__builtin_ia32_pshufd256_maskz`	AVX-512VL	`avx512vlintrin.h`
`_mm512_maskz_shuffle_epi32`	`__builtin_ia32_pshufd512_maskz`	AVX-512F	`avx512fintrin.h`
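
For PSHUFD the whole lane is permuted, and wider vectors apply the same immediate to every 128-bit lane independently. The sketch below models that with a hypothetical `model_pshufd` over `N` i32 elements (C++17 assumed); it is an illustration of the semantics, not the evaluator code from the patch.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Portable model of PSHUFD over N i32 elements (N a multiple of 4).
// Each 4-element lane is permuted by the same immediate.
template <std::size_t N>
constexpr std::array<std::int32_t, N>
model_pshufd(const std::array<std::int32_t, N> &src, std::uint8_t imm) {
  static_assert(N % 4 == 0, "expects whole 128-bit lanes");
  std::array<std::int32_t, N> out{};
  for (std::size_t i = 0; i != N; ++i)
    out[i] = src[(i / 4) * 4 + ((imm >> (2 * (i % 4))) & 0x3u)];
  return out;
}

constexpr std::array<std::int32_t, 8> kTwoLanes{0, 1, 2, 3, 4, 5, 6, 7};
constexpr auto kShuf = model_pshufd(kTwoLanes, 0x1B);
static_assert(kShuf[0] == 3 && kShuf[3] == 0, "first lane reversed");
static_assert(kShuf[4] == 7 && kShuf[7] == 4, "second lane reversed in place");
```

Note that no element ever crosses a lane boundary, which is why the selector is added to the lane base instead of indexing the whole vector.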

Fixes #156611

Adds constexpr evaluation to these intrinsics in both the ExprConstant evaluator and the Bytecode Interpreter, with tests for all unmasked, masked, and mask-zero variants across MMX, 128-bit, 256-bit, and 512-bit widths.

Patch is 57.80 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/161210.diff

11 Files Affected:

(modified) clang/include/clang/Basic/BuiltinsX86.td (+35-10)
(modified) clang/lib/AST/ByteCode/InterpBuiltin.cpp (+245)
(modified) clang/lib/AST/ExprConstant.cpp (+287)
(modified) clang/lib/Headers/mmintrin.h (+5)
(modified) clang/test/CodeGen/X86/avx2-builtins.c (+3-3)
(modified) clang/test/CodeGen/X86/avx512bw-builtins.c (+10-1)
(modified) clang/test/CodeGen/X86/avx512f-builtins.c (+7-2)
(modified) clang/test/CodeGen/X86/avx512vl-builtins.c (+17)
(modified) clang/test/CodeGen/X86/avx512vlbw-builtins.c (+50)
(modified) clang/test/CodeGen/X86/mmx-builtins.c (+1-1)
(modified) clang/test/CodeGen/X86/sse2-builtins.c (+3-3)

diff --git a/clang/include/clang/Basic/BuiltinsX86.td b/clang/include/clang/Basic/BuiltinsX86.td
index 77e599587edc3..e70691a30627a 100644
--- a/clang/include/clang/Basic/BuiltinsX86.td
+++ b/clang/include/clang/Basic/BuiltinsX86.td
@@ -145,6 +145,10 @@ let Features = "mmx", Header = "mmintrin.h", Attributes = [NoThrow, Const] in {
   def _m_prefetch : X86LibBuiltin<"void(void *)">;
 }
 
+let Features = "mmx", Attributes = [NoThrow, Const, Constexpr] in {
+  def pshufw : X86Builtin<"_Vector<4, short>(_Vector<4, short>, _Constant int)">;
+}
+
 // PRFCHW
 let Features = "prfchw", Header = "intrin.h", Attributes = [NoThrow, Const] in {
   def _m_prefetchw : X86LibBuiltin<"void(void volatile const *)">;
@@ -217,10 +221,13 @@ let Features = "sse2", Attributes = [NoThrow] in {
   def movnti : X86Builtin<"void(int *, int)">;
 }
 
-let Features = "sse2", Attributes = [NoThrow, Const, RequiredVectorWidth<128>] in {
-  def pshufd : X86Builtin<"_Vector<4, int>(_Vector<4, int>, _Constant int)">;
+let Features = "sse2", Attributes = [NoThrow, Const, Constexpr, RequiredVectorWidth<128>] in {
   def pshuflw : X86Builtin<"_Vector<8, short>(_Vector<8, short>, _Constant int)">;
+  def pshufd : X86Builtin<"_Vector<4, int>(_Vector<4, int>, _Constant int)">;
   def pshufhw : X86Builtin<"_Vector<8, short>(_Vector<8, short>, _Constant int)">;
+}
+
+let Features = "sse2", Attributes = [NoThrow, Const, RequiredVectorWidth<128>] in {
   def psadbw128 : X86Builtin<"_Vector<2, long long int>(_Vector<16, char>, _Vector<16, char>)">;
   def sqrtpd : X86Builtin<"_Vector<2, double>(_Vector<2, double>)">;
   def sqrtsd : X86Builtin<"_Vector<2, double>(_Vector<2, double>)">;
@@ -584,9 +591,6 @@ let Features = "avx2", Attributes = [NoThrow, Const, RequiredVectorWidth<256>] i
   def pmulhrsw256 : X86Builtin<"_Vector<16, short>(_Vector<16, short>, _Vector<16, short>)">;
   def psadbw256 : X86Builtin<"_Vector<4, long long int>(_Vector<32, char>, _Vector<32, char>)">;
   def pshufb256 : X86Builtin<"_Vector<32, char>(_Vector<32, char>, _Vector<32, char>)">;
-  def pshufd256 : X86Builtin<"_Vector<8, int>(_Vector<8, int>, _Constant int)">;
-  def pshuflw256 : X86Builtin<"_Vector<16, short>(_Vector<16, short>, _Constant int)">;
-  def pshufhw256 : X86Builtin<"_Vector<16, short>(_Vector<16, short>, _Constant int)">;
   def psignb256 : X86Builtin<"_Vector<32, char>(_Vector<32, char>, _Vector<32, char>)">;
   def psignw256 : X86Builtin<"_Vector<16, short>(_Vector<16, short>, _Vector<16, short>)">;
   def psignd256 : X86Builtin<"_Vector<8, int>(_Vector<8, int>, _Vector<8, int>)">;
@@ -647,6 +651,10 @@ let Features = "avx2", Attributes = [NoThrow, Const, Constexpr, RequiredVectorWi
   def packsswb256 : X86Builtin<"_Vector<32, char>(_Vector<16, short>, _Vector<16, short>)">;
   def packssdw256 : X86Builtin<"_Vector<16, short>(_Vector<8, int>, _Vector<8, int>)">;
   def packuswb256 : X86Builtin<"_Vector<32, char>(_Vector<16, short>, _Vector<16, short>)">;
+
+  def pshuflw256 : X86Builtin<"_Vector<16, short>(_Vector<16, short>, _Constant int)">;
+  def pshufhw256 : X86Builtin<"_Vector<16, short>(_Vector<16, short>, _Constant int)">;
+  def pshufd256  : X86Builtin<"_Vector<8, int>(_Vector<8, int>, _Constant int)">;
 }
 
 let Features = "avx2", Attributes = [NoThrow, Const, Constexpr, RequiredVectorWidth<128>] in {
@@ -1990,13 +1998,13 @@ let Features = "avx512vl", Attributes = [NoThrow, Const, Constexpr, RequiredVect
 }
 
 let Features = "avx512bw", Attributes = [NoThrow, Const, RequiredVectorWidth<512>] in {
-  def pshufhw512 : X86Builtin<"_Vector<32, short>(_Vector<32, short>, _Constant int)">;
-  def pshuflw512 : X86Builtin<"_Vector<32, short>(_Vector<32, short>, _Constant int)">;
   def psllw512 : X86Builtin<"_Vector<32, short>(_Vector<32, short>, _Vector<8, short>)">;
 }
 
 let Features = "avx512bw", Attributes = [NoThrow, Const, Constexpr, RequiredVectorWidth<512>] in {
   def psllv32hi : X86Builtin<"_Vector<32, short>(_Vector<32, short>, _Vector<32, short>)">;
+  def pshufhw512 : X86Builtin<"_Vector<32, short>(_Vector<32, short>, _Constant int)">;
+  def pshuflw512 : X86Builtin<"_Vector<32, short>(_Vector<32, short>, _Constant int)">;
 }
 
 let Features = "avx512bw,avx512vl", Attributes = [NoThrow, Const, Constexpr, RequiredVectorWidth<256>] in {
@@ -2016,21 +2024,35 @@ let Features = "avx512f",
 
 let Features = "avx512bw", Attributes = [NoThrow, Const, Constexpr, RequiredVectorWidth<512>] in {
   def psrlv32hi : X86Builtin<"_Vector<32, short>(_Vector<32, short>, _Vector<32, short>)">;
+  def pshuflw512_mask  : X86Builtin<"_Vector<32, short>(_Vector<32, short>, _Constant int, _Vector<32, short>, unsigned int)">;
+  def pshuflw512_maskz : X86Builtin<"_Vector<32, short>(_Vector<32, short>, _Constant int, unsigned int)">;
+  def pshufhw512_mask  : X86Builtin<"_Vector<32, short>(_Vector<32, short>, _Constant int, _Vector<32, short>, unsigned int)">;
+  def pshufhw512_maskz : X86Builtin<"_Vector<32, short>(_Vector<32, short>, _Constant int, unsigned int)">;
 }
 
 let Features = "avx512bw,avx512vl", Attributes = [NoThrow, Const, Constexpr, RequiredVectorWidth<256>] in {
   def psrlv16hi : X86Builtin<"_Vector<16, short>(_Vector<16, short>, _Vector<16, short>)">;
+  def pshuflw256_mask  : X86Builtin<"_Vector<16, short>(_Vector<16, short>, _Constant int, _Vector<16, short>, unsigned short)">;
+  def pshuflw256_maskz : X86Builtin<"_Vector<16, short>(_Vector<16, short>, _Constant int, unsigned short)">;
+  def pshufhw256_mask  : X86Builtin<"_Vector<16, short>(_Vector<16, short>, _Constant int, _Vector<16, short>, unsigned short)">;
+  def pshufhw256_maskz : X86Builtin<"_Vector<16, short>(_Vector<16, short>, _Constant int, unsigned short)">;
 }
 
 let Features = "avx512bw,avx512vl", Attributes = [NoThrow, Const, Constexpr, RequiredVectorWidth<128>] in {
   def psrlv8hi : X86Builtin<"_Vector<8, short>(_Vector<8, short>, _Vector<8, short>)">;
+  def pshuflw128_mask  : X86Builtin<"_Vector<8, short>(_Vector<8, short>, _Constant int, _Vector<8, short>, unsigned char)">;
+  def pshuflw128_maskz : X86Builtin<"_Vector<8, short>(_Vector<8, short>, _Constant int, unsigned char)">;
+  def pshufhw128_mask  : X86Builtin<"_Vector<8, short>(_Vector<8, short>, _Constant int, _Vector<8, short>, unsigned char)">;
+  def pshufhw128_maskz : X86Builtin<"_Vector<8, short>(_Vector<8, short>, _Constant int, unsigned char)">;
 }
 
-let Features = "avx512f",
-    Attributes = [NoThrow, Const, Constexpr, RequiredVectorWidth<512>] in {
+let Features = "avx512f", Attributes = [NoThrow, Const, Constexpr, RequiredVectorWidth<512>] in {
   def psrlwi512 : X86Builtin<"_Vector<32, short>(_Vector<32, short>, int)">;
   def psrldi512 : X86Builtin<"_Vector<16, int>(_Vector<16, int>, int)">;
   def psrlqi512 : X86Builtin<"_Vector<8, long long int>(_Vector<8, long long int>, int)">;
+  def pshufd512_mask : X86Builtin<"_Vector<16, int>(_Vector<16, int>, _Constant int, _Vector<16, int>, unsigned short)">;
+  def pshufd512_maskz : X86Builtin<"_Vector<16, int>(_Vector<16, int>, _Constant int, unsigned short)">;
+  def pshufd512 : X86Builtin<"_Vector<16, int>(_Vector<16, int>, _Constant int)">;
 }
 
 let Features = "avx512bw", Attributes = [NoThrow, Const, Constexpr, RequiredVectorWidth<512>] in {
@@ -2047,10 +2069,14 @@ let Features = "avx512bw,avx512vl", Attributes = [NoThrow, Const, Constexpr, Req
 
 let Features = "avx512vl", Attributes = [NoThrow, Const, Constexpr, RequiredVectorWidth<128>] in {
   def psravq128 : X86Builtin<"_Vector<2, long long int>(_Vector<2, long long int>, _Vector<2, long long int>)">;
+  def pshufd128_mask : X86Builtin<"_Vector<4, int>(_Vector<4, int>, _Constant int, _Vector<4, int>, unsigned char)">;
+  def pshufd128_maskz : X86Builtin<"_Vector<4, int>(_Vector<4, int>, _Constant int, unsigned char)">;
 }
 
 let Features = "avx512vl", Attributes = [NoThrow, Const, Constexpr, RequiredVectorWidth<256>] in {
   def psravq256 : X86Builtin<"_Vector<4, long long int>(_Vector<4, long long int>, _Vector<4, long long int>)">;
+  def pshufd256_mask : X86Builtin<"_Vector<8, int>(_Vector<8, int>, _Constant int, _Vector<8, int>, unsigned char)">;
+  def pshufd256_maskz : X86Builtin<"_Vector<8, int>(_Vector<8, int>, _Constant int, unsigned char)">;
 }
 
 let Features = "avx512bw", Attributes = [NoThrow, Const, RequiredVectorWidth<512>] in {
@@ -3266,7 +3292,6 @@ let Features = "avx512f", Attributes = [NoThrow, Const, RequiredVectorWidth<128>
 }
 
 let Features = "avx512f", Attributes = [NoThrow, Const, RequiredVectorWidth<512>] in {
-  def pshufd512 : X86Builtin<"_Vector<16, int>(_Vector<16, int>, _Constant int)">;
   def expanddf512_mask : X86Builtin<"_Vector<8, double>(_Vector<8, double>, _Vector<8, double>, unsigned char)">;
   def expanddi512_mask : X86Builtin<"_Vector<8, long long int>(_Vector<8, long long int>, _Vector<8, long long int>, unsigned char)">;
 }
diff --git a/clang/lib/AST/ByteCode/InterpBuiltin.cpp b/clang/lib/AST/ByteCode/InterpBuiltin.cpp
index 891344d4e6ed0..e0bd5d531db34 100644
--- a/clang/lib/AST/ByteCode/InterpBuiltin.cpp
+++ b/clang/lib/AST/ByteCode/InterpBuiltin.cpp
@@ -2862,6 +2862,218 @@ static bool interp__builtin_blend(InterpState &S, CodePtr OpPC,
   return true;
 }
 
+static bool interp__builtin_ia32_pshuflw_common(InterpState &S, CodePtr OpPC,
+                                                const CallExpr *Call) {
+  const unsigned NumArgs = Call->getNumArgs();
+  assert(NumArgs == 2 || NumArgs == 3 || NumArgs == 4);
+  APSInt K;
+  Pointer SrcPT;
+  const bool HasMask = (NumArgs == 3) || (NumArgs == 4);
+  const bool IsMaskZ = (NumArgs == 3);
+  if (NumArgs == 4) {
+    K = popToAPSInt(S, Call->getArg(3));
+    SrcPT = S.Stk.pop<Pointer>();
+  } else if (NumArgs == 3) {
+    K = popToAPSInt(S, Call->getArg(2));
+  }
+
+  APSInt Imm = popToAPSInt(S, Call->getArg(1));
+  const Pointer &Src = S.Stk.pop<Pointer>();
+  const Pointer &Dst = S.Stk.peek<Pointer>();
+  const unsigned NumElems = Dst.getNumElems();
+  const PrimType ElemT = Dst.getFieldDesc()->getPrimType();
+  const unsigned ElemBits = 16;
+  const unsigned LaneElems = 128u / ElemBits;
+  const unsigned Half = 4;
+  assert(NumElems % LaneElems == 0 && "pshuflw expects 128-bit lanes");
+  const uint8_t Ctl = static_cast<uint8_t>(Imm.getZExtValue());
+
+  for (unsigned i = 0; i != NumElems; ++i) {
+    const unsigned laneBase = (i / LaneElems) * LaneElems;
+    const unsigned inLane = i % LaneElems;
+
+    unsigned srcIdx;
+    if (inLane < Half) {
+      const unsigned pos = inLane;
+      const unsigned sel = (Ctl >> (2 * pos)) & 0x3;
+      srcIdx = laneBase + sel;
+    } else {
+      srcIdx = i;
+    }
+
+    APSInt Chosen;
+    INT_TYPE_SWITCH(ElemT, { Chosen = Src.elem<T>(srcIdx).toAPSInt(); });
+
+    if (!HasMask) {
+      INT_TYPE_SWITCH_NO_BOOL(ElemT,
+                              { Dst.elem<T>(i) = static_cast<T>(Chosen); });
+      continue;
+    }
+
+    const bool Keep =
+        (i < static_cast<unsigned>(K.getBitWidth())) ? K[i] : false;
+
+    if (Keep) {
+      INT_TYPE_SWITCH_NO_BOOL(ElemT,
+                              { Dst.elem<T>(i) = static_cast<T>(Chosen); });
+    } else if (IsMaskZ) {
+      APSInt Zero(APInt(Chosen.getBitWidth(), 0));
+      Zero.setIsSigned(Chosen.isSigned());
+      INT_TYPE_SWITCH_NO_BOOL(ElemT,
+                              { Dst.elem<T>(i) = static_cast<T>(Zero); });
+    } else {
+      APSInt PT;
+      INT_TYPE_SWITCH(ElemT, { PT = SrcPT.elem<T>(i).toAPSInt(); });
+      INT_TYPE_SWITCH_NO_BOOL(ElemT, { Dst.elem<T>(i) = static_cast<T>(PT); });
+    }
+  }
+
+  Dst.initializeAllElements();
+  return true;
+}
+
+static bool interp__builtin_ia32_pshufhw_common(InterpState &S, CodePtr OpPC,
+                                                const CallExpr *Call) {
+  (void)OpPC;
+  const unsigned NumArgs = Call->getNumArgs();
+  assert(NumArgs == 2 || NumArgs == 3 || NumArgs == 4);
+
+  APSInt K;
+  Pointer SrcPT;
+  const bool HasMask = (NumArgs == 3) || (NumArgs == 4);
+  const bool IsMaskZ = (NumArgs == 3);
+
+  if (NumArgs == 4) {
+    K = popToAPSInt(S, Call->getArg(3));
+    SrcPT = S.Stk.pop<Pointer>();
+  } else if (NumArgs == 3) {
+    K = popToAPSInt(S, Call->getArg(2));
+  }
+
+  APSInt Imm = popToAPSInt(S, Call->getArg(1));
+  const Pointer &Src = S.Stk.pop<Pointer>();
+  const Pointer &Dst = S.Stk.peek<Pointer>();
+
+  const unsigned NumElems = Dst.getNumElems();
+  const PrimType ElemT = Dst.getFieldDesc()->getPrimType();
+
+  const unsigned ElemBits = 16;
+  const unsigned LaneElems = 128u / ElemBits;
+  const unsigned HalfBase = 4;
+  assert(NumElems % LaneElems == 0);
+
+  const uint8_t Ctl = static_cast<uint8_t>(Imm.getZExtValue());
+
+  for (unsigned i = 0; i != NumElems; ++i) {
+    const unsigned laneBase = (i / LaneElems) * LaneElems;
+    const unsigned inLane = i % LaneElems;
+
+    unsigned srcIdx;
+    if (inLane >= HalfBase) {
+      const unsigned pos = inLane - HalfBase;
+      const unsigned sel = (Ctl >> (2 * pos)) & 0x3;
+      srcIdx = laneBase + HalfBase + sel;
+    } else {
+      srcIdx = i;
+    }
+
+    APSInt Chosen;
+    INT_TYPE_SWITCH(ElemT, { Chosen = Src.elem<T>(srcIdx).toAPSInt(); });
+
+    if (!HasMask) {
+      INT_TYPE_SWITCH_NO_BOOL(ElemT,
+                              { Dst.elem<T>(i) = static_cast<T>(Chosen); });
+      continue;
+    }
+
+    const bool Keep =
+        (i < static_cast<unsigned>(K.getBitWidth())) ? K[i] : false;
+    if (Keep) {
+      INT_TYPE_SWITCH_NO_BOOL(ElemT,
+                              { Dst.elem<T>(i) = static_cast<T>(Chosen); });
+    } else if (IsMaskZ) {
+      APSInt Zero(APInt(Chosen.getBitWidth(), 0));
+      Zero.setIsSigned(Chosen.isSigned());
+      INT_TYPE_SWITCH_NO_BOOL(ElemT,
+                              { Dst.elem<T>(i) = static_cast<T>(Zero); });
+    } else {
+      APSInt PT;
+      INT_TYPE_SWITCH(ElemT, { PT = SrcPT.elem<T>(i).toAPSInt(); });
+      INT_TYPE_SWITCH_NO_BOOL(ElemT, { Dst.elem<T>(i) = static_cast<T>(PT); });
+    }
+  }
+
+  Dst.initializeAllElements();
+  return true;
+}
+
+static bool interp__builtin_ia32_pshufd_common(InterpState &S, CodePtr OpPC,
+                                              const CallExpr *Call) {
+  (void)OpPC;
+  const unsigned NumArgs = Call->getNumArgs();
+  assert(NumArgs == 2 || NumArgs == 3 || NumArgs == 4);
+
+  APSInt K;
+  Pointer SrcPT;
+  const bool HasMask = (NumArgs == 3) || (NumArgs == 4);
+  const bool IsMaskZ = (NumArgs == 3);
+
+  if (NumArgs == 4) {
+    K = popToAPSInt(S, Call->getArg(3));
+    SrcPT = S.Stk.pop<Pointer>();
+  } else if (NumArgs == 3) {
+    K = popToAPSInt(S, Call->getArg(2));
+  }
+
+  APSInt Imm = popToAPSInt(S, Call->getArg(1));
+  const Pointer &Src = S.Stk.pop<Pointer>();
+  const Pointer &Dst = S.Stk.peek<Pointer>();
+
+  const unsigned NumElems = Dst.getNumElems();
+  const PrimType ElemT = Dst.getFieldDesc()->getPrimType();
+
+  const unsigned ElemBits = 32;
+  const unsigned LaneElems = 128u / ElemBits;
+  assert(NumElems % LaneElems == 0);
+
+  const uint8_t Ctl = static_cast<uint8_t>(Imm.getZExtValue());
+
+  for (unsigned i = 0; i != NumElems; ++i) {
+    const unsigned laneBase = (i / LaneElems) * LaneElems;
+    const unsigned inLane = i % LaneElems;
+    const unsigned sel = (Ctl >> (2 * inLane)) & 0x3;
+    const unsigned srcIdx = laneBase + sel;
+
+    APSInt Chosen;
+    INT_TYPE_SWITCH(ElemT, { Chosen = Src.elem<T>(srcIdx).toAPSInt(); });
+
+    if (!HasMask) {
+      INT_TYPE_SWITCH_NO_BOOL(ElemT,
+                              { Dst.elem<T>(i) = static_cast<T>(Chosen); });
+      continue;
+    }
+
+    const bool Keep =
+        (i < static_cast<unsigned>(K.getBitWidth())) ? K[i] : false;
+    if (Keep) {
+      INT_TYPE_SWITCH_NO_BOOL(ElemT,
+                              { Dst.elem<T>(i) = static_cast<T>(Chosen); });
+    } else if (IsMaskZ) {
+      APSInt Zero(APInt(Chosen.getBitWidth(), 0));
+      Zero.setIsSigned(Chosen.isSigned());
+      INT_TYPE_SWITCH_NO_BOOL(ElemT,
+                              { Dst.elem<T>(i) = static_cast<T>(Zero); });
+    } else {
+      APSInt PT;
+      INT_TYPE_SWITCH(ElemT, { PT = SrcPT.elem<T>(i).toAPSInt(); });
+      INT_TYPE_SWITCH_NO_BOOL(ElemT, { Dst.elem<T>(i) = static_cast<T>(PT); });
+    }
+  }
+
+  Dst.initializeAllElements();
+  return true;
+}
+
 static bool interp__builtin_elementwise_triop(
     InterpState &S, CodePtr OpPC, const CallExpr *Call,
     llvm::function_ref<APInt(const APSInt &, const APSInt &, const APSInt &)>
@@ -3417,6 +3629,39 @@ bool InterpretBuiltin(InterpState &S, CodePtr OpPC, const CallExpr *Call,
     return interp__builtin_elementwise_int_binop(S, OpPC, Call,
                                                  llvm::APIntOps::mulhs);
 
+  case clang::X86::BI__builtin_ia32_pshuflw:
+  case clang::X86::BI__builtin_ia32_pshuflw256:
+  case clang::X86::BI__builtin_ia32_pshuflw512:
+  case clang::X86::BI__builtin_ia32_pshuflw128_mask:
+  case clang::X86::BI__builtin_ia32_pshuflw256_mask:
+  case clang::X86::BI__builtin_ia32_pshuflw512_mask:
+  case clang::X86::BI__builtin_ia32_pshuflw128_maskz:
+  case clang::X86::BI__builtin_ia32_pshuflw256_maskz:
+  case clang::X86::BI__builtin_ia32_pshuflw512_maskz:
+    return interp__builtin_ia32_pshuflw_common(S, OpPC, Call);
+
+  case clang::X86::BI__builtin_ia32_pshufhw:
+  case clang::X86::BI__builtin_ia32_pshufhw256:
+  case clang::X86::BI__builtin_ia32_pshufhw512:
+  case clang::X86::BI__builtin_ia32_pshufhw128_mask:
+  case clang::X86::BI__builtin_ia32_pshufhw256_mask:
+  case clang::X86::BI__builtin_ia32_pshufhw512_mask:
+  case clang::X86::BI__builtin_ia32_pshufhw128_maskz:
+  case clang::X86::BI__builtin_ia32_pshufhw256_maskz:
+  case clang::X86::BI__builtin_ia32_pshufhw512_maskz:
+    return interp__builtin_ia32_pshufhw_common(S, OpPC, Call);
+
+  case clang::X86::BI__builtin_ia32_pshufd:
+  case clang::X86::BI__builtin_ia32_pshufd256:
+  case clang::X86::BI__builtin_ia32_pshufd512:
+  case clang::X86::BI__builtin_ia32_pshufd128_mask:
+  case clang::X86::BI__builtin_ia32_pshufd256_mask:
+  case clang::X86::BI__builtin_ia32_pshufd512_mask:
+  case clang::X86::BI__builtin_ia32_pshufd128_maskz:
+  case clang::X86::BI__builtin_ia32_pshufd256_maskz:
+  case clang::X86::BI__builtin_ia32_pshufd512_maskz:
+    return interp__builtin_ia32_pshufd_common(S, OpPC, Call);
+
   case clang::X86::BI__builtin_ia32_psllv2di:
   case clang::X86::BI__builtin_ia32_psllv4di:
   case clang::X86::BI__builtin_ia32_psllv4si:
diff --git a/clang/lib/AST/ExprConstant.cpp b/clang/lib/AST/ExprConstant.cpp
index b706b14945b6d..1ce601d37e0d6 100644
--- a/clang/lib/AST/ExprConstant.cpp
+++ b/clang/lib/AST/ExprConstant.cpp
@@ -11869,6 +11869,293 @@ bool VectorExprEvaluator::VisitCallExpr(const CallExpr *E) {
     return Success(APValue(ResultElements.data(), ResultElements.size()), E);
   }
 
+  case X86::BI__builtin_ia32_pshufw: {
+    APValue Src;
+    APSInt Imm;
+    if (!EvaluateAsRValue(Info, E->getArg(0), Src)) return false;
+    if (!EvaluateInteger(E->getArg(1), Imm, Info))  return false;
+
+    unsigned N = Src.getVectorLength(); 
+    SmallVector<APValue, 4> ResultElements;
+    ResultElements.reserve(N);
+
+    uint8_t C = static_cast<uint8_t>(Imm.getZExtValue());
+    for (unsigned i = 0; i != N; ++i) {
+      unsigned sel = (C >> (2 * i)) & 0x3;
+      ResultElements.push_back(Src.getVectorElt(sel));
+    }
+    return Success(APValue(ResultElements.data(), ResultElements.size()), E);
+  }
+
+  case clang::X86::BI__builtin_ia32_pshuflw:
+  case clang::X86::BI__builtin_ia32_pshuflw256:
+  case clang::X86::BI__builtin_ia32_pshuflw512:
+  case clang::X86::BI__builtin_ia32_pshuflw128_mask:
+  case clang::X86::BI__builtin_ia32_pshuflw256_mask:
+  case clang::X86::BI__builtin_ia32_pshuflw512_mask:
+  case clang::X86::BI__builtin_ia32_pshuflw128_maskz:
+  case clang::X86::BI__builtin_ia32_pshuflw256_maskz:
+  case clang::X86::BI__builtin_ia32_pshuflw512_maskz: {
+    const unsigned BID = E->getBuiltinCallee();
+
+    const bool IsMask =
+        BID == clang::X86::BI__builtin_ia32_pshuflw128_mask  ||
+        BID == clang::X86::BI__builtin_ia32_pshuflw256_mask  ||
+        BID == clang::X86::BI__builtin_ia32_pshuflw512_mask;
+
+    const bool IsMaskZ =
+        BID == clang::X86::BI__builtin_ia32_pshuflw128_maskz ||
+        BID == clang::X86::BI__builtin_ia32_pshuflw256_maskz ||
+        BID == clang::X86::BI__builtin_ia32_pshuflw512_maskz;
+
+    const unsigned AIdx  = 0, ImmIdx = 1;
+    const unsigned SrcIdx = 2;
+    const unsigned KI...
[truncated]

RKSimon

why did you make all those changes to the builtins? There should be no need for mask/maskz builtins as these are handled by a select wrapper.
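
To illustrate the point about the select wrapper: a masked shuffle can be expressed as an unmasked shuffle followed by a per-element select, so only the unmasked builtin needs constexpr support. The sketch below is a portable model with hypothetical names (`model_shuffle`, `model_select`, `model_mask_shuffle`, `model_maskz_shuffle`); the parameter order of the mask form mirrors `_mm_mask_shuffle_epi32(W, U, A, I)`. C++17 is assumed.

```cpp
#include <array>
#include <cstdint>

using V4i32 = std::array<std::int32_t, 4>;

// Unmasked 4 x i32 shuffle: out[i] = a[(imm >> 2*i) & 3].
constexpr V4i32 model_shuffle(const V4i32 &a, std::uint8_t imm) {
  V4i32 out{};
  for (unsigned i = 0; i != 4; ++i)
    out[i] = a[(imm >> (2 * i)) & 0x3u];
  return out;
}

// Per-element select: bit i of k picks on_true[i], otherwise on_false[i].
constexpr V4i32 model_select(std::uint8_t k, const V4i32 &on_true,
                             const V4i32 &on_false) {
  V4i32 out{};
  for (unsigned i = 0; i != 4; ++i)
    out[i] = ((k >> i) & 1u) ? on_true[i] : on_false[i];
  return out;
}

// mask form: unselected elements come from the pass-through vector w.
constexpr V4i32 model_mask_shuffle(const V4i32 &w, std::uint8_t k,
                                   const V4i32 &a, std::uint8_t imm) {
  return model_select(k, model_shuffle(a, imm), w);
}

// maskz form: unselected elements are zero.
constexpr V4i32 model_maskz_shuffle(std::uint8_t k, const V4i32 &a,
                                    std::uint8_t imm) {
  return model_select(k, model_shuffle(a, imm), V4i32{});
}

constexpr V4i32 kA{1, 2, 3, 4};
constexpr V4i32 kW{-1, -1, -1, -1};
constexpr V4i32 kMasked = model_mask_shuffle(kW, 0x5, kA, 0x1B);
constexpr V4i32 kZeroed = model_maskz_shuffle(0x5, kA, 0x1B);
static_assert(kMasked[0] == 4 && kMasked[1] == -1 && kMasked[2] == 2 &&
                  kMasked[3] == -1,
              "bits 0 and 2 take the shuffle, others keep w");
static_assert(kZeroed[0] == 4 && kZeroed[1] == 0 && kZeroed[2] == 2 &&
                  kZeroed[3] == 0,
              "bits 0 and 2 take the shuffle, others are zero");
```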

clang/include/clang/Basic/BuiltinsX86.td

clang/lib/Headers/mmintrin.h

clang/lib/AST/ByteCode/InterpBuiltin.cpp

NagrajMG · 2025-09-29T15:30:36Z

@RKSimon Got it, I will correct the procedure.
Thank you!

NagrajMG · 2025-09-30T14:56:14Z

@RKSimon @tbaederr I have made some changes as suggested. Please review them.

RKSimon · 2025-09-30T16:49:11Z

@NagrajMG Are you able to use the "request review" icon next to my reviewer name on the PR? I'm not sure if that's members only like adding reviewers, but it's useful as it adds the PR back to my review list. Cheers.

NagrajMG · 2025-09-30T16:58:26Z

@RKSimon I had clicked on that option yesterday, it reads 'awaiting requested review'.

clang/lib/AST/ByteCode/InterpBuiltin.cpp

github-actions · 2025-10-01T11:21:11Z

✅ With the latest revision this PR passed the C/C++ code formatter.

clang/lib/AST/ByteCode/InterpBuiltin.cpp

Refactor lane element calculation for clarity.

NagrajMG · 2025-10-02T09:26:40Z

@RKSimon The test files and the builtin def file undergo massive structural changes if I run the formatter on them.
I should ignore them, right?

RKSimon

Thanks for this - the shuffle decode looks like it can be simplified quite a bit
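
One way such a simplification could look, purely as a sketch and not the code that was eventually merged: a single index-mapping helper that covers PSHUFD, PSHUFLW and PSHUFHW by parameterising the lane size and the offset of the 4-element group being permuted. The name `src_index` and its parameters are hypothetical.

```cpp
#include <cstdint>

// Hypothetical unified decode: maps a destination element index to the
// source element index.
//   lane_elems   : elements per 128-bit lane (4 for i32, 8 for i16)
//   group_offset : start of the permuted 4-element group within the lane
//                  (0 for PSHUFD and PSHUFLW, 4 for PSHUFHW)
constexpr unsigned src_index(unsigned dst, unsigned lane_elems,
                             unsigned group_offset, std::uint8_t imm) {
  const unsigned lane_base = (dst / lane_elems) * lane_elems;
  const unsigned in_lane = dst % lane_elems;
  // Elements outside the permuted group pass through unchanged.
  if (in_lane < group_offset || in_lane >= group_offset + 4)
    return dst;
  const unsigned sel = (imm >> (2 * (in_lane - group_offset))) & 0x3u;
  return lane_base + group_offset + sel;
}

static_assert(src_index(5, 4, 0, 0x1B) == 6, "PSHUFD, second lane");
static_assert(src_index(13, 8, 0, 0x1B) == 13, "PSHUFLW copies the high half");
static_assert(src_index(13, 8, 4, 0x1B) == 14, "PSHUFHW permutes the high half");
```

With a mapping like this, each evaluator would only need one loop that copies `src[src_index(i, ...)]` into element `i`.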

clang/lib/AST/ByteCode/InterpBuiltin.cpp

clang/lib/AST/ExprConstant.cpp

Refactor selection logic for clarity and efficiency.

NagrajMG · 2025-10-03T13:30:24Z

@RKSimon I have simplified the code, please review it.

RKSimon

LGTM - cheers

NagrajMG · 2025-10-03T14:14:32Z

@RKSimon It seems the code_formatter is failing because of this invisible whitespace.

RKSimon · 2025-10-03T14:16:05Z

please can you push a fix?

Removed unnecessary blank line in the bytecode interpreter.

github-actions · 2025-10-03T14:52:43Z

@NagrajMG Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

NagrajMG · 2025-10-03T14:56:16Z

@RKSimon Thank you for your patience! This was my first experience working outside of my personal projects.

FIxes llvm#156611: Allow PSHUFD/PSHUFLW/PSHUFW intrinsics in constexpr

633a986

llvmbot added the clang, backend:X86, clang:frontend, clang:headers, and clang:bytecode labels Sep 29, 2025

RKSimon changed the title ~~FIxes #156611: Allow PSHUFD/PSHUFLW/PSHUFW intrinsics in constexpr.~~ [X86] Allow PSHUFD/PSHUFLW/PSHUFW intrinsics in constexpr. Sep 29, 2025

tbaederr requested a review from RKSimon September 29, 2025 14:55

RKSimon requested changes Sep 29, 2025

tbaederr reviewed Sep 29, 2025

RKSimon requested changes Sep 29, 2025

NagrajMG marked this pull request as draft September 29, 2025 15:28

[X86] Allow PSHUFD/PSHUFLW/PSHUFW intrinsics in constexpr

de765b0

NagrajMG marked this pull request as ready for review September 29, 2025 19:35

NagrajMG requested a review from RKSimon September 29, 2025 19:39

[X86] Allow PSHUFD/PSHUFLW/PSHUFW intrinsics in constexpr

ba1f202

NagrajMG requested a review from tbaederr September 29, 2025 20:23

tbaederr reviewed Sep 30, 2025

[X86] Allow PSHUFD/PSHUFLW/PSHUFW intrinsics in constexpr

4ccbbbc

NagrajMG requested a review from tbaederr September 30, 2025 19:03

Merge branch 'main' into pshuf-intrinsics

00ba23c

NagrajMG added 2 commits October 1, 2025 22:40

Refactor interp__builtin_ia32_pshuf for readability

1d54eeb

Refactor evalPshufBuiltin for clarity and consistency

ffa0287

RKSimon reviewed Oct 1, 2025

NagrajMG added 3 commits October 1, 2025 23:56

Simplify lane element calculation logic

dc67248

Refactor lane element calculation for clarity.

Simplify calculation of LaneElts in ExprConstant.cpp

cb30a32

[X86] Allow PSHUFD/PSHUFLW/PSHUFW intrinsics in constexpr

e83fad7

NagrajMG requested a review from RKSimon October 1, 2025 19:23

NagrajMG added 2 commits October 2, 2025 14:49

Update InterpBuiltin.cpp

8588fed

Fix formatting of LaneBits declaration

37c6da9

RKSimon requested changes Oct 2, 2025

NagrajMG added 5 commits October 2, 2025 19:43

Simplified interp__builtin_ia32_pshuf function

f89422e

Refactor selection logic in InterpBuiltin.cpp

5b73c1e

Refactor selection logic for clarity and efficiency.

Refactor evalPshufBuiltin parameters for clarity

689994a

Hoisted selection variable declaration in InterpBuiltin.cpp

a6b108e

Hoisted selection variable declaration in ExprConstant

6ea8e94

NagrajMG requested a review from RKSimon October 2, 2025 15:11

Merge branch 'main' into pshuf-intrinsics

ba85f19

RKSimon approved these changes Oct 3, 2025

RKSimon enabled auto-merge (squash) October 3, 2025 14:05

RKSimon disabled auto-merge October 3, 2025 14:15

Remove blank line in InterpBuiltin.cpp

34bc66d

Removed unnecessary blank line in the bytecode interpreter.

RKSimon merged commit 952b123 into llvm:main Oct 3, 2025
7 of 8 checks passed

NagrajMG deleted the pshuf-intrinsics branch October 7, 2025 20:56
