Skip to content

Conversation

kimyounhoex1
Copy link

These X86 builtins (__builtin_ia32_pslldqi128/256_byteshift and
__builtin_ia32_psrldqi128/256_byteshift) can now be evaluated at
compile time when the shift amount is a constant integer.

This improves consistency with other vector shift intrinsics and
reduces unnecessary runtime evaluation.

Fixes #156494

@github-actions
Copy link

github-actions bot commented Sep 8, 2025

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Sep 8, 2025
@llvmbot
Copy link
Member

llvmbot commented Sep 8, 2025

@llvm/pr-subscribers-clang

Author: None (kimyounhoex1)

Changes

These X86 builtins (__builtin_ia32_pslldqi128/256_byteshift and
__builtin_ia32_psrldqi128/256_byteshift) can now be evaluated at
compile time when the shift amount is a constant integer.

This improves consistency with other vector shift intrinsics and
reduces unnecessary runtime evaluation.

Fixes #156494


Full diff: https://github.com/llvm/llvm-project/pull/157403.diff

1 Files Affected:

  • (modified) clang/lib/AST/ExprConstant.cpp (+65)
diff --git a/clang/lib/AST/ExprConstant.cpp b/clang/lib/AST/ExprConstant.cpp
index b4f1e76187e25..2b06705a4870c 100644
--- a/clang/lib/AST/ExprConstant.cpp
+++ b/clang/lib/AST/ExprConstant.cpp
@@ -12039,6 +12039,71 @@ bool VectorExprEvaluator::VisitCallExpr(const CallExpr *E) {
     }
     return Success(APValue(ResultElements.data(), ResultElements.size()), E);
   }
+  case X86::BI__builtin_ia32_pslldqi128_byteshift:
+  case X86::BI__builtin_ia32_psrldqi128_byteshift: {
+    unsigned BuiltinID = E->getBuiltinCallee();
+
+    APSInt Amt;
+    if (!EvaluateInteger(E->getArg(1), Amt, Info))
+      break;
+    unsigned Shift = (unsigned)Amt.getZExtValue();
+
+    APValue Vec;
+    if (!Evaluate(Vec, Info, E->getArg(0)) || !Vec.isVector())
+      break;
+
+    SmallVector<APValue, 16> ResultElements;
+    ResultElements.reserve(16);
+
+    bool isLeft = (BuiltinID == X86::BI__builtin_ia32_pslldqi128_byteshift);
+
+    for (unsigned i = 0; i < 16; i++) {
+      int SrcIdx = -1;
+      if (isLeft)
+        SrcIdx = i + Shift;
+      else if (i >= Shift)
+        SrcIdx = i - Shift;
+
+      if (SrcIdx >= 0 && (unsigned)SrcIdx < 16)
+        ResultElements.push_back(Vec.getVectorElt(SrcIdx));
+      else
+        ResultElements.push_back(APValue(0));
+    }
+    return Success(APValue(ResultElements.data(), ResultElements.size()), E);
+  }
+
+  case X86::BI__builtin_ia32_pslldqi256_byteshift:
+  case X86::BI__builtin_ia32_psrldqi256_byteshift: {
+    unsigned BuiltinID = E->getBuiltinCallee();
+
+    APSInt Amt;
+    if (!EvaluateInteger(E->getArg(1), Amt, Info))
+      break;
+    unsigned Shift = (unsigned)Amt.getZExtValue();
+
+    APValue Vec;
+    if (!Evaluate(Vec, Info, E->getArg(0)) || !Vec.isVector())
+      break;
+
+    SmallVector<APValue, 32> ResultElements;
+    ResultElements.reserve(32);
+
+    bool isLeft = (BuiltinID == X86::BI__builtin_ia32_pslldqi256_byteshift);
+
+    for (unsigned i = 0; i < 32; i++) {
+      int SrcIdx = -1;
+      if (isLeft)
+        SrcIdx = i + Shift;
+      else if (i >= Shift)
+        SrcIdx = i - Shift;
+
+      if (SrcIdx >= 0 && (unsigned)SrcIdx < 32)
+        ResultElements.push_back(Vec.getVectorElt(SrcIdx));
+      else
+        ResultElements.push_back(APValue(0));
+    }
+    return Success(APValue(ResultElements.data(), ResultElements.size()), E);
+  }
   }
 }
 

@kimyounhoex1
Copy link
Author

Hi @RKSimon, just kindly pinging this PR
The workflows are pending approval and the required build checks haven’t started yet.
Could a maintainer please take a look? Thanks a lot!

@RKSimon RKSimon self-requested a review September 10, 2025 14:07
@RKSimon
Copy link
Collaborator

RKSimon commented Sep 10, 2025

Hi @RKSimon, just kindly pinging this PR The workflows are pending approval and the required build checks haven’t started yet. Could a maintainer please take a look? Thanks a lot!

Will take a look - sorry I missed this (unfortunately github is terrible with notifications for contributions from non project members) - CC'ing me is the best way so it appears in my github mentioned tab

@kimyounhoex1
Copy link
Author

Hi @RKSimon, just kindly pinging this PR The workflows are pending approval and the required build checks haven’t started yet. Could a maintainer please take a look? Thanks a lot!

Will take a look - sorry I missed this (unfortunately github is terrible with notifications for contributions from non project members) - CC'ing me is the best way so it appears in my github mentioned tab

Understood, I’ll keep that in mind next time.

Copy link
Collaborator

@RKSimon RKSimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test coverage? they need adding to sse2-builtins.c and avx2-builtins.c

SrcIdx = i - Shift;

if (SrcIdx >= 0 && (unsigned)SrcIdx < 16)
ResultElements.push_back(Vec.getVectorElt(SrcIdx));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't going to work as currently the intrinsics take <X x long long int> types - we're going to have to change these to <8X x char> types to make this a lot easier to deal with - see the palignr builtins for an example

Copy link
Author

@kimyounhoex1 kimyounhoex1 Sep 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m done with this one, could you please check one more?

@github-actions
Copy link

github-actions bot commented Sep 11, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@RKSimon RKSimon changed the title [clang] VectorExprEvaluator::VisitCallExpr - add constant folding for X86 pslldqi/psrldqi infrinsics [clang] VectorExprEvaluator::VisitCallExpr - add constant folding for X86 pslldqi/psrldqi intrinsics Sep 15, 2025
Copy link
Collaborator

@RKSimon RKSimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You still need to add InterpBuiltin handling, declare the byteshift builtins as Constexpr in BuiltinsX86.td, add avx512bw-builtins.c test coverage and add tests that check the 'ShiftVal > 15 case).

// i should emplement SLLDQ, SRLDQ shift (intrinsics) in constant expression
// handling inside this function
// avx2intrin.h -> _mm256_slli_si256
// emmintrin.h -> _mm_slli_si128
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this

Copy link
Collaborator

@RKSimon RKSimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - this is looking correct now, please can you resolve the merge conflict and add equivalent handling to InterpBuiltin.cpp (we currently have 2 constexpr handlers so this needs to be duplicated).

Copy link
Collaborator

@RKSimon RKSimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I missed an issue with the builtin attributes


let Features = "sse2", Attributes = [NoThrow, Const, RequiredVectorWidth<128>] in {
let Features = "sse2",
Attributes = [NoThrow, Const, Constexpr, RequiredVectorWidth<128>] in {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can only add the constexpr attribute to the byteshift intrinsics - better to move the byteshift declarations into the constexor below (psrldi128 et al.)


let Features = "avx2", Attributes = [NoThrow, Const, RequiredVectorWidth<256>] in {
let Features = "avx2",
Attributes = [NoThrow, Const, Constexpr, RequiredVectorWidth<256>] in {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same - undo this and move the byteshift declarations into the block with psrlqi256 etc.


let Features = "avx512bw", Attributes = [NoThrow, Const, RequiredVectorWidth<512>] in {
let Features = "avx512bw",
Attributes = [NoThrow, Const, Constexpr, RequiredVectorWidth<512>] in {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move the byteshfts down to the block containing pmulhw512?


let Features = "sse2", Attributes = [NoThrow, Const, RequiredVectorWidth<128>] in {
let Features = "sse2",
Attributes = [NoThrow, Const, RequiredVectorWidth<128>] in {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(style) revert changes from code unrelated to the patch (this includes formatting)

def pslldqi128_byteshift : X86Builtin<"_Vector<16, char>(_Vector<16, char>, _Constant int)">;
def psrldqi128_byteshift : X86Builtin<"_Vector<16, char>(_Vector<16, char>, _Constant int)">;
def pmaddwd128
: X86Builtin<"_Vector<4, int>(_Vector<8, short>, _Vector<8, short>)">;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(style) revert changes from code unrelated to the patch (this includes formatting)


let Features = "avx2", Attributes = [NoThrow, Const, RequiredVectorWidth<256>] in {
let Features = "avx2",
Attributes = [NoThrow, Const, RequiredVectorWidth<256>] in {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(style) revert changes from code unrelated to the patch (this includes formatting)

def psllw256 : X86Builtin<"_Vector<16, short>(_Vector<16, short>, _Vector<8, short>)">;
def pslldqi256_byteshift : X86Builtin<"_Vector<32, char>(_Vector<32, char>, _Constant int)">;
def psllw256
: X86Builtin<"_Vector<16, short>(_Vector<16, short>, _Vector<8, short>)">;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(style) revert changes from code unrelated to the patch (this includes formatting)

def psrad256 : X86Builtin<"_Vector<8, int>(_Vector<8, int>, _Vector<4, int>)">;
def psrldqi256_byteshift : X86Builtin<"_Vector<32, char>(_Vector<32, char>, _Constant int)">;
def psrad256
: X86Builtin<"_Vector<8, int>(_Vector<8, int>, _Vector<4, int>)">;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(style) revert changes from code unrelated to the patch (this includes formatting)


let Features = "avx512bw", Attributes = [NoThrow, Const, RequiredVectorWidth<512>] in {
let Features = "avx512bw",
Attributes = [NoThrow, Const, RequiredVectorWidth<512>] in {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(style) revert changes from code unrelated to the patch (this includes formatting)

S, OpPC, Call, [](const APSInt &LHS, const APSInt &RHS) {
unsigned ShiftAmt = RHS.getZExtValue();
return LHS.shl(ShiftAmt * 8);
});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't going to work as SLLDQ/SRLDQ aren't per-element instructions - they need to handled like shuffles like you have in ExprConstant.

Copy link
Collaborator

@RKSimon RKSimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one minor

}

static bool interp__builtin_byteshift(
InterpState & S, CodePtr OpPC, const CallExpr *Call, uint32_t BuiltinID) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add bool IsLeft arg and then split calls below


case clang::X86::BI__builtin_ia32_pslldqi128:
case clang::X86::BI__builtin_ia32_pslldqi256:
case clang::X86::BI__builtin_ia32_pslldqi512:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return interp__builtin_byteshift(S, OpPC, Call, BuiltinID(), /*IsLeft=*/true);

case clang::X86::BI__builtin_ia32_psrldqi128:
case clang::X86::BI__builtin_ia32_psrldqi256:
case clang::X86::BI__builtin_ia32_psrldqi512:
return interp__builtin_byteshift(S, OpPC, Call, BuiltinID);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return interp__builtin_byteshift(S, OpPC, Call, BuiltinID(), /*IsLeft=*/false);

Copy link
Collaborator

@RKSimon RKSimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still doesn't merge with trunk - please can you resolve

}

return Success(APValue(Result.data(), Result.size()), E);

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing brace?


for (unsigned LaneBase = 0; LaneBase < NumElts; LaneBase += LaneBytes) {
for (unsigned I = 0; I < LaneBytes; ++I) {
int src = IsLeft ? (I + ShiftVal) : (int)I - (int)ShiftVal;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(style) int Src =

Copy link
Collaborator

@RKSimon RKSimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still broken - have you actually tested this? ninja check-clang-codegen-x86

}

TEST_CONSTEXPR(match_v16qi(_mm_srli_si128((__m128i)(__v16qi){1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16}, 3), 4,5,6,7,8,9,10,11,12,13,14,15,16,0,0,0))
TEST_CONSTEXPR(match_v16qi(_mm_srli_si128((__m128i)(__v16qi){1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16}, 16), 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0))
Copy link
Collaborator

@RKSimon RKSimon Oct 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ALL of these tests are still failing as you're missing brackets around the vector initialisations (because _mm_srli_si128 is a macro it expands early).

TEST_CONSTEXPR(match_v16qi(_mm_srli_si128(((__m128i)(__v16qi){1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16}), 16), 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0))

Update all your other TEST_CONSTEXPR accordingly

const Pointer &VecPtr = S.Stk.pop<Pointer>();
const Pointer &Dst = S.Stk.peek<Pointer>();

unsigned NumElts = VecPtr.getNumElems();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
unsigned NumElts = VecPtr.getNumElems();
unsigned NumElems = VecPtr.getNumElems();


static bool interp__builtin_byteshift(InterpState &S, CodePtr OpPC,
const CallExpr *Call, uint32_t BuiltinID,
bool isLeft) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bool isLeft) {
bool IsLeft) {

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to do it on my previous PC, but since I switched to a new one, I’ve been having trouble building due to storage limitations, so I couldn’t proceed. Sorry about that.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m sorry if I’m causing difficulties in your progress.

@RKSimon
Copy link
Collaborator

RKSimon commented Oct 6, 2025

@kimyounhoex1 Are you able to address the remaining feedback soon please?

@kimyounhoex1
Copy link
Author

Sorry for the delay! I'll fix the remaining issues and push an update soon.

@RKSimon
Copy link
Collaborator

RKSimon commented Oct 10, 2025

@kimyounhoex1 are you able to get this finished soon please? the patch is starting to bit rot due to all the other constexpr work

@kimyounhoex1
Copy link
Author

@RKSimon
Hello Simon,

I’m Younho Kim, the contributor who worked on PR #157403.

Through this issue, I realized how challenging it is to contribute to such a large and sophisticated project like LLVM. I studied hard to understand the codebase, but I also learned that I still have a long way to go.

I truly appreciate your patience and continuous feedback over the past few weeks. Despite your help and kind guidance, I feel that my current skills are not yet sufficient to finalize this patch properly. I really wanted to finish it, but I’m afraid I’ve already taken too much of your time.

Thank you very much for giving me the opportunity to participate and learn from this process. I’m very sorry that I couldn’t meet your expectations this time, but I’ll keep learning and hopefully come back stronger to contribute again in the future.

I wanted to send this message by email, but since I couldn’t find a proper way to reach you, I’m leaving it here instead.

Best regards,
Younho Kim

@RKSimon
Copy link
Collaborator

RKSimon commented Oct 14, 2025

@kimyounhoex1 Understood - I'll try to finish this patch and make sure you get mentioned - thanks for trying!

@RKSimon RKSimon closed this Oct 14, 2025
@kimyounhoex1
Copy link
Author

Thank you so much for letting me take part and learn from this. I really appreciate your guidance!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category