-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[clang] VectorExprEvaluator::VisitCallExpr - add constant folding for X86 pslldqi/psrldqi intrinsics #157403
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[clang] VectorExprEvaluator::VisitCallExpr - add constant folding for X86 pslldqi/psrldqi intrinsics #157403
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
@llvm/pr-subscribers-clang Author: None (kimyounhoex1) ChangesThese X86 builtins ( This improves consistency with other vector shift intrinsics and Fixes #156494 Full diff: https://github.com/llvm/llvm-project/pull/157403.diff 1 Files Affected:
diff --git a/clang/lib/AST/ExprConstant.cpp b/clang/lib/AST/ExprConstant.cpp
index b4f1e76187e25..2b06705a4870c 100644
--- a/clang/lib/AST/ExprConstant.cpp
+++ b/clang/lib/AST/ExprConstant.cpp
@@ -12039,6 +12039,71 @@ bool VectorExprEvaluator::VisitCallExpr(const CallExpr *E) {
}
return Success(APValue(ResultElements.data(), ResultElements.size()), E);
}
+ case X86::BI__builtin_ia32_pslldqi128_byteshift:
+ case X86::BI__builtin_ia32_psrldqi128_byteshift: {
+ unsigned BuiltinID = E->getBuiltinCallee();
+
+ APSInt Amt;
+ if (!EvaluateInteger(E->getArg(1), Amt, Info))
+ break;
+ unsigned Shift = (unsigned)Amt.getZExtValue();
+
+ APValue Vec;
+ if (!Evaluate(Vec, Info, E->getArg(0)) || !Vec.isVector())
+ break;
+
+ SmallVector<APValue, 16> ResultElements;
+ ResultElements.reserve(16);
+
+ bool isLeft = (BuiltinID == X86::BI__builtin_ia32_pslldqi128_byteshift);
+
+ for (unsigned i = 0; i < 16; i++) {
+ int SrcIdx = -1;
+ if (isLeft)
+ SrcIdx = i + Shift;
+ else if (i >= Shift)
+ SrcIdx = i - Shift;
+
+ if (SrcIdx >= 0 && (unsigned)SrcIdx < 16)
+ ResultElements.push_back(Vec.getVectorElt(SrcIdx));
+ else
+ ResultElements.push_back(APValue(0));
+ }
+ return Success(APValue(ResultElements.data(), ResultElements.size()), E);
+ }
+
+ case X86::BI__builtin_ia32_pslldqi256_byteshift:
+ case X86::BI__builtin_ia32_psrldqi256_byteshift: {
+ unsigned BuiltinID = E->getBuiltinCallee();
+
+ APSInt Amt;
+ if (!EvaluateInteger(E->getArg(1), Amt, Info))
+ break;
+ unsigned Shift = (unsigned)Amt.getZExtValue();
+
+ APValue Vec;
+ if (!Evaluate(Vec, Info, E->getArg(0)) || !Vec.isVector())
+ break;
+
+ SmallVector<APValue, 32> ResultElements;
+ ResultElements.reserve(32);
+
+ bool isLeft = (BuiltinID == X86::BI__builtin_ia32_pslldqi256_byteshift);
+
+ for (unsigned i = 0; i < 32; i++) {
+ int SrcIdx = -1;
+ if (isLeft)
+ SrcIdx = i + Shift;
+ else if (i >= Shift)
+ SrcIdx = i - Shift;
+
+ if (SrcIdx >= 0 && (unsigned)SrcIdx < 32)
+ ResultElements.push_back(Vec.getVectorElt(SrcIdx));
+ else
+ ResultElements.push_back(APValue(0));
+ }
+ return Success(APValue(ResultElements.data(), ResultElements.size()), E);
+ }
}
}
|
Hi @RKSimon, just kindly pinging this PR |
Will take a look - sorry I missed this (unfortunately github is terrible with notifications for contributions from non project members) - CC'ing me is the best way so it appears in my github mentioned tab |
Understood, I’ll keep that in mind next time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test coverage? they need adding to sse2-builtins.c and avx2-builtins.c
clang/lib/AST/ExprConstant.cpp
Outdated
SrcIdx = i - Shift; | ||
|
||
if (SrcIdx >= 0 && (unsigned)SrcIdx < 16) | ||
ResultElements.push_back(Vec.getVectorElt(SrcIdx)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't going to work as currently the intrinsics take <X x long long int>
types - we're going to have to change these to <8X x char>
types to make this a lot easier to deal with - see the palignr builtins for an example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m done with this one, could you please check one more?
…ing for X86 psllDqi/psrlDqi intrinsics feat(exprconst): branch statement handling
… X86 pslldqi/psrldqi infrinsics
… X86 pslldqi/psrldqi infrinsics
… X86 pslldqi/psrldqi infrinsics
…ing for X86 pslldqi/psrldqi infrinsics
✅ With the latest revision this PR passed the C/C++ code formatter. |
… X86 pslldqi/psrldqi infrinsics
… X86 pslldqi/psrldqi infrinsics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You still need to add InterpBuiltin handling, declare the byteshift builtins as Constexpr in BuiltinsX86.td, add avx512bw-builtins.c test coverage and add tests that check the 'ShiftVal > 15 case).
clang/lib/AST/ExprConstant.cpp
Outdated
// i should emplement SLLDQ, SRLDQ shift (intrinsics) in constant expression | ||
// handling inside this function | ||
// avx2intrin.h -> _mm256_slli_si256 | ||
// emmintrin.h -> _mm_slli_si128 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this
… X86 psllDqi/psrlDqi intrinsics
… X86 pslldqi/psrldqi intrinsics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks - this is looking correct now, please can you resolve the merge conflict and add equivalent handling to InterpBuiltin.cpp (we currently have 2 constexpr handlers so this needs to be duplicated).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I missed an issue with the builtin attributes
|
||
let Features = "sse2", Attributes = [NoThrow, Const, RequiredVectorWidth<128>] in { | ||
let Features = "sse2", | ||
Attributes = [NoThrow, Const, Constexpr, RequiredVectorWidth<128>] in { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can only add the constexpr attribute to the byteshift intrinsics - better to move the byteshift declarations into the constexor below (psrldi128 et al.)
|
||
let Features = "avx2", Attributes = [NoThrow, Const, RequiredVectorWidth<256>] in { | ||
let Features = "avx2", | ||
Attributes = [NoThrow, Const, Constexpr, RequiredVectorWidth<256>] in { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same - undo this and move the byteshift declarations into the block with psrlqi256 etc.
|
||
let Features = "avx512bw", Attributes = [NoThrow, Const, RequiredVectorWidth<512>] in { | ||
let Features = "avx512bw", | ||
Attributes = [NoThrow, Const, Constexpr, RequiredVectorWidth<512>] in { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move the byteshfts down to the block containing pmulhw512?
… X86 pslldqi/psrldqi intrinsics
|
||
let Features = "sse2", Attributes = [NoThrow, Const, RequiredVectorWidth<128>] in { | ||
let Features = "sse2", | ||
Attributes = [NoThrow, Const, RequiredVectorWidth<128>] in { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(style) revert changes from code unrelated to the patch (this includes formatting)
def pslldqi128_byteshift : X86Builtin<"_Vector<16, char>(_Vector<16, char>, _Constant int)">; | ||
def psrldqi128_byteshift : X86Builtin<"_Vector<16, char>(_Vector<16, char>, _Constant int)">; | ||
def pmaddwd128 | ||
: X86Builtin<"_Vector<4, int>(_Vector<8, short>, _Vector<8, short>)">; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(style) revert changes from code unrelated to the patch (this includes formatting)
|
||
let Features = "avx2", Attributes = [NoThrow, Const, RequiredVectorWidth<256>] in { | ||
let Features = "avx2", | ||
Attributes = [NoThrow, Const, RequiredVectorWidth<256>] in { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(style) revert changes from code unrelated to the patch (this includes formatting)
def psllw256 : X86Builtin<"_Vector<16, short>(_Vector<16, short>, _Vector<8, short>)">; | ||
def pslldqi256_byteshift : X86Builtin<"_Vector<32, char>(_Vector<32, char>, _Constant int)">; | ||
def psllw256 | ||
: X86Builtin<"_Vector<16, short>(_Vector<16, short>, _Vector<8, short>)">; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(style) revert changes from code unrelated to the patch (this includes formatting)
def psrad256 : X86Builtin<"_Vector<8, int>(_Vector<8, int>, _Vector<4, int>)">; | ||
def psrldqi256_byteshift : X86Builtin<"_Vector<32, char>(_Vector<32, char>, _Constant int)">; | ||
def psrad256 | ||
: X86Builtin<"_Vector<8, int>(_Vector<8, int>, _Vector<4, int>)">; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(style) revert changes from code unrelated to the patch (this includes formatting)
|
||
let Features = "avx512bw", Attributes = [NoThrow, Const, RequiredVectorWidth<512>] in { | ||
let Features = "avx512bw", | ||
Attributes = [NoThrow, Const, RequiredVectorWidth<512>] in { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(style) revert changes from code unrelated to the patch (this includes formatting)
S, OpPC, Call, [](const APSInt &LHS, const APSInt &RHS) { | ||
unsigned ShiftAmt = RHS.getZExtValue(); | ||
return LHS.shl(ShiftAmt * 8); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't going to work as SLLDQ/SRLDQ aren't per-element instructions - they need to handled like shuffles like you have in ExprConstant.
… X86 pslldqi/psrldqi intrinsics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one minor
} | ||
|
||
static bool interp__builtin_byteshift( | ||
InterpState & S, CodePtr OpPC, const CallExpr *Call, uint32_t BuiltinID) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add bool IsLeft
arg and then split calls below
|
||
case clang::X86::BI__builtin_ia32_pslldqi128: | ||
case clang::X86::BI__builtin_ia32_pslldqi256: | ||
case clang::X86::BI__builtin_ia32_pslldqi512: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return interp__builtin_byteshift(S, OpPC, Call, BuiltinID(), /*IsLeft=*/true);
case clang::X86::BI__builtin_ia32_psrldqi128: | ||
case clang::X86::BI__builtin_ia32_psrldqi256: | ||
case clang::X86::BI__builtin_ia32_psrldqi512: | ||
return interp__builtin_byteshift(S, OpPC, Call, BuiltinID); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return interp__builtin_byteshift(S, OpPC, Call, BuiltinID(), /*IsLeft=*/false);
… X86 pslldqi/psrldqi intrinsics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still doesn't merge with trunk - please can you resolve
clang/lib/AST/ExprConstant.cpp
Outdated
} | ||
|
||
return Success(APValue(Result.data(), Result.size()), E); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing brace?
|
||
for (unsigned LaneBase = 0; LaneBase < NumElts; LaneBase += LaneBytes) { | ||
for (unsigned I = 0; I < LaneBytes; ++I) { | ||
int src = IsLeft ? (I + ShiftVal) : (int)I - (int)ShiftVal; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(style) int Src =
… X86 pslldqi/psrldqi intrinsics
… X86 pslldqi/psrldqi intrinsics
… X86 pslldqi/psrldqi intrinsics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still broken - have you actually tested this? ninja check-clang-codegen-x86
} | ||
|
||
TEST_CONSTEXPR(match_v16qi(_mm_srli_si128((__m128i)(__v16qi){1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16}, 3), 4,5,6,7,8,9,10,11,12,13,14,15,16,0,0,0)) | ||
TEST_CONSTEXPR(match_v16qi(_mm_srli_si128((__m128i)(__v16qi){1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16}, 16), 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ALL of these tests are still failing as you're missing brackets around the vector initialisations (because _mm_srli_si128 is a macro it expands early).
TEST_CONSTEXPR(match_v16qi(_mm_srli_si128(((__m128i)(__v16qi){1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16}), 16), 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0))
Update all your other TEST_CONSTEXPR accordingly
const Pointer &VecPtr = S.Stk.pop<Pointer>(); | ||
const Pointer &Dst = S.Stk.peek<Pointer>(); | ||
|
||
unsigned NumElts = VecPtr.getNumElems(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unsigned NumElts = VecPtr.getNumElems(); | |
unsigned NumElems = VecPtr.getNumElems(); |
|
||
static bool interp__builtin_byteshift(InterpState &S, CodePtr OpPC, | ||
const CallExpr *Call, uint32_t BuiltinID, | ||
bool isLeft) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bool isLeft) { | |
bool IsLeft) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was able to do it on my previous PC, but since I switched to a new one, I’ve been having trouble building due to storage limitations, so I couldn’t proceed. Sorry about that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m sorry if I’m causing difficulties in your progress.
@kimyounhoex1 Are you able to address the remaining feedback soon please? |
Sorry for the delay! I'll fix the remaining issues and push an update soon. |
@kimyounhoex1 are you able to get this finished soon please? the patch is starting to bit rot due to all the other constexpr work |
@RKSimon I’m Younho Kim, the contributor who worked on PR #157403. Through this issue, I realized how challenging it is to contribute to such a large and sophisticated project like LLVM. I studied hard to understand the codebase, but I also learned that I still have a long way to go. I truly appreciate your patience and continuous feedback over the past few weeks. Despite your help and kind guidance, I feel that my current skills are not yet sufficient to finalize this patch properly. I really wanted to finish it, but I’m afraid I’ve already taken too much of your time. Thank you very much for giving me the opportunity to participate and learn from this process. I’m very sorry that I couldn’t meet your expectations this time, but I’ll keep learning and hopefully come back stronger to contribute again in the future. I wanted to send this message by email, but since I couldn’t find a proper way to reach you, I’m leaving it here instead. Best regards, |
@kimyounhoex1 Understood - I'll try to finish this patch and make sure you get mentioned - thanks for trying! |
Thank you so much for letting me take part and learn from this. I really appreciate your guidance! |
These X86 builtins (
__builtin_ia32_pslldqi128/256_byteshift
and__builtin_ia32_psrldqi128/256_byteshift
) can now be evaluated atcompile time when the shift amount is a constant integer.
This improves consistency with other vector shift intrinsics and
reduces unnecessary runtime evaluation.
Fixes #156494