[X86] Generate vpmuludq instead of vpmullq
#121456
Conversation
When lowering the `_mm512_mul_epu32` intrinsic, if the generated value is later used in a vector shuffle we generate `vpmullq` instead of `vpmuludq` (https://godbolt.org/z/WbaGMqs8e), because `SimplifyDemandedVectorElts` simplifies the arguments and we fail the combine to `PMULDQ`. Added an override of `shouldSimplifyDemandedVectorElts` in `X86TargetLowering` to check whether we can combine the `MUL` to `PMULDQ` first.
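For reference, a minimal reproducer in the spirit of the linked godbolt case: the intrinsics and blend mask below are illustrative assumptions, not copied from the link. It shows an unsigned 32x32->64-bit multiply whose result is then consumed through a 32-bit element shuffle.

```c++
// Illustrative sketch, not the exact source behind the godbolt link:
// _mm512_mul_epu32 ideally lowers to vpmuludq, but once the constant-mask
// blend below is folded to a vector shuffle of the product, the demanded-
// elements simplification can rewrite the multiply's operands and defeat the
// PMULUDQ combine, leaving a plain 64-bit multiply that selects as vpmullq.
#include <immintrin.h>

__m512i mul_keep_low_halves(__m512i a, __m512i b) {
  __m512i prod = _mm512_mul_epu32(a, b);  // unsigned 32x32 -> 64-bit products
  // Keep only the low 32 bits of each 64-bit product; with a constant mask the
  // blend becomes a vector shuffle that consumes the product.
  return _mm512_mask_blend_epi32(0x5555, _mm512_setzero_si512(), prod);
}
```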
@llvm/pr-subscribers-backend-x86

Author: None (abhishek-kaushik22)

Full diff: https://github.com/llvm/llvm-project/pull/121456.diff

2 Files Affected:
diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index a0514e93d6598b..e104264bcbf918 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -60832,3 +60832,24 @@ Align X86TargetLowering::getPrefLoopAlignment(MachineLoop *ML) const {
return Align(1ULL << ExperimentalPrefInnermostLoopAlignment);
return TargetLowering::getPrefLoopAlignment();
}
+
+bool X86TargetLowering::shouldSimplifyDemandedVectorElts(
+ SDValue Op, const TargetLoweringOpt &TLO) const {
+ if (Op.getOpcode() == ISD::VECTOR_SHUFFLE) {
+ SDValue V0 = peekThroughBitcasts(Op.getOperand(0));
+ SDValue V1 = peekThroughBitcasts(Op.getOperand(1));
+
+ if (V0.getOpcode() == ISD::MUL || V1.getOpcode() == ISD::MUL) {
+ SDNode *Mul = V0.getOpcode() == ISD::MUL ? V0.getNode() : V1.getNode();
+ SelectionDAG &DAG = TLO.DAG;
+ const X86Subtarget &Subtarget = DAG.getSubtarget<X86Subtarget>();
+ const SDLoc DL(Mul);
+
+ if (SDValue V = combineMulToPMULDQ(Mul, DL, DAG, Subtarget)) {
+ DAG.ReplaceAllUsesWith(Mul, V.getNode());
+ return false;
+ }
+ }
+ }
+ return true;
+}
diff --git a/llvm/lib/Target/X86/X86ISelLowering.h b/llvm/lib/Target/X86/X86ISelLowering.h
index 2b7a8eaf249d83..0a6cd53f557bb2 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.h
+++ b/llvm/lib/Target/X86/X86ISelLowering.h
@@ -1207,6 +1207,9 @@ namespace llvm {
bool hasBitTest(SDValue X, SDValue Y) const override;
+ bool shouldSimplifyDemandedVectorElts(
+ SDValue Op, const TargetLoweringOpt &TLO) const override;
+
bool shouldProduceAndByConstByHoistingConstFromShiftsLHSOfAnd(
SDValue X, ConstantSDNode *XC, ConstantSDNode *CC, SDValue Y,
unsigned OldShiftOpcode, unsigned NewShiftOpcode,
@phoebewang @e-kud @RKSimon can you please review?
Is it possible to combine the new pattern instead of disabling `SimplifyDemandedVectorElts`?
I'm not sure. We start with […], and this gets replaced by […]. Is this safe to replace with […]?
RKSimon left a comment:
I'm not sure shouldSimplifyDemandedVectorElts is the best way to handle this tbh
  %4 = shufflevector <16 x i32> <i32 0, i32 poison, i32 0, i32 poison, i32 0, i32 poison, i32 0, i32 poison, i32 0, i32 poison, i32 0, i32 poison, i32 0, i32 poison, i32 0, i32 poison>, <16 x i32> %3, <16 x i32> <i32 0, i32 16, i32 2, i32 18, i32 4, i32 20, i32 6, i32 22, i32 8, i32 24, i32 10, i32 26, i32 12, i32 28, i32 14, i32 30>
  %5 = bitcast <16 x i32> %4 to <8 x i64>
  ret <8 x i64> %5
}
Won't this be happening for v2i64/v4i64 as well on avx512dq targets? Please can you add test coverage for those as well.
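For illustration, the 128-bit and 256-bit analogues of the same pattern (a hypothetical sketch of what such coverage would exercise, not the tests added by this PR):

```c++
// Hypothetical v2i64/v4i64 analogues of the pattern above, for illustration
// only; the real coverage would be .ll tests alongside the existing one.
#include <immintrin.h>

__m128i mul_keep_low_halves_v2i64(__m128i a, __m128i b) {
  __m128i prod = _mm_mul_epu32(a, b);        // v2i64 products, ideally vpmuludq
  return _mm_blend_epi32(prod, _mm_setzero_si128(), 0xA);       // zero the high 32-bit halves
}

__m256i mul_keep_low_halves_v4i64(__m256i a, __m256i b) {
  __m256i prod = _mm256_mul_epu32(a, b);     // v4i64 products, ideally vpmuludq
  return _mm256_blend_epi32(prod, _mm256_setzero_si256(), 0xAA); // zero the high 32-bit halves
}
```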
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
; RUN: llc < %s -mtriple=x86_64-- -mattr=+avx512dq | FileCheck %s

define <8 x i64> @pr121456(<8 x i64> %a, <8 x i64> %b) {
Don't name the test file after a pull request - "pr" is the old llvm term for problem report - the number should be based off a reported issue number (is there one?) - otherwise I'd probably suggest adding these tests to combine-pmuldq.ll instead
bool X86TargetLowering::shouldSimplifyDemandedVectorElts(
    SDValue Op, const TargetLoweringOpt &TLO) const {
  if (Op.getOpcode() == ISD::VECTOR_SHUFFLE) {
This isn't going to work in the general case; it will just help shuffles:
https://llvm.godbolt.org/z/no1Gc6fzT
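To make the scope concrete: the override only fires when the user is an `ISD::VECTOR_SHUFFLE`, so any other consumer whose demanded-elements simplification touches the multiply is not covered. A hypothetical example of such a consumer (this is not the case from the link, and whether it actually regresses depends on how the simplification rewrites the operands):

```c++
// Hypothetical non-shuffle consumer: a subvector extract demands only half of
// the 64-bit products, so demanded-elements simplification still runs on the
// multiply, but the VECTOR_SHUFFLE check in the override never sees it.
#include <immintrin.h>

__m256i mul_then_extract_low(__m512i a, __m512i b) {
  __m512i prod = _mm512_mul_epu32(a, b);
  return _mm512_extracti64x4_epi64(prod, 0);  // use only the low four products
}
```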