-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[VPlan] Compute cost for binary op VPInstruction with underlying values. #125434
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@llvm/pr-subscribers-vectorizers @llvm/pr-subscribers-llvm-transforms Author: Florian Hahn (fhahn) ChangesFull diff: https://github.com/llvm/llvm-project/pull/125434.diff 3 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index fac207287e0bcc7..5b9dcf68a62bec5 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -972,10 +972,7 @@ class VPInstruction : public VPRecipeWithIRFlags,
/// Return the cost of this VPInstruction.
InstructionCost computeCost(ElementCount VF,
- VPCostContext &Ctx) const override {
- // TODO: Compute accurate cost after retiring the legacy cost model.
- return 0;
- }
+ VPCostContext &Ctx) const override;
#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
/// Print the VPInstruction to \p O.
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index c84a93d7398f73b..ac70f2498dbaeb7 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -768,6 +768,25 @@ void VPInstruction::execute(VPTransformState &State) {
/*IsScalar*/ GeneratesPerFirstLaneOnly);
}
+InstructionCost VPInstruction::computeCost(ElementCount VF,
+ VPCostContext &Ctx) const {
+ if (Instruction::isBinaryOp(getOpcode())) {
+ if (!getUnderlyingValue())
+ return 0;
+
+ assert(!doesGeneratePerAllLanes() && "Should only generate a vector value or single scalar, not scalars for all lanes.);
+ Type *ResTy = Ctx.Types.inferScalarType(this);
+ if (!vputils::onlyFirstLaneUsed(this))
+ ResTy = toVectorTy(ResTy, VF);
+
+ return Ctx.TTI.getArithmeticInstrCost(getOpcode(), ResTy, Ctx.CostKind);
+ }
+
+ assert(!getUnderlyingValue() &&
+ "unexpected VPInstruction without underlying value");
+ return 0;
+}
+
bool VPInstruction::opcodeMayReadOrWriteFromMemory() const {
if (Instruction::isBinaryOp(getOpcode()))
return false;
diff --git a/llvm/test/Transforms/LoopVectorize/X86/CostModel/vpinstruction-cost.ll b/llvm/test/Transforms/LoopVectorize/X86/CostModel/vpinstruction-cost.ll
new file mode 100644
index 000000000000000..bb85b88f181f784
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/X86/CostModel/vpinstruction-cost.ll
@@ -0,0 +1,74 @@
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --filter "Cost of"
+; RUN: opt -S -passes=loop-vectorize -mcpu=skylake-avx512 -mtriple=x86_64-apple-macosx -debug -disable-output -S %s 2>&1 | FileCheck %s
+
+; REQUIRES: asserts
+
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+
+define void @wide_or_replaced_with_add_vpinstruction(ptr %src, ptr noalias %dst) {
+; CHECK-LABEL: 'wide_or_replaced_with_add_vpinstruction'
+; CHECK: Cost of 1 for VF 2: induction instruction %iv.next = add nuw nsw i64 %iv, 1
+; CHECK: Cost of 0 for VF 2: induction instruction %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop.latch ]
+; CHECK: Cost of 1 for VF 2: exit condition instruction %exitcond = icmp eq i64 %iv.next, 32
+; CHECK: Cost of 0 for VF 2: EMIT vp<%3> = CANONICAL-INDUCTION ir<0>, vp<%index.next>
+; CHECK: Cost of 0 for VF 2: ir<%iv> = WIDEN-INDUCTION ir<0>, ir<1>, vp<%0>
+; CHECK: Cost of 0 for VF 2: vp<%4> = SCALAR-STEPS vp<%3>, ir<1>
+; CHECK: Cost of 0 for VF 2: CLONE ir<%g.src> = getelementptr inbounds ir<%src>, vp<%4>
+; CHECK: Cost of 0 for VF 2: vp<%5> = vector-pointer ir<%g.src>
+; CHECK: Cost of 1 for VF 2: WIDEN ir<%l> = load vp<%5>
+; CHECK: Cost of 1 for VF 2: WIDEN ir<%iv.4> = add ir<%iv>, ir<4>
+; CHECK: Cost of 1 for VF 2: WIDEN ir<%c> = icmp ule ir<%l>, ir<128>
+; CHECK: Cost of 1 for VF 2: EMIT ir<%or> = add ir<%iv.4>, ir<1>
+; CHECK: Cost of 0 for VF 2: CLONE ir<%g.dst> = getelementptr ir<%dst>, ir<%or>
+; CHECK: Cost of 0 for VF 2: vp<%6> = vector-pointer ir<%g.dst>
+; CHECK: Cost of 1 for VF 2: WIDEN store vp<%6>, ir<%iv.4>, ir<%c>
+; CHECK: Cost of 0 for VF 2: EMIT vp<%index.next> = add nuw vp<%3>, vp<%1>
+; CHECK: Cost of 0 for VF 2: EMIT branch-on-count vp<%index.next>, vp<%2>
+; CHECK: Cost of 0 for VF 2: vector loop backedge
+; CHECK: Cost of 1 for VF 4: induction instruction %iv.next = add nuw nsw i64 %iv, 1
+; CHECK: Cost of 0 for VF 4: induction instruction %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop.latch ]
+; CHECK: Cost of 1 for VF 4: exit condition instruction %exitcond = icmp eq i64 %iv.next, 32
+; CHECK: Cost of 0 for VF 4: EMIT vp<%3> = CANONICAL-INDUCTION ir<0>, vp<%index.next>
+; CHECK: Cost of 0 for VF 4: ir<%iv> = WIDEN-INDUCTION ir<0>, ir<1>, vp<%0>
+; CHECK: Cost of 0 for VF 4: vp<%4> = SCALAR-STEPS vp<%3>, ir<1>
+; CHECK: Cost of 0 for VF 4: CLONE ir<%g.src> = getelementptr inbounds ir<%src>, vp<%4>
+; CHECK: Cost of 0 for VF 4: vp<%5> = vector-pointer ir<%g.src>
+; CHECK: Cost of 1 for VF 4: WIDEN ir<%l> = load vp<%5>
+; CHECK: Cost of 1 for VF 4: WIDEN ir<%iv.4> = add ir<%iv>, ir<4>
+; CHECK: Cost of 1 for VF 4: WIDEN ir<%c> = icmp ule ir<%l>, ir<128>
+; CHECK: Cost of 1 for VF 4: EMIT ir<%or> = add ir<%iv.4>, ir<1>
+; CHECK: Cost of 0 for VF 4: CLONE ir<%g.dst> = getelementptr ir<%dst>, ir<%or>
+; CHECK: Cost of 0 for VF 4: vp<%6> = vector-pointer ir<%g.dst>
+; CHECK: Cost of 1 for VF 4: WIDEN store vp<%6>, ir<%iv.4>, ir<%c>
+; CHECK: Cost of 0 for VF 4: EMIT vp<%index.next> = add nuw vp<%3>, vp<%1>
+; CHECK: Cost of 0 for VF 4: EMIT branch-on-count vp<%index.next>, vp<%2>
+; CHECK: Cost of 0 for VF 4: vector loop backedge
+; CHECK: Cost of 1 for VF 4: induction instruction %iv.next = add nuw nsw i64 %iv, 1
+; CHECK: Cost of 0 for VF 4: induction instruction %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop.latch ]
+; CHECK: Cost of 1 for VF 4: exit condition instruction %exitcond = icmp eq i64 %iv.next, 32
+;
+entry:
+ br label %loop.header
+
+loop.header:
+ %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop.latch ]
+ %g.src = getelementptr inbounds i64, ptr %src, i64 %iv
+ %l = load i64, ptr %g.src
+ %iv.4 = add nuw nsw i64 %iv, 4
+ %c = icmp ule i64 %l, 128
+ br i1 %c, label %loop.then, label %loop.latch
+
+loop.then:
+ %or = or disjoint i64 %iv.4, 1
+ %g.dst = getelementptr inbounds i64, ptr %dst, i64 %or
+ store i64 %iv.4, ptr %g.dst, align 4
+ br label %loop.latch
+
+loop.latch:
+ %iv.next = add nuw nsw i64 %iv, 1
+ %exitcond = icmp eq i64 %iv.next, 32
+ br i1 %exitcond, label %exit, label %loop.header
+
+exit:
+ ret void
+}
|
54439f4 to
4454f6f
Compare
4454f6f to
88e4792
Compare
david-arm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this!
| InstructionCost VPInstruction::computeCost(ElementCount VF, | ||
| VPCostContext &Ctx) const { | ||
| if (Instruction::isBinaryOp(getOpcode())) { | ||
| if (!getUnderlyingValue()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a TODO here to deal with this case? Even if there isn't an underlying value, the cost of an add is still not zero. Alternatively, if there should always be an underlying value you could assert this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, added a TODO,thanks!
I don't think it can be an assert, as there are cases where we create binary ops without underlying instructions in some cases.
| } | ||
|
|
||
| assert(!getUnderlyingValue() && | ||
| "unexpected VPInstruction without underlying value"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be unexpected VPInstruction with underlying value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep should be fixed, thanks!
88e4792 to
1877d1b
Compare
1877d1b to
e37c2ad
Compare
david-arm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
|
Hi @fhahn could you plz land this patch ? because my patch is depending on it. Thank you :)) |
Done now |
…rlying values. (#125434) As exposed by llvm/llvm-project#125094, we are missing cost computation for some binary VPInstructions we created based on original IR instructions. Their cost should be considered. PR: llvm/llvm-project#125434
This patch relands the changes from "[LV]: Teach LV to recursively (de)interleave.#122989" Reason for revert: - The patch exposed an assert in the vectorizer related to VF difference between legacy cost model and VPlan-based cost model because of uncalculated cost for VPInstruction which is created by VPlanTransforms as a replacement to 'or disjoint' instruction. VPlanTransforms do that instructions change when there are memory interleaving and predicated blocks, but that change didn't cause problems because at most cases the cost difference between legacy/new models is not noticeable. - Issue is fixed by #125434 Original patch: #89018 Reviewed-by: paulwalker-arm, Mel-Chen
…es. (llvm#125434) As exposed by llvm#125094, we are missing cost computation for some binary VPInstructions we created based on original IR instructions. Their cost should be considered. PR: llvm#125434
This patch relands the changes from "[LV]: Teach LV to recursively (de)interleave.llvm#122989" Reason for revert: - The patch exposed an assert in the vectorizer related to VF difference between legacy cost model and VPlan-based cost model because of uncalculated cost for VPInstruction which is created by VPlanTransforms as a replacement to 'or disjoint' instruction. VPlanTransforms do that instructions change when there are memory interleaving and predicated blocks, but that change didn't cause problems because at most cases the cost difference between legacy/new models is not noticeable. - Issue is fixed by llvm#125434 Original patch: llvm#89018 Reviewed-by: paulwalker-arm, Mel-Chen
…es. llvm#125434 As exposed by llvm#125094, we are missing cost computation for some binary VPInstructions we created based on original IR instructions. Their cost should be considered. PR: llvm#125434 Author: Florian Hahn <[email protected]> Change-Id: Icf985b3f1cd40898a17faaf47b241e2651f9e8dd
…es. llvm#125434 As exposed by llvm#125094, we are missing cost computation for some binary VPInstructions we created based on original IR instructions. Their cost should be considered. PR: llvm#125434 Author: Florian Hahn <[email protected]> Change-Id: Icf985b3f1cd40898a17faaf47b241e2651f9e8dd
As exposed by #125094, we are missing cost computation for some binary VPInstructions we created based on original IR instructions. Their cost should be considered.