Skip to content

Conversation

@fhahn
Copy link
Contributor

@fhahn fhahn commented Feb 2, 2025

As exposed by #125094, we are missing cost computation for some binary VPInstructions we created based on original IR instructions. Their cost should be considered.

@llvmbot
Copy link
Member

llvmbot commented Feb 2, 2025

@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/125434.diff

3 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/VPlan.h (+1-4)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+19)
  • (added) llvm/test/Transforms/LoopVectorize/X86/CostModel/vpinstruction-cost.ll (+74)
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index fac207287e0bcc7..5b9dcf68a62bec5 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -972,10 +972,7 @@ class VPInstruction : public VPRecipeWithIRFlags,
 
   /// Return the cost of this VPInstruction.
   InstructionCost computeCost(ElementCount VF,
-                              VPCostContext &Ctx) const override {
-    // TODO: Compute accurate cost after retiring the legacy cost model.
-    return 0;
-  }
+                              VPCostContext &Ctx) const override;
 
 #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
   /// Print the VPInstruction to \p O.
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index c84a93d7398f73b..ac70f2498dbaeb7 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -768,6 +768,25 @@ void VPInstruction::execute(VPTransformState &State) {
             /*IsScalar*/ GeneratesPerFirstLaneOnly);
 }
 
+InstructionCost VPInstruction::computeCost(ElementCount VF,
+                                           VPCostContext &Ctx) const {
+  if (Instruction::isBinaryOp(getOpcode())) {
+    if (!getUnderlyingValue())
+      return 0;
+
+    assert(!doesGeneratePerAllLanes() && "Should only generate a vector value or single scalar, not scalars for all lanes.);
+    Type *ResTy = Ctx.Types.inferScalarType(this);
+    if (!vputils::onlyFirstLaneUsed(this))
+      ResTy = toVectorTy(ResTy, VF);
+
+    return Ctx.TTI.getArithmeticInstrCost(getOpcode(), ResTy, Ctx.CostKind);
+  }
+
+  assert(!getUnderlyingValue() &&
+         "unexpected VPInstruction without underlying value");
+  return 0;
+}
+
 bool VPInstruction::opcodeMayReadOrWriteFromMemory() const {
   if (Instruction::isBinaryOp(getOpcode()))
     return false;
diff --git a/llvm/test/Transforms/LoopVectorize/X86/CostModel/vpinstruction-cost.ll b/llvm/test/Transforms/LoopVectorize/X86/CostModel/vpinstruction-cost.ll
new file mode 100644
index 000000000000000..bb85b88f181f784
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/X86/CostModel/vpinstruction-cost.ll
@@ -0,0 +1,74 @@
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --filter "Cost of"
+; RUN: opt -S -passes=loop-vectorize -mcpu=skylake-avx512 -mtriple=x86_64-apple-macosx -debug -disable-output -S %s 2>&1 | FileCheck %s
+
+; REQUIRES: asserts
+
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+
+define void @wide_or_replaced_with_add_vpinstruction(ptr %src, ptr noalias %dst) {
+; CHECK-LABEL: 'wide_or_replaced_with_add_vpinstruction'
+; CHECK:  Cost of 1 for VF 2: induction instruction %iv.next = add nuw nsw i64 %iv, 1
+; CHECK:  Cost of 0 for VF 2: induction instruction %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop.latch ]
+; CHECK:  Cost of 1 for VF 2: exit condition instruction %exitcond = icmp eq i64 %iv.next, 32
+; CHECK:  Cost of 0 for VF 2: EMIT vp<%3> = CANONICAL-INDUCTION ir<0>, vp<%index.next>
+; CHECK:  Cost of 0 for VF 2: ir<%iv> = WIDEN-INDUCTION ir<0>, ir<1>, vp<%0>
+; CHECK:  Cost of 0 for VF 2: vp<%4> = SCALAR-STEPS vp<%3>, ir<1>
+; CHECK:  Cost of 0 for VF 2: CLONE ir<%g.src> = getelementptr inbounds ir<%src>, vp<%4>
+; CHECK:  Cost of 0 for VF 2: vp<%5> = vector-pointer ir<%g.src>
+; CHECK:  Cost of 1 for VF 2: WIDEN ir<%l> = load vp<%5>
+; CHECK:  Cost of 1 for VF 2: WIDEN ir<%iv.4> = add ir<%iv>, ir<4>
+; CHECK:  Cost of 1 for VF 2: WIDEN ir<%c> = icmp ule ir<%l>, ir<128>
+; CHECK:  Cost of 1 for VF 2: EMIT ir<%or> = add ir<%iv.4>, ir<1>
+; CHECK:  Cost of 0 for VF 2: CLONE ir<%g.dst> = getelementptr ir<%dst>, ir<%or>
+; CHECK:  Cost of 0 for VF 2: vp<%6> = vector-pointer ir<%g.dst>
+; CHECK:  Cost of 1 for VF 2: WIDEN store vp<%6>, ir<%iv.4>, ir<%c>
+; CHECK:  Cost of 0 for VF 2: EMIT vp<%index.next> = add nuw vp<%3>, vp<%1>
+; CHECK:  Cost of 0 for VF 2: EMIT branch-on-count vp<%index.next>, vp<%2>
+; CHECK:  Cost of 0 for VF 2: vector loop backedge
+; CHECK:  Cost of 1 for VF 4: induction instruction %iv.next = add nuw nsw i64 %iv, 1
+; CHECK:  Cost of 0 for VF 4: induction instruction %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop.latch ]
+; CHECK:  Cost of 1 for VF 4: exit condition instruction %exitcond = icmp eq i64 %iv.next, 32
+; CHECK:  Cost of 0 for VF 4: EMIT vp<%3> = CANONICAL-INDUCTION ir<0>, vp<%index.next>
+; CHECK:  Cost of 0 for VF 4: ir<%iv> = WIDEN-INDUCTION ir<0>, ir<1>, vp<%0>
+; CHECK:  Cost of 0 for VF 4: vp<%4> = SCALAR-STEPS vp<%3>, ir<1>
+; CHECK:  Cost of 0 for VF 4: CLONE ir<%g.src> = getelementptr inbounds ir<%src>, vp<%4>
+; CHECK:  Cost of 0 for VF 4: vp<%5> = vector-pointer ir<%g.src>
+; CHECK:  Cost of 1 for VF 4: WIDEN ir<%l> = load vp<%5>
+; CHECK:  Cost of 1 for VF 4: WIDEN ir<%iv.4> = add ir<%iv>, ir<4>
+; CHECK:  Cost of 1 for VF 4: WIDEN ir<%c> = icmp ule ir<%l>, ir<128>
+; CHECK:  Cost of 1 for VF 4: EMIT ir<%or> = add ir<%iv.4>, ir<1>
+; CHECK:  Cost of 0 for VF 4: CLONE ir<%g.dst> = getelementptr ir<%dst>, ir<%or>
+; CHECK:  Cost of 0 for VF 4: vp<%6> = vector-pointer ir<%g.dst>
+; CHECK:  Cost of 1 for VF 4: WIDEN store vp<%6>, ir<%iv.4>, ir<%c>
+; CHECK:  Cost of 0 for VF 4: EMIT vp<%index.next> = add nuw vp<%3>, vp<%1>
+; CHECK:  Cost of 0 for VF 4: EMIT branch-on-count vp<%index.next>, vp<%2>
+; CHECK:  Cost of 0 for VF 4: vector loop backedge
+; CHECK:  Cost of 1 for VF 4: induction instruction %iv.next = add nuw nsw i64 %iv, 1
+; CHECK:  Cost of 0 for VF 4: induction instruction %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop.latch ]
+; CHECK:  Cost of 1 for VF 4: exit condition instruction %exitcond = icmp eq i64 %iv.next, 32
+;
+entry:
+  br label %loop.header
+
+loop.header:
+  %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop.latch ]
+  %g.src = getelementptr inbounds i64, ptr %src, i64 %iv
+  %l = load i64, ptr %g.src
+  %iv.4 = add nuw nsw i64 %iv, 4
+  %c = icmp ule i64 %l, 128
+  br i1 %c, label %loop.then, label %loop.latch
+
+loop.then:
+  %or = or disjoint i64 %iv.4, 1
+  %g.dst = getelementptr inbounds i64, ptr %dst, i64 %or
+  store i64 %iv.4, ptr %g.dst, align 4
+  br label %loop.latch
+
+loop.latch:
+  %iv.next = add nuw nsw i64 %iv, 1
+  %exitcond = icmp eq i64 %iv.next, 32
+  br i1 %exitcond, label %exit, label %loop.header
+
+exit:
+  ret void
+}

@fhahn fhahn force-pushed the vpinstruction-binop-cost branch from 54439f4 to 4454f6f Compare February 2, 2025 21:29
@fhahn fhahn marked this pull request as draft February 2, 2025 22:02
@fhahn fhahn force-pushed the vpinstruction-binop-cost branch from 4454f6f to 88e4792 Compare February 4, 2025 11:57
@fhahn fhahn marked this pull request as ready for review February 4, 2025 16:06
Copy link
Contributor

@david-arm david-arm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this!

InstructionCost VPInstruction::computeCost(ElementCount VF,
VPCostContext &Ctx) const {
if (Instruction::isBinaryOp(getOpcode())) {
if (!getUnderlyingValue())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a TODO here to deal with this case? Even if there isn't an underlying value, the cost of an add is still not zero. Alternatively, if there should always be an underlying value you could assert this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, added a TODO,thanks!

I don't think it can be an assert, as there are cases where we create binary ops without underlying instructions in some cases.

}

assert(!getUnderlyingValue() &&
"unexpected VPInstruction without underlying value");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be unexpected VPInstruction with underlying value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep should be fixed, thanks!

@fhahn fhahn force-pushed the vpinstruction-binop-cost branch from 88e4792 to 1877d1b Compare February 4, 2025 21:53
@fhahn fhahn force-pushed the vpinstruction-binop-cost branch from 1877d1b to e37c2ad Compare February 6, 2025 19:47
Copy link
Contributor

@david-arm david-arm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@hassnaaHamdi
Copy link
Member

Hi @fhahn could you plz land this patch ? because my patch is depending on it. Thank you :))

@fhahn fhahn merged commit 1611059 into llvm:main Feb 7, 2025
8 checks passed
@fhahn fhahn deleted the vpinstruction-binop-cost branch February 7, 2025 15:27
@fhahn
Copy link
Contributor Author

fhahn commented Feb 7, 2025

Hi @fhahn could you plz land this patch ? because my patch is depending on it. Thank you :))

Done now

github-actions bot pushed a commit to arm/arm-toolchain that referenced this pull request Feb 7, 2025
…rlying values. (#125434)

As exposed by llvm/llvm-project#125094, we are
missing cost computation for some binary VPInstructions we created based
on original IR instructions. Their cost should be considered.

PR: llvm/llvm-project#125434
hassnaaHamdi added a commit that referenced this pull request Feb 9, 2025
This patch relands the changes from "[LV]: Teach LV to recursively
(de)interleave.#122989"
    Reason for revert:
- The patch exposed an assert in the vectorizer related to VF difference
between
legacy cost model and VPlan-based cost model because of uncalculated
cost for
VPInstruction which is created by VPlanTransforms as a replacement to
'or disjoint'
       instruction.
VPlanTransforms do that instructions change when there are memory
interleaving and
predicated blocks, but that change didn't cause problems because at most
cases the cost
      difference between legacy/new models is not noticeable.
    - Issue is fixed by #125434

Original patch: #89018
Reviewed-by: paulwalker-arm, Mel-Chen
Icohedron pushed a commit to Icohedron/llvm-project that referenced this pull request Feb 11, 2025
…es. (llvm#125434)

As exposed by llvm#125094, we are
missing cost computation for some binary VPInstructions we created based
on original IR instructions. Their cost should be considered.

PR: llvm#125434
Icohedron pushed a commit to Icohedron/llvm-project that referenced this pull request Feb 11, 2025
This patch relands the changes from "[LV]: Teach LV to recursively
(de)interleave.llvm#122989"
    Reason for revert:
- The patch exposed an assert in the vectorizer related to VF difference
between
legacy cost model and VPlan-based cost model because of uncalculated
cost for
VPInstruction which is created by VPlanTransforms as a replacement to
'or disjoint'
       instruction.
VPlanTransforms do that instructions change when there are memory
interleaving and
predicated blocks, but that change didn't cause problems because at most
cases the cost
      difference between legacy/new models is not noticeable.
    - Issue is fixed by llvm#125434

Original patch: llvm#89018
Reviewed-by: paulwalker-arm, Mel-Chen
hassnaaHamdi added a commit to hassnaaHamdi/llvm-project that referenced this pull request Feb 13, 2025
…es. llvm#125434

As exposed by llvm#125094, we are missing cost computation for some
binary VPInstructions we created based on original IR instructions.
Their cost should be considered.
PR: llvm#125434

Author: Florian Hahn <[email protected]>

Change-Id: Icf985b3f1cd40898a17faaf47b241e2651f9e8dd
swift-ci pushed a commit to swiftlang/llvm-project that referenced this pull request Feb 21, 2025
…es. llvm#125434

As exposed by llvm#125094, we are missing cost computation for some
binary VPInstructions we created based on original IR instructions.
Their cost should be considered.
PR: llvm#125434

Author: Florian Hahn <[email protected]>

Change-Id: Icf985b3f1cd40898a17faaf47b241e2651f9e8dd
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants