[LV] Fix incorrect cost kind in VPReplicateRecipe::computeCost #153216
We were incorrectly using the TTI::TCK_RecipThroughput cost kind and ignoring the kind set in the context.
|
@llvm/pr-subscribers-vectorizers
Author: David Sherwood (david-arm)
Changes
We were incorrectly using the TTI::TCK_RecipThroughput cost kind and ignoring the kind set in the context.
Full diff: https://github.com/llvm/llvm-project/pull/153216.diff
1 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index e34cab117f321..a121f4f54845c 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -2944,7 +2944,6 @@ InstructionCost VPReplicateRecipe::computeCost(ElementCount VF,
// transform, avoid computing their cost multiple times for now.
Ctx.SkipCostComputation.insert(UI);
- TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
Type *ResultTy = Ctx.Types.inferScalarType(this);
switch (UI->getOpcode()) {
case Instruction::GetElementPtr:
@@ -2970,7 +2969,7 @@ InstructionCost VPReplicateRecipe::computeCost(ElementCount VF,
auto Op2Info = Ctx.getOperandInfo(getOperand(1));
SmallVector<const Value *, 4> Operands(UI->operand_values());
return Ctx.TTI.getArithmeticInstrCost(
- UI->getOpcode(), ResultTy, CostKind,
+ UI->getOpcode(), ResultTy, Ctx.CostKind,
{TargetTransformInfo::OK_AnyValue, TargetTransformInfo::OP_None},
Op2Info, Operands, UI, &Ctx.TLI) *
(isSingleScalar() ? 1 : VF.getFixedValue());
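To illustrate the shape of the bug outside of LLVM, here is a minimal sketch. It assumes hypothetical stand-in types (`CostKind`, `CostContext`) and a made-up cost table in `arithmeticCost`; none of these are the real TTI or VPlan APIs, and the numbers are purely illustrative. It only shows how a hard-coded local cost kind hides whatever kind the caller put in the context.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for illustration only; these are NOT the LLVM types.
enum class CostKind { RecipThroughput, CodeSize };

struct CostContext {
  CostKind Kind; // cost kind selected by the caller
};

// Made-up cost table: pretend an arithmetic op costs 4 when optimising for
// throughput and 1 when optimising for code size.
inline int64_t arithmeticCost(CostKind Kind) {
  return Kind == CostKind::CodeSize ? 1 : 4;
}

// Before the fix: a local constant is used and Ctx.Kind is ignored.
inline int64_t computeCostBefore(const CostContext &Ctx, unsigned VF,
                                 bool SingleScalar) {
  (void)Ctx; // the context's kind is never consulted
  CostKind Kind = CostKind::RecipThroughput;
  return arithmeticCost(Kind) * (SingleScalar ? 1u : VF);
}

// After the fix: the kind is taken from the context.
inline int64_t computeCostAfter(const CostContext &Ctx, unsigned VF,
                                bool SingleScalar) {
  return arithmeticCost(Ctx.Kind) * (SingleScalar ? 1u : VF);
}
```

Under these assumptions the two versions agree whenever the context asks for throughput, which is consistent with the change being hard to observe in practice; they only diverge when the context carries a different kind.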
|
fhahn
left a comment
Thanks. It would be great to have a test case for this. I can check whether I can surface anything.
I don't think this is even possible at the moment, because this will currently only apply if you build with I suppose that really means this patch is NFC and should be harmless. I just wanted to change the CostKind for completeness. |
Hm, I think we should definitely hit VPReplicateRecipe::computeCost with different CostKinds, e.g. for this test https://github.com/llvm/llvm-project/blob/main/llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll#L221 For Os/Oz, the requirement is that there is no scalar tail + no runtime checks IIRC, and the message is for that case. |
Nope, we never create replicate recipes at all due to this from the debug output: |
|
We never create replicating replicate recipes, but we can generate single-scalar replicate recipes, e.g. a load of a uniform address here https://llvm.godbolt.org/z/6bvnrY965 |
OK I can try playing around with variations of this, but certainly the test case shown above doesn't exercise the code I've changed because the CLONE recipes are only for loads and geps. The loads aren't covered because we fall back on the legacy cost model (which I have actually fixed in #153218) and the geps always return a cost of 0. I can either combine this PR together with #153218, or land #153218 first. |
fhahn
left a comment
LGTM, thanks. I checked on a large input set for both X86 and AArch64 and there were no differences, so it is likely not really feasible to write a test case.
|
Ran make check-all downstream and it looks fine. |