
Conversation

@david-arm
Contributor

We were incorrectly using the TTI::TCK_RecipThroughput cost kind and ignoring the kind set in the context.

@llvmbot
Member

llvmbot commented Aug 12, 2025

@llvm/pr-subscribers-vectorizers
@llvm/pr-subscribers-llvm-transforms

Author: David Sherwood (david-arm)

Changes

We were incorrectly using the TTI::TCK_RecipThroughput cost kind and ignoring the kind set in the context.


Full diff: https://github.com/llvm/llvm-project/pull/153216.diff

1 file affected:

  • (modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+1-2)
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index e34cab117f321..a121f4f54845c 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -2944,7 +2944,6 @@ InstructionCost VPReplicateRecipe::computeCost(ElementCount VF,
   // transform, avoid computing their cost multiple times for now.
   Ctx.SkipCostComputation.insert(UI);
 
-  TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
   Type *ResultTy = Ctx.Types.inferScalarType(this);
   switch (UI->getOpcode()) {
   case Instruction::GetElementPtr:
@@ -2970,7 +2969,7 @@ InstructionCost VPReplicateRecipe::computeCost(ElementCount VF,
     auto Op2Info = Ctx.getOperandInfo(getOperand(1));
     SmallVector<const Value *, 4> Operands(UI->operand_values());
     return Ctx.TTI.getArithmeticInstrCost(
-               UI->getOpcode(), ResultTy, CostKind,
+               UI->getOpcode(), ResultTy, Ctx.CostKind,
                {TargetTransformInfo::OK_AnyValue, TargetTransformInfo::OP_None},
                Op2Info, Operands, UI, &Ctx.TLI) *
            (isSingleScalar() ? 1 : VF.getFixedValue());
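
To illustrate the shape of the change, here is a minimal, self-contained sketch; the enum, context struct, and cost query below are stand-ins I made up, not the real TTI or VPlan types. The point is only that the caller-selected cost kind already lives in the context, so the recipe's cost computation should pass that kind to TTI rather than a hard-coded TCK_RecipThroughput.

  #include <cstdio>

  // Hypothetical stand-ins loosely modelled on TTI::TargetCostKind and the
  // VPlan cost context; not the actual LLVM definitions.
  enum class TargetCostKind { RecipThroughput, Latency, CodeSize, SizeAndLatency };

  struct CostContext {
    TargetCostKind CostKind; // chosen once by the caller, e.g. CodeSize under -Oz
  };

  // Dummy cost query: pretend code-size costs differ from throughput costs.
  static unsigned queryArithmeticCost(TargetCostKind Kind) {
    return Kind == TargetCostKind::CodeSize ? 1u : 4u;
  }

  // Before the patch: a locally hard-coded kind ignores what the context asked for.
  static unsigned computeCostBefore(const CostContext &Ctx) {
    TargetCostKind CostKind = TargetCostKind::RecipThroughput; // hard-coded
    (void)Ctx;
    return queryArithmeticCost(CostKind);
  }

  // After the patch: honour the kind carried by the context.
  static unsigned computeCostAfter(const CostContext &Ctx) {
    return queryArithmeticCost(Ctx.CostKind);
  }

  int main() {
    CostContext Ctx{TargetCostKind::CodeSize};
    std::printf("before=%u after=%u\n", computeCostBefore(Ctx), computeCostAfter(Ctx));
    return 0;
  }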


Contributor

@fhahn left a comment


Thanks. Would be great to have a test case for this. I can check if I can surface anything

@david-arm
Contributor Author

> Thanks. Would be great to have a test case for this. I can check if I can surface anything

I don't think this is even possible at the moment, because this will currently only apply if you build with -Oz -fvectorize, but you can't even write a test case for this because we hit the error at the end of LoopVectorizationCostModel::computeMaxVF:

  reportVectorizationFailure(
      "Cannot optimize for size and vectorize at the same time.",

I suppose that really means this patch is NFC and should be harmless. I just wanted to change the CostKind for completeness.
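
For context, the configuration being described is roughly the following; the source and flags are my own illustrative sketch, not a test from this PR.

  // Hypothetical example only; compile with: clang++ -Oz -fvectorize -S example.cpp
  // With an unknown trip count, the vectorizer needs either a scalar epilogue or
  // tail folding, which is where the "Cannot optimize for size and vectorize at
  // the same time." diagnostic can be hit when optimizing for size.
  void saxpy(float *dst, const float *src, float a, int n) {
    for (int i = 0; i < n; ++i)
      dst[i] += a * src[i];
  }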

@fhahn
Contributor

fhahn commented Aug 14, 2025

> > Thanks. Would be great to have a test case for this. I can check if I can surface anything
>
> I don't think this is even possible at the moment, because this will currently only apply if you build with -Oz -fvectorize, but you can't even write a test case for this because we hit the error at the end of LoopVectorizationCostModel::computeMaxVF:
>
>   reportVectorizationFailure(
>       "Cannot optimize for size and vectorize at the same time.",
>
> I suppose that really means this patch is NFC and should be harmless. I just wanted to change the CostKind for completeness.

Hm, I think we should definitely hit VPReplicateRecipe::computeCost with different CostKinds, e.g. for this test https://github.com/llvm/llvm-project/blob/main/llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll#L221

For Os/Oz, the requirement is that there is no scalar tail + no runtime checks IIRC, and the message is for that case.

@david-arm
Contributor Author

> > > Thanks. Would be great to have a test case for this. I can check if I can surface anything
> >
> > I don't think this is even possible at the moment, because this will currently only apply if you build with -Oz -fvectorize, but you can't even write a test case for this because we hit the error at the end of LoopVectorizationCostModel::computeMaxVF:
> >
> >   reportVectorizationFailure(
> >       "Cannot optimize for size and vectorize at the same time.",
> >
> > I suppose that really means this patch is NFC and should be harmless. I just wanted to change the CostKind for completeness.
>
> Hm, I think we should definitely hit VPReplicateRecipe::computeCost with different CostKinds, e.g. for this test https://github.com/llvm/llvm-project/blob/main/llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll#L221
>
> For Os/Oz, the requirement is that there is no scalar tail + no runtime checks IIRC, and the message is for that case.

Nope, we never create replicate recipes at all there, as the debug output shows:

LV: Not considering vector loop of width 2 because it would cause replicated blocks to be generated, which isn't allowed when optimizing for size.
LV: Not considering vector loop of width 4 because it would cause replicated blocks to be generated, which isn't allowed when optimizing for size.
LV: Not considering vector loop of width 8 because it would cause replicated blocks to be generated, which isn't allowed when optimizing for size.
LV: Not considering vector loop of width 16 because it would cause replicated blocks to be generated, which isn't allowed when optimizing for size.

@fhahn
Contributor

fhahn commented Aug 14, 2025

We never create replicating replicate recipes, but we can generate single-scalar replicate recipes, e.g. a load of a uniform address here https://llvm.godbolt.org/z/6bvnrY965
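
The godbolt example is not reproduced here, but the shape I assume it has is something like the sketch below: the load's address is loop-invariant, so the vectorizer can keep it as a single scalar load (a single-scalar replicate recipe) instead of replicating it per lane.

  // Assumed shape only -- the actual code behind the godbolt link may differ.
  int Bias; // global, so &Bias is a uniform (loop-invariant) address
  void addBias(int *dst, int n) {
    for (int i = 0; i < n; ++i)
      dst[i] += Bias; // the load from &Bias stays in the loop unless aliasing is disproved
  }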

@david-arm
Contributor Author

> We never create replicating replicate recipes, but we can generate single-scalar replicate recipes, e.g. a load of a uniform address here https://llvm.godbolt.org/z/6bvnrY965

OK, I can try playing around with variations of this, but the test case shown above certainly doesn't exercise the code I've changed, because the CLONE recipes are only for loads and GEPs. The loads aren't covered because we fall back on the legacy cost model (which I have actually fixed in #153218), and the GEPs always return a cost of 0. I can either combine this PR with #153218, or land #153218 first.

Contributor

@fhahn left a comment


LGTM, thanks. I checked on a large input set for both X86 and AArch64 and there were no differences, so it's likely not really feasible to write a test case.

@david-arm
Contributor Author

Ran make check-all downstream and it looks fine.

@david-arm merged commit 7ee6cf0 into llvm:main Aug 18, 2025
12 checks passed
