[VPlan] Only apply forced cost to recipes with underlying values. #168372

fhahn · 2025-11-17T14:09:50Z

Only apply forced instruction costs to recipes with underlying values to match the legacy cost model. A VPlan may have a number of additional VPInstructions without underlying values that are not considered for its cost, and assigning forced costs to them would incorrectly inflate its cost.

This fixes a cost divergence between legacy and VPlan-based cost models with forced instruction costs.

Only apply forced instruction costs to recipes with underlying values to match the legacy cost model. A VPlan may have a number of additional VPInstructions without underlying values that are not considered for its cost, and assigning forced costs to them would incorrectly inflate its cost. This fixes a cost divergence between legacy and VPlan-based cost models with forced instruction costs.

llvmbot · 2025-11-17T14:10:23Z

@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes

Only apply forced instruction costs to recipes with underlying values to match the legacy cost model. A VPlan may have a number of additional VPInstructions without underlying values that are not considered for its cost, and assigning forced costs to them would incorrectly inflate its cost.

This fixes a cost divergence between legacy and VPlan-based cost models with forced instruction costs.

Full diff: https://github.com/llvm/llvm-project/pull/168372.diff

2 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+8-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll (+118)

diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index cf95b4eac9d75..e91ab4fcafee4 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -278,9 +278,14 @@ InstructionCost VPRecipeBase::cost(ElementCount VF, VPCostContext &Ctx) {
     RecipeCost = 0;
   } else {
     RecipeCost = computeCost(VF, Ctx);
-    if (UI && ForceTargetInstructionCost.getNumOccurrences() > 0 &&
-        RecipeCost.isValid())
-      RecipeCost = InstructionCost(ForceTargetInstructionCost);
+    RecipeCost = computeCost(VF, Ctx);
+    if (ForceTargetInstructionCost.getNumOccurrences() > 0 &&
+        RecipeCost.isValid()) {
+      if (UI)
+        RecipeCost = InstructionCost(ForceTargetInstructionCost);
+      else
+        RecipeCost = InstructionCost(0);
+    }
   }
 
   LLVM_DEBUG({
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll b/llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll
index 29bbd015eed1f..a780b6409b93e 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll
@@ -380,6 +380,124 @@ for.end:
   ret void
 }
 
+define void @loop_with_freeze_and_conditional_srem(ptr %dst, ptr %keyinfo, ptr %invariant.ptr, i32 %divisor) #1 {
+; COMMON-LABEL: define void @loop_with_freeze_and_conditional_srem(
+; COMMON-SAME: ptr [[DST:%.*]], ptr [[KEYINFO:%.*]], ptr [[INVARIANT_PTR:%.*]], i32 [[DIVISOR:%.*]]) {
+; COMMON-NEXT:  [[ENTRY:.*:]]
+; COMMON-NEXT:    br label %[[VECTOR_MEMCHECK:.*]]
+; COMMON:       [[VECTOR_MEMCHECK]]:
+; COMMON-NEXT:    [[SCEVGEP:%.*]] = getelementptr i8, ptr [[DST]], i64 4
+; COMMON-NEXT:    [[SCEVGEP1:%.*]] = getelementptr i8, ptr [[KEYINFO]], i64 4
+; COMMON-NEXT:    [[SCEVGEP2:%.*]] = getelementptr i8, ptr [[INVARIANT_PTR]], i64 4
+; COMMON-NEXT:    [[BOUND0:%.*]] = icmp ult ptr [[DST]], [[SCEVGEP1]]
+; COMMON-NEXT:    [[BOUND1:%.*]] = icmp ult ptr [[KEYINFO]], [[SCEVGEP]]
+; COMMON-NEXT:    [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
+; COMMON-NEXT:    [[BOUND03:%.*]] = icmp ult ptr [[DST]], [[SCEVGEP2]]
+; COMMON-NEXT:    [[BOUND14:%.*]] = icmp ult ptr [[INVARIANT_PTR]], [[SCEVGEP]]
+; COMMON-NEXT:    [[FOUND_CONFLICT5:%.*]] = and i1 [[BOUND03]], [[BOUND14]]
+; COMMON-NEXT:    [[CONFLICT_RDX:%.*]] = or i1 [[FOUND_CONFLICT]], [[FOUND_CONFLICT5]]
+; COMMON-NEXT:    [[BOUND06:%.*]] = icmp ult ptr [[KEYINFO]], [[SCEVGEP2]]
+; COMMON-NEXT:    [[BOUND17:%.*]] = icmp ult ptr [[INVARIANT_PTR]], [[SCEVGEP1]]
+; COMMON-NEXT:    [[FOUND_CONFLICT8:%.*]] = and i1 [[BOUND06]], [[BOUND17]]
+; COMMON-NEXT:    [[CONFLICT_RDX9:%.*]] = or i1 [[CONFLICT_RDX]], [[FOUND_CONFLICT8]]
+; COMMON-NEXT:    br i1 [[CONFLICT_RDX9]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; COMMON:       [[VECTOR_PH]]:
+; COMMON-NEXT:    br label %[[VECTOR_BODY:.*]]
+; COMMON:       [[VECTOR_BODY]]:
+; COMMON-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[PRED_STORE_CONTINUE23:.*]] ]
+; COMMON-NEXT:    [[TMP0:%.*]] = load i32, ptr [[INVARIANT_PTR]], align 4, !alias.scope [[META16:![0-9]+]]
+; COMMON-NEXT:    [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[TMP0]], i64 0
+; COMMON-NEXT:    [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
+; COMMON-NEXT:    [[TMP1:%.*]] = freeze <4 x i32> [[BROADCAST_SPLAT]]
+; COMMON-NEXT:    [[TMP2:%.*]] = icmp eq <4 x i32> [[TMP1]], zeroinitializer
+; COMMON-NEXT:    [[TMP3:%.*]] = xor <4 x i1> [[TMP2]], splat (i1 true)
+; COMMON-NEXT:    [[TMP4:%.*]] = extractelement <4 x i1> [[TMP3]], i32 0
+; COMMON-NEXT:    br i1 [[TMP4]], label %[[PRED_STORE_IF:.*]], label %[[PRED_STORE_CONTINUE:.*]]
+; COMMON:       [[PRED_STORE_IF]]:
+; COMMON-NEXT:    [[TMP5:%.*]] = srem i32 1, [[DIVISOR]]
+; COMMON-NEXT:    store i32 [[TMP5]], ptr [[DST]], align 4, !alias.scope [[META19:![0-9]+]], !noalias [[META21:![0-9]+]]
+; COMMON-NEXT:    br label %[[PRED_STORE_CONTINUE]]
+; COMMON:       [[PRED_STORE_CONTINUE]]:
+; COMMON-NEXT:    [[TMP6:%.*]] = extractelement <4 x i1> [[TMP3]], i32 1
+; COMMON-NEXT:    br i1 [[TMP6]], label %[[PRED_STORE_IF10:.*]], label %[[PRED_STORE_CONTINUE11:.*]]
+; COMMON:       [[PRED_STORE_IF10]]:
+; COMMON-NEXT:    [[TMP7:%.*]] = srem i32 1, [[DIVISOR]]
+; COMMON-NEXT:    store i32 [[TMP7]], ptr [[DST]], align 4, !alias.scope [[META19]], !noalias [[META21]]
+; COMMON-NEXT:    br label %[[PRED_STORE_CONTINUE11]]
+; COMMON:       [[PRED_STORE_CONTINUE11]]:
+; COMMON-NEXT:    [[TMP8:%.*]] = extractelement <4 x i1> [[TMP3]], i32 2
+; COMMON-NEXT:    br i1 [[TMP8]], label %[[PRED_STORE_IF12:.*]], label %[[PRED_STORE_CONTINUE13:.*]]
+; COMMON:       [[PRED_STORE_IF12]]:
+; COMMON-NEXT:    [[TMP9:%.*]] = srem i32 1, [[DIVISOR]]
+; COMMON-NEXT:    store i32 [[TMP9]], ptr [[DST]], align 4, !alias.scope [[META19]], !noalias [[META21]]
+; COMMON-NEXT:    br label %[[PRED_STORE_CONTINUE13]]
+; COMMON:       [[PRED_STORE_CONTINUE13]]:
+; COMMON-NEXT:    [[TMP10:%.*]] = extractelement <4 x i1> [[TMP3]], i32 3
+; COMMON-NEXT:    br i1 [[TMP10]], label %[[PRED_STORE_IF14:.*]], label %[[PRED_STORE_CONTINUE15:.*]]
+; COMMON:       [[PRED_STORE_IF14]]:
+; COMMON-NEXT:    [[TMP11:%.*]] = srem i32 1, [[DIVISOR]]
+; COMMON-NEXT:    store i32 [[TMP11]], ptr [[DST]], align 4, !alias.scope [[META19]], !noalias [[META21]]
+; COMMON-NEXT:    br label %[[PRED_STORE_CONTINUE15]]
+; COMMON:       [[PRED_STORE_CONTINUE15]]:
+; COMMON-NEXT:    [[TMP12:%.*]] = extractelement <4 x i1> [[TMP2]], i32 0
+; COMMON-NEXT:    br i1 [[TMP12]], label %[[PRED_STORE_IF16:.*]], label %[[PRED_STORE_CONTINUE17:.*]]
+; COMMON:       [[PRED_STORE_IF16]]:
+; COMMON-NEXT:    store i32 0, ptr [[KEYINFO]], align 4, !alias.scope [[META23:![0-9]+]], !noalias [[META16]]
+; COMMON-NEXT:    br label %[[PRED_STORE_CONTINUE17]]
+; COMMON:       [[PRED_STORE_CONTINUE17]]:
+; COMMON-NEXT:    [[TMP13:%.*]] = extractelement <4 x i1> [[TMP2]], i32 1
+; COMMON-NEXT:    br i1 [[TMP13]], label %[[PRED_STORE_IF18:.*]], label %[[PRED_STORE_CONTINUE19:.*]]
+; COMMON:       [[PRED_STORE_IF18]]:
+; COMMON-NEXT:    store i32 0, ptr [[KEYINFO]], align 4, !alias.scope [[META23]], !noalias [[META16]]
+; COMMON-NEXT:    br label %[[PRED_STORE_CONTINUE19]]
+; COMMON:       [[PRED_STORE_CONTINUE19]]:
+; COMMON-NEXT:    [[TMP14:%.*]] = extractelement <4 x i1> [[TMP2]], i32 2
+; COMMON-NEXT:    br i1 [[TMP14]], label %[[PRED_STORE_IF20:.*]], label %[[PRED_STORE_CONTINUE21:.*]]
+; COMMON:       [[PRED_STORE_IF20]]:
+; COMMON-NEXT:    store i32 0, ptr [[KEYINFO]], align 4, !alias.scope [[META23]], !noalias [[META16]]
+; COMMON-NEXT:    br label %[[PRED_STORE_CONTINUE21]]
+; COMMON:       [[PRED_STORE_CONTINUE21]]:
+; COMMON-NEXT:    [[TMP15:%.*]] = extractelement <4 x i1> [[TMP2]], i32 3
+; COMMON-NEXT:    br i1 [[TMP15]], label %[[PRED_STORE_IF22:.*]], label %[[PRED_STORE_CONTINUE23]]
+; COMMON:       [[PRED_STORE_IF22]]:
+; COMMON-NEXT:    store i32 0, ptr [[KEYINFO]], align 4, !alias.scope [[META23]], !noalias [[META16]]
+; COMMON-NEXT:    br label %[[PRED_STORE_CONTINUE23]]
+; COMMON:       [[PRED_STORE_CONTINUE23]]:
+; COMMON-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
+; COMMON-NEXT:    [[TMP16:%.*]] = icmp eq i64 [[INDEX_NEXT]], 32
+; COMMON-NEXT:    br i1 [[TMP16]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP24:![0-9]+]]
+; COMMON:       [[MIDDLE_BLOCK]]:
+; COMMON-NEXT:    br label %[[SCALAR_PH]]
+; COMMON:       [[SCALAR_PH]]:
+;
+entry:
+  br label %loop
+
+loop:                                             ; preds = %loop.latch, %entry
+  %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop.latch ]
+  %loaded = load i32, ptr %invariant.ptr, align 4
+  %frozen = freeze i32 %loaded
+  %cmp = icmp eq i32 %frozen, 0
+  br i1 %cmp, label %if.zero, label %if.nonzero
+
+if.zero:                                          ; preds = %loop
+  store i32 0, ptr %keyinfo, align 4
+  br label %loop.latch
+
+if.nonzero:                                       ; preds = %loop
+  %rem = srem i32 1, %divisor
+  store i32 %rem, ptr %dst, align 4
+  br label %loop.latch
+
+loop.latch:                                       ; preds = %if.nonzero, %if.zero
+  %iv.next = add i64 %iv, 1
+  %exitcond = icmp eq i64 %iv, 32
+  br i1 %exitcond, label %exit, label %loop
+
+exit:                                             ; preds = %loop.latch
+  ret void
+}
+
 attributes #0 = { "target-features"="+neon,+sve" vscale_range(1,16) }
 
 declare void @llvm.assume(i1 noundef)

david-arm

LGTM, although I still think there isn't much value in trying to maintain the legacy/vplan cost model assert when the user is explicitly trying to assign arbitrary costs to things.

fhahn · 2025-11-21T10:21:23Z

LGTM, although I still think there isn't much value in trying to maintain the legacy/vplan cost model assert when the user is explicitly trying to assign arbitrary costs to things.

Thanks, it should help to keep the cases where we apply the forced costs consistent with where we apply VPlan-based costs without forced costs.

github-actions · 2025-11-21T10:47:25Z

🐧 Linux x64 Test Results

166214 tests passed
2849 tests skipped
1 test failed

Failed Tests

(click on a test name to see its output)

LLVM

LLVM.Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll

Exit Code: 1

Command Output (stdout):
--
# RUN: at line 2
/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/opt -p loop-vectorize -force-target-instruction-cost=1 -S /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll | /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/FileCheck --check-prefixes=COMMON,COST1 /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll
# executed command: /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/opt -p loop-vectorize -force-target-instruction-cost=1 -S /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll
# note: command had no output on stdout or stderr
# executed command: /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/FileCheck --check-prefixes=COMMON,COST1 /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll
# .---command stderr------------
# | /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll:389:16: error: COMMON-NEXT: expected string not found in input
# | ; COMMON-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[DST]], i64 4
# |                ^
# | <stdin>:323:6: note: scanning from here
# | loop: ; preds = %loop.latch, %entry
# |      ^
# | <stdin>:323:6: note: with "DST" equal to "%dst"
# | loop: ; preds = %loop.latch, %entry
# |      ^
# | <stdin>:336:2: note: possible intended match here
# |  store i32 %rem, ptr %dst, align 4
# |  ^
# | 
# | Input file: <stdin>
# | Check file: /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |             .
# |             .
# |             .
# |           318:  
# |           319: define void @loop_with_freeze_and_conditional_srem(ptr %dst, ptr %keyinfo, ptr %invariant.ptr, i32 %divisor) { 
# |           320: entry: 
# |           321:  br label %loop 
# |           322:  
# |           323: loop: ; preds = %loop.latch, %entry 
# | next:389'0          X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
# | next:389'1                                          with "DST" equal to "%dst"
# |           324:  %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop.latch ] 
# | next:389'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           325:  %loaded = load i32, ptr %invariant.ptr, align 4 
# | next:389'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           326:  %frozen = freeze i32 %loaded 
# | next:389'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           327:  %cmp = icmp eq i32 %frozen, 0 
# | next:389'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           328:  br i1 %cmp, label %if.zero, label %if.nonzero 
# | next:389'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |             .
# |             .
# |             .
# |           331:  store i32 0, ptr %keyinfo, align 4 
# | next:389'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           332:  br label %loop.latch 
# | next:389'0     ~~~~~~~~~~~~~~~~~~~~~~
# |           333:  
# | next:389'0     ~
# |           334: if.nonzero: ; preds = %loop 
# | next:389'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           335:  %rem = srem i32 1, %divisor 
# | next:389'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           336:  store i32 %rem, ptr %dst, align 4 
# | next:389'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# | next:389'2      ?                                  possible intended match
# |           337:  br label %loop.latch 
# | next:389'0     ~~~~~~~~~~~~~~~~~~~~~~
# |           338:  
# | next:389'0     ~
# |           339: loop.latch: ; preds = %if.nonzero, %if.zero 
# | next:389'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           340:  %iv.next = add i64 %iv, 1 
# | next:389'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           341:  %exitcond = icmp eq i64 %iv, 32 
# | next:389'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |             .
# |             .
# |             .
# | >>>>>>
# `-----------------------------
# error: command failed with exit status: 1

--

If these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the infrastructure label.

…nderlying-instr

…values. (#168372) Only apply forced instruction costs to recipes with underlying values to match the legacy cost model. A VPlan may have a number of additional VPInstructions without underlying values that are not considered for its cost, and assigning forced costs to them would incorrectly inflate its cost. This fixes a cost divergence between legacy and VPlan-based cost models with forced instruction costs. PR: llvm/llvm-project#168372

…68372) Only apply forced instruction costs to recipes with underlying values to match the legacy cost model. A VPlan may have a number of additional VPInstructions without underlying values that are not considered for its cost, and assigning forced costs to them would incorrectly inflate its cost. This fixes a cost divergence between legacy and VPlan-based cost models with forced instruction costs. PR: llvm/llvm-project#168372 Signed-off-by: Hafidz Muzakky <[email protected]>

fhahn requested review from ayalz, david-arm, lukel97 and rengolin November 17, 2025 14:09

llvmbot added vectorizers llvm:transforms labels Nov 17, 2025

fhahn mentioned this pull request Nov 17, 2025

[LV] Don't trigger legacy/vplan assert when forcing costs #156870

Open

david-arm approved these changes Nov 17, 2025

View reviewed changes

Merge branch 'main' into vplan-cost-force-no-underlying-instr

4006378

fhahn enabled auto-merge (squash) November 21, 2025 10:21

Merge remote-tracking branch 'origin/main' into vplan-cost-force-no-u…

2056caf

…nderlying-instr

fhahn merged commit 31711c9 into llvm:main Nov 21, 2025
9 of 10 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[VPlan] Only apply forced cost to recipes with underlying values. #168372

[VPlan] Only apply forced cost to recipes with underlying values. #168372

Uh oh!

fhahn commented Nov 17, 2025

Uh oh!

llvmbot commented Nov 17, 2025 •

edited

Loading

Uh oh!

david-arm left a comment

Uh oh!

fhahn commented Nov 21, 2025

Uh oh!

github-actions bot commented Nov 21, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

[VPlan] Only apply forced cost to recipes with underlying values. #168372

[VPlan] Only apply forced cost to recipes with underlying values. #168372

Uh oh!

Conversation

fhahn commented Nov 17, 2025

Uh oh!

llvmbot commented Nov 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

david-arm left a comment

Choose a reason for hiding this comment

Uh oh!

fhahn commented Nov 21, 2025

Uh oh!

github-actions bot commented Nov 21, 2025

🐧 Linux x64 Test Results

Failed Tests

LLVM

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

llvmbot commented Nov 17, 2025 •

edited

Loading