[VPlan] Materialize VF and VFxUF using VPInstructions. #152879

fhahn · 2025-08-09T20:55:56Z

Materialize VF and VFxUF computation using VPInstruction
instead of directly creating IR.

This is one of the last few steps needed to model the full vector skeleton in VPlan.

This is mostly NFC, although in some cases we remove some unused computations.

Materialize VF and VFxUF computation using VPInstruction instead of directly creating IR. This is one of the last few steps needed to model the full vector skeleton in VPlan. This is mostly NFC, although in some cases we remove some unused computations.

llvmbot · 2025-08-09T20:56:26Z

@llvm/pr-subscribers-vectorizers
@llvm/pr-subscribers-backend-risc-v

@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes

Materialize VF and VFxUF computation using VPInstruction
instead of directly creating IR.

This is one of the last few steps needed to model the full vector skeleton in VPlan.

This is mostly NFC, although in some cases we remove some unused computations.

Patch is 39.37 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/152879.diff

20 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+1-1)
(modified) llvm/lib/Transforms/Vectorize/VPlan.cpp (-16)
(modified) llvm/lib/Transforms/Vectorize/VPlan.h (+2-3)
(modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+11)
(modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+44)
(modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.h (+4)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-inductions-unusual-types.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/vplan-printing.ll (-1)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/first-order-recurrence-scalable-vf1.ll (+5-9)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/interleaved-accesses.ll (-8)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/lmul.ll (+5-9)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-safe-dep-distance.ll (-1)
(modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence-scalable-vf1.ll (+10-18)
(modified) llvm/test/Transforms/LoopVectorize/scalable-assume.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/scalable-first-order-recurrence.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/scalable-inductions.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/scalable-iv-outside-user.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/vectorize-force-tail-with-evl.ll (+2-3)
(modified) llvm/test/Transforms/LoopVectorize/vplan-predicate-switch.ll (+1-3)
(modified) llvm/test/Transforms/LoopVectorize/vplan-printing-before-execute.ll (-1)

diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 0591c224424ed..b28044bde4f20 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7304,6 +7304,7 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
   VPlanTransforms::materializeVectorTripCount(
       BestVPlan, VectorPH, CM.foldTailByMasking(),
       CM.requiresScalarEpilogue(BestVF.isVector()));
+  VPlanTransforms::materializeVFAndVFxUF(BestVPlan, VectorPH, BestVF);
 
   // Perform the actual loop transformation.
   VPTransformState State(&TTI, BestVF, LI, DT, ILV.AC, ILV.Builder, &BestVPlan,
@@ -7360,7 +7361,6 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
   //===------------------------------------------------===//
 
   // 2. Copy and widen instructions from the old loop into the new loop.
-  BestVPlan.prepareToExecute(State);
   replaceVPBBWithIRVPBB(VectorPH, State.CFG.PrevBB);
 
   // Move check blocks to their final position.
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index a820b524eb75d..74ad6579c9b04 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -951,22 +951,6 @@ VPlan::~VPlan() {
     delete BackedgeTakenCount;
 }
 
-void VPlan::prepareToExecute(VPTransformState &State) {
-  IRBuilder<> Builder(State.CFG.PrevBB->getTerminator());
-  Type *TCTy = VPTypeAnalysis(*this).inferScalarType(getTripCount());
-  // FIXME: Model VF * UF computation completely in VPlan.
-  unsigned UF = getUF();
-  if (VF.getNumUsers()) {
-    Value *RuntimeVF = getRuntimeVF(Builder, TCTy, State.VF);
-    VF.setUnderlyingValue(RuntimeVF);
-    VFxUF.setUnderlyingValue(
-        UF > 1 ? Builder.CreateMul(RuntimeVF, ConstantInt::get(TCTy, UF))
-               : RuntimeVF);
-  } else {
-    VFxUF.setUnderlyingValue(createStepForVF(Builder, TCTy, State.VF, UF));
-  }
-}
-
 VPIRBasicBlock *VPlan::getExitBlock(BasicBlock *IRBB) const {
   auto Iter = find_if(getExitBlocks(), [IRBB](const VPIRBasicBlock *VPIRBB) {
     return VPIRBB->getIRBasicBlock() == IRBB;
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index 6f0983510d0a1..b8059778e406f 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -1019,6 +1019,7 @@ class LLVM_ABI_FOR_TEST VPInstruction : public VPRecipeWithIRFlags,
     /// The lane specifies an index into a vector formed by combining all vector
     /// operands (all operands after the first one).
     ExtractLane,
+    VScale,
 
   };
 
@@ -1167,6 +1168,7 @@ class VPInstructionWithType : public VPInstruction {
     switch (VPI->getOpcode()) {
     case VPInstruction::WideIVStep:
     case VPInstruction::StepVector:
+    case VPInstruction::VScale:
       return true;
     default:
       return false;
@@ -3968,9 +3970,6 @@ class VPlan {
     VPBB->setPlan(this);
   }
 
-  /// Prepare the plan for execution, setting up the required live-in values.
-  void prepareToExecute(VPTransformState &State);
-
   /// Generate the IR code for this VPlan.
   void execute(VPTransformState *State);
 
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index e971ba1aac15c..1b9685c89a64e 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -452,6 +452,7 @@ unsigned VPInstruction::getNumOperandsForOpcode(unsigned Opcode) {
 
   switch (Opcode) {
   case VPInstruction::StepVector:
+  case VPInstruction::VScale:
     return 0;
   case Instruction::Alloca:
   case Instruction::ExtractValue:
@@ -1027,6 +1028,7 @@ bool VPInstruction::isSingleScalar() const {
   switch (getOpcode()) {
   case Instruction::PHI:
   case VPInstruction::ExplicitVectorLength:
+  case VPInstruction::VScale:
     return true;
   default:
     return isScalarCast();
@@ -1281,6 +1283,12 @@ void VPInstructionWithType::execute(VPTransformState &State) {
     State.set(this, StepVector);
     break;
   }
+  case VPInstruction::VScale: {
+    Value *VScale = State.Builder.CreateVScale(ResultTy);
+    State.set(this, VScale, true);
+    break;
+  }
+
   default:
     llvm_unreachable("opcode not implemented yet");
   }
@@ -1301,6 +1309,9 @@ void VPInstructionWithType::print(raw_ostream &O, const Twine &Indent,
   case VPInstruction::StepVector:
     O << "step-vector " << *ResultTy;
     break;
+  case VPInstruction::VScale:
+    O << "vscale " << *ResultTy;
+    break;
   default:
     assert(Instruction::isCast(getOpcode()) && "unhandled opcode");
     O << Instruction::getOpcodeName(getOpcode()) << " ";
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 34b2abf449ece..a2e1879b88c96 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -3339,6 +3339,50 @@ void VPlanTransforms::materializeVectorTripCount(VPlan &Plan,
   VectorTC.replaceAllUsesWith(Res);
 }
 
+void VPlanTransforms::materializeVFAndVFxUF(VPlan &Plan, VPBasicBlock *VectorPH,
+                                            ElementCount VFEC) {
+  VPBuilder Builder(VectorPH, VectorPH->begin());
+  auto *TCTy = VPTypeAnalysis(Plan).inferScalarType(Plan.getTripCount());
+  VPValue &VF = Plan.getVF();
+  VPValue &VFxUF = Plan.getVFxUF();
+  if (VF.getNumUsers()) {
+    VPValue *RuntimeVF =
+        Plan.getOrAddLiveIn(ConstantInt::get(TCTy, VFEC.getKnownMinValue()));
+    if (VFEC.isScalable())
+      RuntimeVF = Builder.createNaryOp(
+          Instruction::Mul,
+          {Builder.createNaryOp(VPInstruction::VScale, {}, TCTy), RuntimeVF},
+          VPIRFlags::WrapFlagsTy(true, false));
+    if (any_of(VF.users(), [&VF](VPUser *U) { return !U->usesScalars(&VF); })) {
+      auto *BC = Builder.createNaryOp(VPInstruction::Broadcast, {RuntimeVF});
+      VF.replaceUsesWithIf(
+          BC, [&VF](VPUser &U, unsigned) { return !U.usesScalars(&VF); });
+    }
+    VF.replaceAllUsesWith(RuntimeVF);
+
+    VPValue *UF = Plan.getOrAddLiveIn(ConstantInt::get(TCTy, Plan.getUF()));
+    auto *MulByUF = Plan.getUF() == 1 ? RuntimeVF
+                                      : Builder.createNaryOp(Instruction::Mul,
+                                                             {RuntimeVF, UF});
+    VFxUF.replaceAllUsesWith(MulByUF);
+    return;
+  }
+
+  unsigned VFMulUF = VFEC.getKnownMinValue() * Plan.getUF();
+  VPValue *RuntimeVFxUF = Plan.getOrAddLiveIn(ConstantInt::get(TCTy, VFMulUF));
+  if (VFEC.isScalable()) {
+    RuntimeVFxUF =
+        VFMulUF == 1
+            ? RuntimeVFxUF
+            : Builder.createNaryOp(
+                  Instruction::Mul,
+                  {Builder.createNaryOp(VPInstruction::VScale, {}, TCTy),
+                   RuntimeVFxUF},
+                  VPIRFlags::WrapFlagsTy(true, false));
+  }
+  VFxUF.replaceAllUsesWith(RuntimeVFxUF);
+}
+
 /// Returns true if \p V is VPWidenLoadRecipe or VPInterleaveRecipe that can be
 /// converted to a narrower recipe. \p V is used by a wide recipe that feeds a
 /// store interleave group at index \p Idx, \p WideMember0 is the recipe feeding
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
index e49137cbaada3..d3e4454d995ce 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
@@ -276,6 +276,10 @@ struct VPlanTransforms {
   static void materializeBackedgeTakenCount(VPlan &Plan,
                                             VPBasicBlock *VectorPH);
 
+  /// Materialize VF and VFxUF to be computed explicitly using VPInstructions.
+  static void materializeVFAndVFxUF(VPlan &Plan, VPBasicBlock *VectorPH,
+                                    ElementCount VF);
+
   /// Try to convert a plan with interleave groups with VF elements to a plan
   /// with the interleave groups replaced by wide loads and stores processing VF
   /// elements, if all transformed interleave groups access the full vector
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/sve-inductions-unusual-types.ll b/llvm/test/Transforms/LoopVectorize/AArch64/sve-inductions-unusual-types.ll
index 351da8b6145be..dd9d3a1fae223 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/sve-inductions-unusual-types.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/sve-inductions-unusual-types.ll
@@ -14,12 +14,12 @@ define void @induction_i7(ptr %dst) #0 {
 ; CHECK:       vector.ph:
 ; CHECK-NEXT:    [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
 ; CHECK-NEXT:    [[TMP40:%.*]] = mul nuw i64 [[TMP4]], 2
+; CHECK-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 2 x i64> poison, i64 [[TMP40]], i64 0
+; CHECK-NEXT:    [[DOTSPLAT_:%.*]] = shufflevector <vscale x 2 x i64> [[DOTSPLATINSERT]], <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer
 ; CHECK-NEXT:    [[TMP5:%.*]] = mul i64 [[TMP40]], 2
 ; CHECK-NEXT:    [[N_MOD_VF:%.*]] = urem i64 64, [[TMP5]]
 ; CHECK-NEXT:    [[N_VEC:%.*]] = sub i64 64, [[N_MOD_VF]]
 ; CHECK-NEXT:    [[IND_END:%.*]] = trunc i64 [[N_VEC]] to i7
-; CHECK-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 2 x i64> poison, i64 [[TMP40]], i64 0
-; CHECK-NEXT:    [[DOTSPLAT_:%.*]] = shufflevector <vscale x 2 x i64> [[DOTSPLATINSERT]], <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer
 ; CHECK-NEXT:    [[DOTSPLAT:%.*]] = trunc <vscale x 2 x i64> [[DOTSPLAT_]] to <vscale x 2 x i7>
 ; CHECK-NEXT:    [[TMP6:%.*]] = call <vscale x 2 x i8> @llvm.stepvector.nxv2i8()
 ; CHECK-NEXT:    [[TMP7:%.*]] = trunc <vscale x 2 x i8> [[TMP6]] to <vscale x 2 x i7> 
@@ -76,12 +76,12 @@ define void @induction_i3_zext(ptr %dst) #0 {
 ; CHECK:       vector.ph:
 ; CHECK-NEXT:    [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
 ; CHECK-NEXT:    [[TMP40:%.*]] = mul nuw i64 [[TMP4]], 2
+; CHECK-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 2 x i64> poison, i64 [[TMP40]], i64 0
+; CHECK-NEXT:    [[DOTSPLAT_:%.*]] = shufflevector <vscale x 2 x i64> [[DOTSPLATINSERT]], <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer
 ; CHECK-NEXT:    [[TMP5:%.*]] = mul i64 [[TMP40]], 2
 ; CHECK-NEXT:    [[N_MOD_VF:%.*]] = urem i64 64, [[TMP5]]
 ; CHECK-NEXT:    [[N_VEC:%.*]] = sub i64 64, [[N_MOD_VF]]
 ; CHECK-NEXT:    [[IND_END:%.*]] = trunc i64 [[N_VEC]] to i3
-; CHECK-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 2 x i64> poison, i64 [[TMP40]], i64 0
-; CHECK-NEXT:    [[DOTSPLAT_:%.*]] = shufflevector <vscale x 2 x i64> [[DOTSPLATINSERT]], <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer
 ; CHECK-NEXT:    [[DOTSPLAT:%.*]] = trunc <vscale x 2 x i64> [[DOTSPLAT_]] to <vscale x 2 x i3>
 ; CHECK-NEXT:    [[TMP6:%.*]] = call <vscale x 2 x i8> @llvm.stepvector.nxv2i8()
 ; CHECK-NEXT:    [[TMP7:%.*]] = trunc <vscale x 2 x i8> [[TMP6]] to <vscale x 2 x i3>
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/vplan-printing.ll b/llvm/test/Transforms/LoopVectorize/AArch64/vplan-printing.ll
index 4ed9580bcbe25..e5900c22e8680 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/vplan-printing.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/vplan-printing.ll
@@ -72,7 +72,6 @@ define i32 @print_partial_reduction(ptr %a, ptr %b) {
 ; CHECK-NEXT: No successors
 ; CHECK-NEXT: }
 ; CHECK: VPlan 'Final VPlan for VF={8,16},UF={1}' {
-; CHECK-NEXT: Live-in ir<[[EP_VFxUF:.+]]> = VF * UF
 ; CHECK-NEXT: Live-in ir<1024> = original trip-count
 ; CHECK-EMPTY:
 ; CHECK-NEXT: ir-bb<entry>:
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/first-order-recurrence-scalable-vf1.ll b/llvm/test/Transforms/LoopVectorize/RISCV/first-order-recurrence-scalable-vf1.ll
index aca00a9fa808a..0314cf99effdb 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/first-order-recurrence-scalable-vf1.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/first-order-recurrence-scalable-vf1.ll
@@ -12,9 +12,6 @@ define i64 @pr97452_scalable_vf1_for(ptr %src, ptr noalias %dst) #0 {
 ; CHECK-NEXT:    [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 23, [[TMP0]]
 ; CHECK-NEXT:    br i1 [[MIN_ITERS_CHECK]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
 ; CHECK:       [[VECTOR_PH]]:
-; CHECK-NEXT:    [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT:    [[N_MOD_VF:%.*]] = urem i64 23, [[TMP1]]
-; CHECK-NEXT:    [[N_VEC:%.*]] = sub i64 23, [[N_MOD_VF]]
 ; CHECK-NEXT:    [[TMP3:%.*]] = call i32 @llvm.vscale.i32()
 ; CHECK-NEXT:    [[TMP4:%.*]] = sub i32 [[TMP3]], 1
 ; CHECK-NEXT:    [[VECTOR_RECUR_INIT:%.*]] = insertelement <vscale x 1 x i64> poison, i64 0, i32 [[TMP4]]
@@ -27,9 +24,9 @@ define i64 @pr97452_scalable_vf1_for(ptr %src, ptr noalias %dst) #0 {
 ; CHECK-NEXT:    [[TMP7:%.*]] = call <vscale x 1 x i64> @llvm.vector.splice.nxv1i64(<vscale x 1 x i64> [[VECTOR_RECUR]], <vscale x 1 x i64> [[WIDE_LOAD]], i32 -1)
 ; CHECK-NEXT:    [[TMP8:%.*]] = getelementptr inbounds i64, ptr [[DST]], i64 [[INDEX]]
 ; CHECK-NEXT:    store <vscale x 1 x i64> [[TMP7]], ptr [[TMP8]], align 8
-; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP1]]
-; CHECK-NEXT:    [[TMP10:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
-; CHECK-NEXT:    br i1 [[TMP10]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 1
+; CHECK-NEXT:    [[TMP6:%.*]] = icmp eq i64 [[INDEX_NEXT]], 23
+; CHECK-NEXT:    br i1 [[TMP6]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
 ; CHECK-NEXT:    [[TMP11:%.*]] = call i32 @llvm.vscale.i32()
 ; CHECK-NEXT:    [[TMP12:%.*]] = sub i32 [[TMP11]], 1
@@ -37,11 +34,10 @@ define i64 @pr97452_scalable_vf1_for(ptr %src, ptr noalias %dst) #0 {
 ; CHECK-NEXT:    [[TMP14:%.*]] = call i32 @llvm.vscale.i32()
 ; CHECK-NEXT:    [[TMP15:%.*]] = sub i32 [[TMP14]], 1
 ; CHECK-NEXT:    [[TMP16:%.*]] = extractelement <vscale x 1 x i64> [[TMP7]], i32 [[TMP15]]
-; CHECK-NEXT:    [[CMP_N:%.*]] = icmp eq i64 23, [[N_VEC]]
-; CHECK-NEXT:    br i1 [[CMP_N]], label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
 ; CHECK:       [[SCALAR_PH]]:
 ; CHECK-NEXT:    [[SCALAR_RECUR_INIT:%.*]] = phi i64 [ [[VECTOR_RECUR_EXTRACT]], %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 23, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
 ; CHECK-NEXT:    br label %[[LOOP:.*]]
 ; CHECK:       [[LOOP]]:
 ; CHECK-NEXT:    [[FOR:%.*]] = phi i64 [ [[SCALAR_RECUR_INIT]], %[[SCALAR_PH]] ], [ [[L:%.*]], %[[LOOP]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/interleaved-accesses.ll b/llvm/test/Transforms/LoopVectorize/RISCV/interleaved-accesses.ll
index 80463945f0654..4b65ef091adfb 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/interleaved-accesses.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/interleaved-accesses.ll
@@ -1012,7 +1012,6 @@ define void @load_store_factor5(ptr %p) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
 ; CHECK:       vector.ph:
-; CHECK-NEXT:    [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
 ; CHECK-NEXT:    br label [[VECTOR_BODY:%.*]]
 ; CHECK:       vector.body:
 ; CHECK-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
@@ -1155,7 +1154,6 @@ define void @load_store_factor5(ptr %p) {
 ; SCALABLE-NEXT:  entry:
 ; SCALABLE-NEXT:    br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
 ; SCALABLE:       vector.ph:
-; SCALABLE-NEXT:    [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
 ; SCALABLE-NEXT:    br label [[VECTOR_BODY:%.*]]
 ; SCALABLE:       vector.body:
 ; SCALABLE-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
@@ -1273,7 +1271,6 @@ define void @load_store_factor6(ptr %p) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
 ; CHECK:       vector.ph:
-; CHECK-NEXT:    [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
 ; CHECK-NEXT:    br label [[VECTOR_BODY:%.*]]
 ; CHECK:       vector.body:
 ; CHECK-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
@@ -1431,7 +1428,6 @@ define void @load_store_factor6(ptr %p) {
 ; SCALABLE-NEXT:  entry:
 ; SCALABLE-NEXT:    br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
 ; SCALABLE:       vector.ph:
-; SCALABLE-NEXT:    [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
 ; SCALABLE-NEXT:    br label [[VECTOR_BODY:%.*]]
 ; SCALABLE:       vector.body:
 ; SCALABLE-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
@@ -1562,7 +1558,6 @@ define void @load_store_factor7(ptr %p) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
 ; CHECK:       vector.ph:
-; CHECK-NEXT:    [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
 ; CHECK-NEXT:    br label [[VECTOR_BODY:%.*]]
 ; CHECK:       vector.body:
 ; CHECK-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
@@ -1736,7 +1731,6 @@ define void @load_store_factor7(ptr %p) {
 ; SCALABLE-NEXT:  entry:
 ; SCALABLE-NEXT:    br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
 ; SCALABLE:       vector.ph:
-; SCALABLE-NEXT:    [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
 ; SCALABLE-NEXT:    br label [[VECTOR_BODY:%.*]]
 ; SCALABLE:       vector.body:
 ; SCALABLE-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
@@ -1880,7 +1874,6 @@ define void @load_store_factor8(ptr %p) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
 ; CHECK:       vector.ph:
-; CHECK-NEXT:    [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
 ; CHECK-NEXT:    br label [[VECTOR_BODY:%.*]]
 ; CHECK:       vector.body:
 ; CHECK-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
@@ -2067,7 +2060,6 @@ define void @load_store_factor8(ptr %p) {
 ; SCALABLE-NEXT:  entry:
 ; SCALABLE-NEXT:    br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
 ; SCALABLE:       vector.ph:
-; SCALABLE-NEXT:    [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
 ; SCALABLE-NEXT:    br label [[VECTOR_BODY:%.*]]
 ; SCALABLE:       vector.body:
 ; SCALABLE-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/lmul.ll b/llvm/test/Transforms/LoopVectorize/RISCV/lmul.ll
index f731d393fc993..0aa395c9db9ba 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/lmul.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/lmul.ll
@@ -12,9 +12,6 @@ define void @load_store(ptr %p) {
 ; LMUL1-NEXT:    [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 1024, [[TMP0]]
 ; LMUL1-NEXT:    br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
 ; LMUL1:       vector.ph:
-; LMUL1-NEXT:    [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; LMUL1-NEXT:    [[N_MOD_VF:%.*]] = urem i64 1024, [[TMP1]]
-; LMUL1-NEXT:    [[N_VEC:%.*]] = sub i64 1024, [[N_MOD_VF]]
 ; LMUL1-NEXT:    br label [[VECTOR_BODY:%.*]]
 ; LMUL1:       vector.body:
 ; LMUL1-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
@@ -22,14 +19,13 @@ define void @load_store(ptr %p) {
 ; LMUL1-NEXT:    [[WIDE_LOAD:%.*]] = load <vscale x 1 x i64>, ptr [[TMP3]], align 8
 ; LMUL1-NEXT:    [[TMP5:%.*]] = add <vscale x 1 x i64> [[WIDE_LOAD]], splat (i64 1)
 ; LMUL1-NEXT:    store <vscale x 1 x i64> [[TMP5]], ptr [[TMP3]], align 8
-; LMUL1-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP1]]
-; LMUL1-NEXT:    [[TMP7:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
-; LMUL1-NEXT:    br i1 [[TMP7]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; LMUL1-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 1
+; LMUL1-NEXT:    [[TMP4:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1024
+; LMUL1-NEXT:    br i1 [[TMP4]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; LMUL1:       middle.block:
-; LMUL1-NEXT:    [[CMP_N:%.*]] = icmp eq i64 1024, [[N_VEC]]
-; LMUL1-NEXT: ...
[truncated]

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

llvm/test/Transforms/LoopVectorize/vectorize-force-tail-with-evl.ll

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

…and-vfxuf

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

llvm/test/Transforms/LoopVectorize/first-order-recurrence-scalable-vf1.ll

llvm/test/Transforms/LoopVectorize/RISCV/lmul.ll

…and-vfxuf

lukel97

LGTM

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

david-arm · 2025-08-11T11:49:31Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+    return;
+  }
+
+  VPValue *RuntimeVFxUF = Builder.createElementCount(TCTy, VFEC * Plan.getUF());


Is this guaranteed to always have users? I just wondered why we don't check that here.

Not in all cases, e.g. if the canonical IV and its increment can be completely be removed. If it won't get used, VPlan DCE can remove it. The reason we check for users of VF is that we compute things in slightly different order if we need to provide a value for the runtime VF

david-arm · 2025-08-11T11:52:36Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+      VF.replaceUsesWithIf(
+          BC, [&VF](VPUser &U, unsigned) { return !U.usesScalars(&VF); });
+    }
+    VF.replaceAllUsesWith(RuntimeVF);


How does this work, given you're undoing the VF.replaceUsesWithIf call above?

Vector uses will be replaced above with the broadcast, the remaining users get the scalar value.

Oh I see now, sorry! By the time we get to the second line the uses of VF that were replaced before are no longer present in the use list.

david-arm · 2025-08-11T11:54:28Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+    VF.replaceAllUsesWith(RuntimeVF);
+
+    VPValue *UF = Plan.getOrAddLiveIn(ConstantInt::get(TCTy, Plan.getUF()));
+    auto *MulByUF = Plan.getUF() == 1 ? RuntimeVF


Can you not just fall through to the bottom of the function - it looks like we're just duplicating code here.

I'm not sure how, we if VF is used we first multiply VF * vscale and then multiply that by UF, wherease below we compute (VF * UF) * vscale

Perhaps I'm missing something, but it looked like these two pieces of code were calculating the same thing:

VPValue *UF = Plan.getOrAddLiveIn(ConstantInt::get(TCTy, Plan.getUF())); VPValue *MulByUF = Plan.getUF() == 1 ? RuntimeVF : Builder.createNaryOp(Instruction::Mul, {RuntimeVF, UF}); }

and

VPValue *RuntimeVFxUF = Builder.createElementCount(TCTy, VFEC * Plan.getUF());

The only difference seems to be that you're not using createOverflowingOp when multiplying the UF, whereas Builder.createElementCount(TCTy, VFEC * Plan.getUF()) does.

Hm, but they generate operations in slightly different order, the first does (VF * Vscale) * UF, whereas otherwise we do ((VF * UF) * vscale), the former will serve users of the runtime-vf.

Both compute the same value, but in different order. The patch keeps the sequence the same as for IR.

OK, but I'm a bit surprised that the order of multiplication is crucial for correctness here. At the moment it just looks like the code is deliberately trying hard to maintain the order of multiplication whereas in reality if the order matters then something is horribly broken, right? I would expect the vectoriser to always choose VF,UF pairs such that (VF * VScale) * UF == (VF * UF) * VScale. If not, then I believe that we need to fix the vectoriser to ensure this cannot happen.

It’s not needed for correctness, as mentioned above they both compute the same value.

but if there are no users of the runtime vf, the current code constant folds the UF*VF.getKnownMinValue() computation, which is not possibly if there are users for VF.getKnownMinValue()*vscale.

one could argue that we could always create VFvscaleUF, and rely on recipe simplifications to fold the VFUF computations if there no users of VFvscale.

This would certainly be possible once we have the computation explicit in VPlan, but the current patch just tries to port the existing computation, to avoid unrelated test changes.

OK fair enough, thanks for the explanation. I do get the argument that you want to minimise test changes and I'm happy with that. I just wanted to understand the rationale behind it that's all and make sure there isn't a hidden bug or assumption here.

…and-vfxuf

ayalz

Post approval review.

ayalz · 2025-08-12T05:00:44Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+  Type *TCTy = VPTypeAnalysis(Plan).inferScalarType(Plan.getTripCount());
+  VPValue &VF = Plan.getVF();
+  VPValue &VFxUF = Plan.getVFxUF();
+  if (VF.getNumUsers()) {


Deal with the simpler complementary case first? Add comment why needed - distinct orders explained below. Can also guard handling VFxUF if it has users.

Added comments, thanks! I've not added a use check for now, as this will easily be cleaned up by VPlan dce

ayalz · 2025-08-12T05:09:23Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+    return;
+  }
+
+  VPValue *RuntimeVFxUF = Builder.createElementCount(TCTy, VFEC * Plan.getUF());


No need for MulByUF in this case?

It's done directly by multiplying (constant folding) VF * Plan.getUF().

ayalz · 2025-08-12T05:12:24Z

llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h

+      RuntimeEC = EC.getKnownMinValue() == 1
+                      ? VScale
+                      : createOverflowingOp(Instruction::Mul,
+                                            {VScale, RuntimeEC}, {true, false});


Comment constant parameters?

Added, thanks

ayalz · 2025-08-12T05:15:11Z

llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h

+    VPlan &Plan = *getInsertBlock()->getPlan();
+    VPValue *RuntimeEC =
+        Plan.getOrAddLiveIn(ConstantInt::get(Ty, EC.getKnownMinValue()));
+    if (EC.isScalable()) {


Early exit simpler complementary case first?

updated, thanks

ayalz · 2025-08-12T05:19:35Z

llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h

    return tryInsertInstruction(new VPPhi(IncomingValues, DL, Name));
  }

+  VPValue *createElementCount(Type *Ty, ElementCount EC) {


Suggested change

VPValue *createElementCount(Type *Ty, ElementCount EC) {

VPValue *createRuntimeVF(Type *Ty, ElementCount VF) {

Sounds a bit odd to createElementCount given an ElementCount. There's probably a better name than "RuntimeVF", but that's already in use elsewhere, in any case should be clearly defined.

IRBuilder has an identical function also called IRBuilderBase::CreateElementCount, and SelectionDAG too with SelectionDAG::getElementCount. I think we should be consistent here.

It also doesn't necessarily need to generate a VF. An ElementCount could be e.g. VFxUF or another arbitrary quantity.

I left the name aligned with the IR Builder for now.

ayalz · 2025-08-12T05:21:00Z

llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h


+  VPValue *createElementCount(Type *Ty, ElementCount EC) {
+    VPlan &Plan = *getInsertBlock()->getPlan();
+    VPValue *RuntimeEC =


Suggested change

VPValue *RuntimeEC =

VPValue *RuntimeVF =

I left this as RuntimeEC, because we use this for VFxUF as well, where the ElementCount passed in is the original VF multiplied by UF.

ayalz · 2025-08-12T05:22:16Z

llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h

+    if (EC.isScalable()) {
+      VPValue *VScale = createNaryOp(VPInstruction::VScale, {}, Ty);
+      RuntimeEC = EC.getKnownMinValue() == 1
+                      ? VScale


Early return VScale?

This is now directly returned at the end, I left the ternary operator there for now.

ayalz · 2025-08-12T05:32:11Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+  VPValue &VF = Plan.getVF();
+  VPValue &VFxUF = Plan.getVFxUF();


So Plan's getVF() and getVFxUF() become obsolete from this point? They will remain use-less, no longer retrieving the relevant values. Worth noting, or hooking them to their replacements?

Yep, for now, there are no users of getVF and getVFxUF at after this point. It would be good to mark them as such, but I'm not sure what the best way would be. Another alternative would be manage them as actual users. They should also be region-specific. I'll prepare some follow-ups for that.

…and-vfxuf

…52879) Materialize VF and VFxUF computation using VPInstruction instead of directly creating IR. This is one of the last few steps needed to model the full vector skeleton in VPlan. This is mostly NFC, although in some cases we remove some unused computations. PR: llvm/llvm-project#152879

fhahn requested review from aniragil, ayalz, david-arm and lukel97 August 9, 2025 20:55

llvmbot added backend:RISC-V vectorizers llvm:transforms labels Aug 9, 2025

lukel97 reviewed Aug 11, 2025

View reviewed changes

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp Outdated Show resolved Hide resolved

llvm/test/Transforms/LoopVectorize/vectorize-force-tail-with-evl.ll Outdated Show resolved Hide resolved

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp Outdated Show resolved Hide resolved

fhahn added 2 commits August 11, 2025 10:57

Merge remote-tracking branch 'origin/main' into vplan-materialize-vf-…

3367a1f

…and-vfxuf

!fixup after merge, address comments, thanks

e26258b

lukel97 reviewed Aug 11, 2025

View reviewed changes

fhahn added 2 commits August 11, 2025 11:56

Merge remote-tracking branch 'origin/main' into vplan-materialize-vf-…

929db2e

…and-vfxuf

!fixup address comments, thanks

4aa2925

lukel97 approved these changes Aug 11, 2025

View reviewed changes

lukel97 reviewed Aug 11, 2025

View reviewed changes

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp Outdated Show resolved Hide resolved

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp Outdated Show resolved Hide resolved

david-arm reviewed Aug 11, 2025

View reviewed changes

fhahn added 2 commits August 11, 2025 14:30

Merge remote-tracking branch 'origin/main' into vplan-materialize-vf-…

f03fdfb

…and-vfxuf

Merge remote-tracking branch 'origin/main' into vplan-materialize-vf-…

1cc3267

…and-vfxuf

ayalz reviewed Aug 12, 2025

View reviewed changes

fhahn added 2 commits August 12, 2025 10:56

Merge remote-tracking branch 'origin/main' into vplan-materialize-vf-…

a404171

…and-vfxuf

!fixup address comments, thanks

dab0d23

fhahn merged commit 4242589 into llvm:main Aug 12, 2025
9 checks passed

fhahn deleted the vplan-materialize-vf-and-vfxuf branch August 12, 2025 13:13

	VPValue createElementCount(Type Ty, ElementCount EC) {
	VPValue createRuntimeVF(Type Ty, ElementCount VF) {

		VPValue &VF = Plan.getVF();
		VPValue &VFxUF = Plan.getVFxUF();

[VPlan] Materialize VF and VFxUF using VPInstructions. #152879

[VPlan] Materialize VF and VFxUF using VPInstructions. #152879

Conversation

fhahn commented Aug 9, 2025

Uh oh!

llvmbot commented Aug 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

lukel97 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ayalz left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

llvmbot commented Aug 9, 2025 •

edited

Loading