[VPlan] Add narrowToSingleScalarRecipe transform. #139150

fhahn · 2025-05-08T20:21:22Z

Add a new convertToUniformRecipes transform which uses VPlan-based uniformity analysis to determine if wide recipes and replicate recipes can be converted to uniform recipes.

There are a few places where we ad-hoc convert recipes to uniform recipes, which this transform will eventually replace. There are a few more generalizations required to do so which I plan to do as follow-ups.

By converting the recipes to uniform recipes, we effectively materialize the information from the VPlan-based analysis.

Note that there is one regression at the moment in SystemZ/pr47665.ll due to trivial constant folding opportunities in the input IR. This will be fixed by VPlan-based constant folding (#125365)

Add a new convertToUniformRecipes transform which uses VPlan-based uniformity analysis to determine if wide recipes and replicate recipes can be converted to uniform recipes. There are a few places where we ad-hoc convert recipes to uniform recipes, which this transform will eventually replace. There are a few more generalizations required to do so which I plan to do as follow-ups. By converting the recipes to uniform recipes, we effectively materialize the information from the VPlan-based analysis. Note that there is one regression at the moment in SystemZ/pr47665.ll due to trivial constant folding opportunities in the input IR. This will be fixed by VPlan-based constant folding (llvm#125365)

llvmbot · 2025-05-08T20:22:03Z

@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes

Add a new convertToUniformRecipes transform which uses VPlan-based uniformity analysis to determine if wide recipes and replicate recipes can be converted to uniform recipes.

There are a few places where we ad-hoc convert recipes to uniform recipes, which this transform will eventually replace. There are a few more generalizations required to do so which I plan to do as follow-ups.

By converting the recipes to uniform recipes, we effectively materialize the information from the VPlan-based analysis.

Note that there is one regression at the moment in SystemZ/pr47665.ll due to trivial constant folding opportunities in the input IR. This will be fixed by VPlan-based constant folding (#125365)

Full diff: https://github.com/llvm/llvm-project/pull/139150.diff

4 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+35)
(modified) llvm/test/Transforms/LoopVectorize/SystemZ/pr47665.ll (+17-16)
(modified) llvm/test/Transforms/LoopVectorize/X86/cost-model.ll (+1-4)
(modified) llvm/test/Transforms/LoopVectorize/version-stride-with-integer-casts.ll (+4-7)

diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 79ddb8bf0b09b..50552c843cd59 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -1084,6 +1084,40 @@ void VPlanTransforms::simplifyRecipes(VPlan &Plan, Type &CanonicalIVTy) {
   }
 }
 
+static void convertToUniformRecipes(VPlan &Plan) {
+  auto TryToNarrow = [](VPBasicBlock *VPBB) {
+    for (VPRecipeBase &R : make_early_inc_range(reverse(*VPBB))) {
+      // Try to narrow wide and replicating recipes to uniform recipes, based on
+      // VPlan analysis.
+      auto *Def = dyn_cast<VPSingleDefRecipe>(&R);
+      if (!Def || !isa<VPReplicateRecipe, VPWidenRecipe>(Def) ||
+          !Def->getUnderlyingValue())
+        continue;
+
+      auto *RepR = dyn_cast<VPReplicateRecipe>(&R);
+      if (RepR && RepR->isUniform())
+        continue;
+
+      // Skip recipes that aren't uniform and don't have only their scalar
+      // results used. In the later case, we would introduce extra broadcasts.
+      if (!vputils::isUniformAfterVectorization(Def) ||
+          any_of(Def->users(),
+                 [Def](VPUser *U) { return !U->usesScalars(Def); }))
+        continue;
+
+      auto *Clone = new VPReplicateRecipe(Def->getUnderlyingInstr(),
+                                          Def->operands(), /*IsUniform*/ true);
+      Clone->insertBefore(Def);
+      Def->replaceAllUsesWith(Clone);
+      Def->eraseFromParent();
+    }
+  };
+
+  for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
+           vp_depth_first_shallow(Plan.getVectorLoopRegion()->getEntry())))
+    TryToNarrow(VPBB);
+}
+
 /// Normalize and simplify VPBlendRecipes. Should be run after simplifyRecipes
 /// to make sure the masks are simplified.
 static void simplifyBlends(VPlan &Plan) {
@@ -1778,6 +1812,7 @@ void VPlanTransforms::optimize(VPlan &Plan) {
   runPass(simplifyRecipes, Plan, *Plan.getCanonicalIV()->getScalarType());
   runPass(simplifyBlends, Plan);
   runPass(removeDeadRecipes, Plan);
+  runPass(convertToUniformRecipes, Plan);
   runPass(legalizeAndOptimizeInductions, Plan);
   runPass(removeRedundantExpandSCEVRecipes, Plan);
   runPass(simplifyRecipes, Plan, *Plan.getCanonicalIV()->getScalarType());
diff --git a/llvm/test/Transforms/LoopVectorize/SystemZ/pr47665.ll b/llvm/test/Transforms/LoopVectorize/SystemZ/pr47665.ll
index 02a876a3fda67..bb96c166f894c 100644
--- a/llvm/test/Transforms/LoopVectorize/SystemZ/pr47665.ll
+++ b/llvm/test/Transforms/LoopVectorize/SystemZ/pr47665.ll
@@ -7,86 +7,87 @@ define void @test(ptr %p, i40 %a) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
 ; CHECK:       vector.ph:
+; CHECK-NEXT:    [[TMP0:%.*]] = icmp sgt i1 true, false
 ; CHECK-NEXT:    br label [[VECTOR_BODY:%.*]]
 ; CHECK:       vector.body:
 ; CHECK-NEXT:    br i1 true, label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]
 ; CHECK:       pred.store.if:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE]]
 ; CHECK:       pred.store.continue:
 ; CHECK-NEXT:    br i1 true, label [[PRED_STORE_IF1:%.*]], label [[PRED_STORE_CONTINUE2:%.*]]
 ; CHECK:       pred.store.if1:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE2]]
 ; CHECK:       pred.store.continue2:
 ; CHECK-NEXT:    br i1 true, label [[PRED_STORE_IF3:%.*]], label [[PRED_STORE_CONTINUE4:%.*]]
 ; CHECK:       pred.store.if3:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE4]]
 ; CHECK:       pred.store.continue4:
 ; CHECK-NEXT:    br i1 true, label [[PRED_STORE_IF5:%.*]], label [[PRED_STORE_CONTINUE6:%.*]]
 ; CHECK:       pred.store.if5:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE6]]
 ; CHECK:       pred.store.continue6:
 ; CHECK-NEXT:    br i1 true, label [[PRED_STORE_IF7:%.*]], label [[PRED_STORE_CONTINUE8:%.*]]
 ; CHECK:       pred.store.if7:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE8]]
 ; CHECK:       pred.store.continue8:
 ; CHECK-NEXT:    br i1 true, label [[PRED_STORE_IF9:%.*]], label [[PRED_STORE_CONTINUE10:%.*]]
 ; CHECK:       pred.store.if9:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE10]]
 ; CHECK:       pred.store.continue10:
 ; CHECK-NEXT:    br i1 true, label [[PRED_STORE_IF11:%.*]], label [[PRED_STORE_CONTINUE12:%.*]]
 ; CHECK:       pred.store.if11:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE12]]
 ; CHECK:       pred.store.continue12:
 ; CHECK-NEXT:    br i1 true, label [[PRED_STORE_IF13:%.*]], label [[PRED_STORE_CONTINUE14:%.*]]
 ; CHECK:       pred.store.if13:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE14]]
 ; CHECK:       pred.store.continue14:
 ; CHECK-NEXT:    br i1 true, label [[PRED_STORE_IF15:%.*]], label [[PRED_STORE_CONTINUE16:%.*]]
 ; CHECK:       pred.store.if15:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE16]]
 ; CHECK:       pred.store.continue16:
 ; CHECK-NEXT:    br i1 true, label [[PRED_STORE_IF17:%.*]], label [[PRED_STORE_CONTINUE18:%.*]]
 ; CHECK:       pred.store.if17:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE18]]
 ; CHECK:       pred.store.continue18:
 ; CHECK-NEXT:    br i1 false, label [[PRED_STORE_IF19:%.*]], label [[PRED_STORE_CONTINUE20:%.*]]
 ; CHECK:       pred.store.if19:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE20]]
 ; CHECK:       pred.store.continue20:
 ; CHECK-NEXT:    br i1 false, label [[PRED_STORE_IF21:%.*]], label [[PRED_STORE_CONTINUE22:%.*]]
 ; CHECK:       pred.store.if21:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE22]]
 ; CHECK:       pred.store.continue22:
 ; CHECK-NEXT:    br i1 false, label [[PRED_STORE_IF23:%.*]], label [[PRED_STORE_CONTINUE24:%.*]]
 ; CHECK:       pred.store.if23:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE24]]
 ; CHECK:       pred.store.continue24:
 ; CHECK-NEXT:    br i1 false, label [[PRED_STORE_IF25:%.*]], label [[PRED_STORE_CONTINUE26:%.*]]
 ; CHECK:       pred.store.if25:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE26]]
 ; CHECK:       pred.store.continue26:
 ; CHECK-NEXT:    br i1 false, label [[PRED_STORE_IF27:%.*]], label [[PRED_STORE_CONTINUE28:%.*]]
 ; CHECK:       pred.store.if27:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE28]]
 ; CHECK:       pred.store.continue28:
 ; CHECK-NEXT:    br i1 false, label [[PRED_STORE_IF29:%.*]], label [[PRED_STORE_CONTINUE30:%.*]]
 ; CHECK:       pred.store.if29:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE30]]
 ; CHECK:       pred.store.continue30:
 ; CHECK-NEXT:    br label [[MIDDLE_BLOCK:%.*]]
diff --git a/llvm/test/Transforms/LoopVectorize/X86/cost-model.ll b/llvm/test/Transforms/LoopVectorize/X86/cost-model.ll
index f8b1cc2d775f5..7c42c3d9cd52e 100644
--- a/llvm/test/Transforms/LoopVectorize/X86/cost-model.ll
+++ b/llvm/test/Transforms/LoopVectorize/X86/cost-model.ll
@@ -890,9 +890,7 @@ define i64 @cost_assume(ptr %end, i64 %N) {
 ; CHECK:       vector.ph:
 ; CHECK-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[TMP2]], 8
 ; CHECK-NEXT:    [[N_VEC:%.*]] = sub i64 [[TMP2]], [[N_MOD_VF]]
-; CHECK-NEXT:    [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i64> poison, i64 [[N:%.*]], i64 0
-; CHECK-NEXT:    [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i64> [[BROADCAST_SPLATINSERT]], <2 x i64> poison, <2 x i32> zeroinitializer
-; CHECK-NEXT:    [[TMP3:%.*]] = icmp ne <2 x i64> [[BROADCAST_SPLAT]], zeroinitializer
+; CHECK-NEXT:    [[TMP11:%.*]] = icmp ne i64 [[N:%.*]], 0
 ; CHECK-NEXT:    br label [[VECTOR_BODY:%.*]]
 ; CHECK:       vector.body:
 ; CHECK-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
@@ -904,7 +902,6 @@ define i64 @cost_assume(ptr %end, i64 %N) {
 ; CHECK-NEXT:    [[TMP8]] = add <2 x i64> [[VEC_PHI2]], splat (i64 1)
 ; CHECK-NEXT:    [[TMP9]] = add <2 x i64> [[VEC_PHI3]], splat (i64 1)
 ; CHECK-NEXT:    [[TMP10]] = add <2 x i64> [[VEC_PHI4]], splat (i64 1)
-; CHECK-NEXT:    [[TMP11:%.*]] = extractelement <2 x i1> [[TMP3]], i32 0
 ; CHECK-NEXT:    tail call void @llvm.assume(i1 [[TMP11]])
 ; CHECK-NEXT:    tail call void @llvm.assume(i1 [[TMP11]])
 ; CHECK-NEXT:    tail call void @llvm.assume(i1 [[TMP11]])
diff --git a/llvm/test/Transforms/LoopVectorize/version-stride-with-integer-casts.ll b/llvm/test/Transforms/LoopVectorize/version-stride-with-integer-casts.ll
index fb84739881010..30e0acb4d7bf6 100644
--- a/llvm/test/Transforms/LoopVectorize/version-stride-with-integer-casts.ll
+++ b/llvm/test/Transforms/LoopVectorize/version-stride-with-integer-casts.ll
@@ -159,9 +159,6 @@ define void @versioned_sext_use_in_gep(i32 %scale, ptr %dst, i64 %scale.2) {
 ; CHECK-NEXT:    [[IDENT_CHECK:%.*]] = icmp ne i32 [[SCALE]], 1
 ; CHECK-NEXT:    br i1 [[IDENT_CHECK]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
 ; CHECK:       vector.ph:
-; CHECK-NEXT:    [[TMP8:%.*]] = getelementptr i8, ptr [[DST]], i64 [[SCALE_2]]
-; CHECK-NEXT:    [[TMP81:%.*]] = getelementptr i8, ptr [[DST]], i64 [[SCALE_2]]
-; CHECK-NEXT:    [[TMP82:%.*]] = getelementptr i8, ptr [[DST]], i64 [[SCALE_2]]
 ; CHECK-NEXT:    [[TMP83:%.*]] = getelementptr i8, ptr [[DST]], i64 [[SCALE_2]]
 ; CHECK-NEXT:    br label [[VECTOR_BODY:%.*]]
 ; CHECK:       vector.body:
@@ -174,10 +171,10 @@ define void @versioned_sext_use_in_gep(i32 %scale, ptr %dst, i64 %scale.2) {
 ; CHECK-NEXT:    [[TMP13:%.*]] = getelementptr i8, ptr [[DST]], i64 [[TMP12]]
 ; CHECK-NEXT:    [[TMP15:%.*]] = getelementptr i8, ptr [[DST]], i64 [[TMP14]]
 ; CHECK-NEXT:    [[TMP17:%.*]] = getelementptr i8, ptr [[DST]], i64 [[TMP16]]
-; CHECK-NEXT:    store ptr [[TMP8]], ptr [[TMP11]], align 8
-; CHECK-NEXT:    store ptr [[TMP8]], ptr [[TMP13]], align 8
-; CHECK-NEXT:    store ptr [[TMP8]], ptr [[TMP15]], align 8
-; CHECK-NEXT:    store ptr [[TMP8]], ptr [[TMP17]], align 8
+; CHECK-NEXT:    store ptr [[TMP83]], ptr [[TMP11]], align 8
+; CHECK-NEXT:    store ptr [[TMP83]], ptr [[TMP13]], align 8
+; CHECK-NEXT:    store ptr [[TMP83]], ptr [[TMP15]], align 8
+; CHECK-NEXT:    store ptr [[TMP83]], ptr [[TMP17]], align 8
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
 ; CHECK-NEXT:    [[TMP18:%.*]] = icmp eq i64 [[INDEX_NEXT]], 256
 ; CHECK-NEXT:    br i1 [[TMP18]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]

llvmbot · 2025-05-08T20:22:04Z

@llvm/pr-subscribers-vectorizers

Author: Florian Hahn (fhahn)

Changes

Add a new convertToUniformRecipes transform which uses VPlan-based uniformity analysis to determine if wide recipes and replicate recipes can be converted to uniform recipes.

There are a few places where we ad-hoc convert recipes to uniform recipes, which this transform will eventually replace. There are a few more generalizations required to do so which I plan to do as follow-ups.

By converting the recipes to uniform recipes, we effectively materialize the information from the VPlan-based analysis.

Note that there is one regression at the moment in SystemZ/pr47665.ll due to trivial constant folding opportunities in the input IR. This will be fixed by VPlan-based constant folding (#125365)

Full diff: https://github.com/llvm/llvm-project/pull/139150.diff

4 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+35)
(modified) llvm/test/Transforms/LoopVectorize/SystemZ/pr47665.ll (+17-16)
(modified) llvm/test/Transforms/LoopVectorize/X86/cost-model.ll (+1-4)
(modified) llvm/test/Transforms/LoopVectorize/version-stride-with-integer-casts.ll (+4-7)

diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 79ddb8bf0b09b..50552c843cd59 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -1084,6 +1084,40 @@ void VPlanTransforms::simplifyRecipes(VPlan &Plan, Type &CanonicalIVTy) {
   }
 }
 
+static void convertToUniformRecipes(VPlan &Plan) {
+  auto TryToNarrow = [](VPBasicBlock *VPBB) {
+    for (VPRecipeBase &R : make_early_inc_range(reverse(*VPBB))) {
+      // Try to narrow wide and replicating recipes to uniform recipes, based on
+      // VPlan analysis.
+      auto *Def = dyn_cast<VPSingleDefRecipe>(&R);
+      if (!Def || !isa<VPReplicateRecipe, VPWidenRecipe>(Def) ||
+          !Def->getUnderlyingValue())
+        continue;
+
+      auto *RepR = dyn_cast<VPReplicateRecipe>(&R);
+      if (RepR && RepR->isUniform())
+        continue;
+
+      // Skip recipes that aren't uniform and don't have only their scalar
+      // results used. In the later case, we would introduce extra broadcasts.
+      if (!vputils::isUniformAfterVectorization(Def) ||
+          any_of(Def->users(),
+                 [Def](VPUser *U) { return !U->usesScalars(Def); }))
+        continue;
+
+      auto *Clone = new VPReplicateRecipe(Def->getUnderlyingInstr(),
+                                          Def->operands(), /*IsUniform*/ true);
+      Clone->insertBefore(Def);
+      Def->replaceAllUsesWith(Clone);
+      Def->eraseFromParent();
+    }
+  };
+
+  for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
+           vp_depth_first_shallow(Plan.getVectorLoopRegion()->getEntry())))
+    TryToNarrow(VPBB);
+}
+
 /// Normalize and simplify VPBlendRecipes. Should be run after simplifyRecipes
 /// to make sure the masks are simplified.
 static void simplifyBlends(VPlan &Plan) {
@@ -1778,6 +1812,7 @@ void VPlanTransforms::optimize(VPlan &Plan) {
   runPass(simplifyRecipes, Plan, *Plan.getCanonicalIV()->getScalarType());
   runPass(simplifyBlends, Plan);
   runPass(removeDeadRecipes, Plan);
+  runPass(convertToUniformRecipes, Plan);
   runPass(legalizeAndOptimizeInductions, Plan);
   runPass(removeRedundantExpandSCEVRecipes, Plan);
   runPass(simplifyRecipes, Plan, *Plan.getCanonicalIV()->getScalarType());
diff --git a/llvm/test/Transforms/LoopVectorize/SystemZ/pr47665.ll b/llvm/test/Transforms/LoopVectorize/SystemZ/pr47665.ll
index 02a876a3fda67..bb96c166f894c 100644
--- a/llvm/test/Transforms/LoopVectorize/SystemZ/pr47665.ll
+++ b/llvm/test/Transforms/LoopVectorize/SystemZ/pr47665.ll
@@ -7,86 +7,87 @@ define void @test(ptr %p, i40 %a) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
 ; CHECK:       vector.ph:
+; CHECK-NEXT:    [[TMP0:%.*]] = icmp sgt i1 true, false
 ; CHECK-NEXT:    br label [[VECTOR_BODY:%.*]]
 ; CHECK:       vector.body:
 ; CHECK-NEXT:    br i1 true, label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]
 ; CHECK:       pred.store.if:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE]]
 ; CHECK:       pred.store.continue:
 ; CHECK-NEXT:    br i1 true, label [[PRED_STORE_IF1:%.*]], label [[PRED_STORE_CONTINUE2:%.*]]
 ; CHECK:       pred.store.if1:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE2]]
 ; CHECK:       pred.store.continue2:
 ; CHECK-NEXT:    br i1 true, label [[PRED_STORE_IF3:%.*]], label [[PRED_STORE_CONTINUE4:%.*]]
 ; CHECK:       pred.store.if3:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE4]]
 ; CHECK:       pred.store.continue4:
 ; CHECK-NEXT:    br i1 true, label [[PRED_STORE_IF5:%.*]], label [[PRED_STORE_CONTINUE6:%.*]]
 ; CHECK:       pred.store.if5:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE6]]
 ; CHECK:       pred.store.continue6:
 ; CHECK-NEXT:    br i1 true, label [[PRED_STORE_IF7:%.*]], label [[PRED_STORE_CONTINUE8:%.*]]
 ; CHECK:       pred.store.if7:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE8]]
 ; CHECK:       pred.store.continue8:
 ; CHECK-NEXT:    br i1 true, label [[PRED_STORE_IF9:%.*]], label [[PRED_STORE_CONTINUE10:%.*]]
 ; CHECK:       pred.store.if9:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE10]]
 ; CHECK:       pred.store.continue10:
 ; CHECK-NEXT:    br i1 true, label [[PRED_STORE_IF11:%.*]], label [[PRED_STORE_CONTINUE12:%.*]]
 ; CHECK:       pred.store.if11:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE12]]
 ; CHECK:       pred.store.continue12:
 ; CHECK-NEXT:    br i1 true, label [[PRED_STORE_IF13:%.*]], label [[PRED_STORE_CONTINUE14:%.*]]
 ; CHECK:       pred.store.if13:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE14]]
 ; CHECK:       pred.store.continue14:
 ; CHECK-NEXT:    br i1 true, label [[PRED_STORE_IF15:%.*]], label [[PRED_STORE_CONTINUE16:%.*]]
 ; CHECK:       pred.store.if15:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE16]]
 ; CHECK:       pred.store.continue16:
 ; CHECK-NEXT:    br i1 true, label [[PRED_STORE_IF17:%.*]], label [[PRED_STORE_CONTINUE18:%.*]]
 ; CHECK:       pred.store.if17:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE18]]
 ; CHECK:       pred.store.continue18:
 ; CHECK-NEXT:    br i1 false, label [[PRED_STORE_IF19:%.*]], label [[PRED_STORE_CONTINUE20:%.*]]
 ; CHECK:       pred.store.if19:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE20]]
 ; CHECK:       pred.store.continue20:
 ; CHECK-NEXT:    br i1 false, label [[PRED_STORE_IF21:%.*]], label [[PRED_STORE_CONTINUE22:%.*]]
 ; CHECK:       pred.store.if21:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE22]]
 ; CHECK:       pred.store.continue22:
 ; CHECK-NEXT:    br i1 false, label [[PRED_STORE_IF23:%.*]], label [[PRED_STORE_CONTINUE24:%.*]]
 ; CHECK:       pred.store.if23:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE24]]
 ; CHECK:       pred.store.continue24:
 ; CHECK-NEXT:    br i1 false, label [[PRED_STORE_IF25:%.*]], label [[PRED_STORE_CONTINUE26:%.*]]
 ; CHECK:       pred.store.if25:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE26]]
 ; CHECK:       pred.store.continue26:
 ; CHECK-NEXT:    br i1 false, label [[PRED_STORE_IF27:%.*]], label [[PRED_STORE_CONTINUE28:%.*]]
 ; CHECK:       pred.store.if27:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE28]]
 ; CHECK:       pred.store.continue28:
 ; CHECK-NEXT:    br i1 false, label [[PRED_STORE_IF29:%.*]], label [[PRED_STORE_CONTINUE30:%.*]]
 ; CHECK:       pred.store.if29:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE30]]
 ; CHECK:       pred.store.continue30:
 ; CHECK-NEXT:    br label [[MIDDLE_BLOCK:%.*]]
diff --git a/llvm/test/Transforms/LoopVectorize/X86/cost-model.ll b/llvm/test/Transforms/LoopVectorize/X86/cost-model.ll
index f8b1cc2d775f5..7c42c3d9cd52e 100644
--- a/llvm/test/Transforms/LoopVectorize/X86/cost-model.ll
+++ b/llvm/test/Transforms/LoopVectorize/X86/cost-model.ll
@@ -890,9 +890,7 @@ define i64 @cost_assume(ptr %end, i64 %N) {
 ; CHECK:       vector.ph:
 ; CHECK-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[TMP2]], 8
 ; CHECK-NEXT:    [[N_VEC:%.*]] = sub i64 [[TMP2]], [[N_MOD_VF]]
-; CHECK-NEXT:    [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i64> poison, i64 [[N:%.*]], i64 0
-; CHECK-NEXT:    [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i64> [[BROADCAST_SPLATINSERT]], <2 x i64> poison, <2 x i32> zeroinitializer
-; CHECK-NEXT:    [[TMP3:%.*]] = icmp ne <2 x i64> [[BROADCAST_SPLAT]], zeroinitializer
+; CHECK-NEXT:    [[TMP11:%.*]] = icmp ne i64 [[N:%.*]], 0
 ; CHECK-NEXT:    br label [[VECTOR_BODY:%.*]]
 ; CHECK:       vector.body:
 ; CHECK-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
@@ -904,7 +902,6 @@ define i64 @cost_assume(ptr %end, i64 %N) {
 ; CHECK-NEXT:    [[TMP8]] = add <2 x i64> [[VEC_PHI2]], splat (i64 1)
 ; CHECK-NEXT:    [[TMP9]] = add <2 x i64> [[VEC_PHI3]], splat (i64 1)
 ; CHECK-NEXT:    [[TMP10]] = add <2 x i64> [[VEC_PHI4]], splat (i64 1)
-; CHECK-NEXT:    [[TMP11:%.*]] = extractelement <2 x i1> [[TMP3]], i32 0
 ; CHECK-NEXT:    tail call void @llvm.assume(i1 [[TMP11]])
 ; CHECK-NEXT:    tail call void @llvm.assume(i1 [[TMP11]])
 ; CHECK-NEXT:    tail call void @llvm.assume(i1 [[TMP11]])
diff --git a/llvm/test/Transforms/LoopVectorize/version-stride-with-integer-casts.ll b/llvm/test/Transforms/LoopVectorize/version-stride-with-integer-casts.ll
index fb84739881010..30e0acb4d7bf6 100644
--- a/llvm/test/Transforms/LoopVectorize/version-stride-with-integer-casts.ll
+++ b/llvm/test/Transforms/LoopVectorize/version-stride-with-integer-casts.ll
@@ -159,9 +159,6 @@ define void @versioned_sext_use_in_gep(i32 %scale, ptr %dst, i64 %scale.2) {
 ; CHECK-NEXT:    [[IDENT_CHECK:%.*]] = icmp ne i32 [[SCALE]], 1
 ; CHECK-NEXT:    br i1 [[IDENT_CHECK]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
 ; CHECK:       vector.ph:
-; CHECK-NEXT:    [[TMP8:%.*]] = getelementptr i8, ptr [[DST]], i64 [[SCALE_2]]
-; CHECK-NEXT:    [[TMP81:%.*]] = getelementptr i8, ptr [[DST]], i64 [[SCALE_2]]
-; CHECK-NEXT:    [[TMP82:%.*]] = getelementptr i8, ptr [[DST]], i64 [[SCALE_2]]
 ; CHECK-NEXT:    [[TMP83:%.*]] = getelementptr i8, ptr [[DST]], i64 [[SCALE_2]]
 ; CHECK-NEXT:    br label [[VECTOR_BODY:%.*]]
 ; CHECK:       vector.body:
@@ -174,10 +171,10 @@ define void @versioned_sext_use_in_gep(i32 %scale, ptr %dst, i64 %scale.2) {
 ; CHECK-NEXT:    [[TMP13:%.*]] = getelementptr i8, ptr [[DST]], i64 [[TMP12]]
 ; CHECK-NEXT:    [[TMP15:%.*]] = getelementptr i8, ptr [[DST]], i64 [[TMP14]]
 ; CHECK-NEXT:    [[TMP17:%.*]] = getelementptr i8, ptr [[DST]], i64 [[TMP16]]
-; CHECK-NEXT:    store ptr [[TMP8]], ptr [[TMP11]], align 8
-; CHECK-NEXT:    store ptr [[TMP8]], ptr [[TMP13]], align 8
-; CHECK-NEXT:    store ptr [[TMP8]], ptr [[TMP15]], align 8
-; CHECK-NEXT:    store ptr [[TMP8]], ptr [[TMP17]], align 8
+; CHECK-NEXT:    store ptr [[TMP83]], ptr [[TMP11]], align 8
+; CHECK-NEXT:    store ptr [[TMP83]], ptr [[TMP13]], align 8
+; CHECK-NEXT:    store ptr [[TMP83]], ptr [[TMP15]], align 8
+; CHECK-NEXT:    store ptr [[TMP83]], ptr [[TMP17]], align 8
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
 ; CHECK-NEXT:    [[TMP18:%.*]] = icmp eq i64 [[INDEX_NEXT]], 256
 ; CHECK-NEXT:    br i1 [[TMP18]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]

llvmbot · 2025-05-08T20:22:04Z

@llvm/pr-subscribers-backend-systemz

Author: Florian Hahn (fhahn)

Changes

Add a new convertToUniformRecipes transform which uses VPlan-based uniformity analysis to determine if wide recipes and replicate recipes can be converted to uniform recipes.

There are a few places where we ad-hoc convert recipes to uniform recipes, which this transform will eventually replace. There are a few more generalizations required to do so which I plan to do as follow-ups.

By converting the recipes to uniform recipes, we effectively materialize the information from the VPlan-based analysis.

Note that there is one regression at the moment in SystemZ/pr47665.ll due to trivial constant folding opportunities in the input IR. This will be fixed by VPlan-based constant folding (#125365)

Full diff: https://github.com/llvm/llvm-project/pull/139150.diff

4 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+35)
(modified) llvm/test/Transforms/LoopVectorize/SystemZ/pr47665.ll (+17-16)
(modified) llvm/test/Transforms/LoopVectorize/X86/cost-model.ll (+1-4)
(modified) llvm/test/Transforms/LoopVectorize/version-stride-with-integer-casts.ll (+4-7)

diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 79ddb8bf0b09b..50552c843cd59 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -1084,6 +1084,40 @@ void VPlanTransforms::simplifyRecipes(VPlan &Plan, Type &CanonicalIVTy) {
   }
 }
 
+static void convertToUniformRecipes(VPlan &Plan) {
+  auto TryToNarrow = [](VPBasicBlock *VPBB) {
+    for (VPRecipeBase &R : make_early_inc_range(reverse(*VPBB))) {
+      // Try to narrow wide and replicating recipes to uniform recipes, based on
+      // VPlan analysis.
+      auto *Def = dyn_cast<VPSingleDefRecipe>(&R);
+      if (!Def || !isa<VPReplicateRecipe, VPWidenRecipe>(Def) ||
+          !Def->getUnderlyingValue())
+        continue;
+
+      auto *RepR = dyn_cast<VPReplicateRecipe>(&R);
+      if (RepR && RepR->isUniform())
+        continue;
+
+      // Skip recipes that aren't uniform and don't have only their scalar
+      // results used. In the later case, we would introduce extra broadcasts.
+      if (!vputils::isUniformAfterVectorization(Def) ||
+          any_of(Def->users(),
+                 [Def](VPUser *U) { return !U->usesScalars(Def); }))
+        continue;
+
+      auto *Clone = new VPReplicateRecipe(Def->getUnderlyingInstr(),
+                                          Def->operands(), /*IsUniform*/ true);
+      Clone->insertBefore(Def);
+      Def->replaceAllUsesWith(Clone);
+      Def->eraseFromParent();
+    }
+  };
+
+  for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
+           vp_depth_first_shallow(Plan.getVectorLoopRegion()->getEntry())))
+    TryToNarrow(VPBB);
+}
+
 /// Normalize and simplify VPBlendRecipes. Should be run after simplifyRecipes
 /// to make sure the masks are simplified.
 static void simplifyBlends(VPlan &Plan) {
@@ -1778,6 +1812,7 @@ void VPlanTransforms::optimize(VPlan &Plan) {
   runPass(simplifyRecipes, Plan, *Plan.getCanonicalIV()->getScalarType());
   runPass(simplifyBlends, Plan);
   runPass(removeDeadRecipes, Plan);
+  runPass(convertToUniformRecipes, Plan);
   runPass(legalizeAndOptimizeInductions, Plan);
   runPass(removeRedundantExpandSCEVRecipes, Plan);
   runPass(simplifyRecipes, Plan, *Plan.getCanonicalIV()->getScalarType());
diff --git a/llvm/test/Transforms/LoopVectorize/SystemZ/pr47665.ll b/llvm/test/Transforms/LoopVectorize/SystemZ/pr47665.ll
index 02a876a3fda67..bb96c166f894c 100644
--- a/llvm/test/Transforms/LoopVectorize/SystemZ/pr47665.ll
+++ b/llvm/test/Transforms/LoopVectorize/SystemZ/pr47665.ll
@@ -7,86 +7,87 @@ define void @test(ptr %p, i40 %a) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
 ; CHECK:       vector.ph:
+; CHECK-NEXT:    [[TMP0:%.*]] = icmp sgt i1 true, false
 ; CHECK-NEXT:    br label [[VECTOR_BODY:%.*]]
 ; CHECK:       vector.body:
 ; CHECK-NEXT:    br i1 true, label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]
 ; CHECK:       pred.store.if:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE]]
 ; CHECK:       pred.store.continue:
 ; CHECK-NEXT:    br i1 true, label [[PRED_STORE_IF1:%.*]], label [[PRED_STORE_CONTINUE2:%.*]]
 ; CHECK:       pred.store.if1:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE2]]
 ; CHECK:       pred.store.continue2:
 ; CHECK-NEXT:    br i1 true, label [[PRED_STORE_IF3:%.*]], label [[PRED_STORE_CONTINUE4:%.*]]
 ; CHECK:       pred.store.if3:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE4]]
 ; CHECK:       pred.store.continue4:
 ; CHECK-NEXT:    br i1 true, label [[PRED_STORE_IF5:%.*]], label [[PRED_STORE_CONTINUE6:%.*]]
 ; CHECK:       pred.store.if5:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE6]]
 ; CHECK:       pred.store.continue6:
 ; CHECK-NEXT:    br i1 true, label [[PRED_STORE_IF7:%.*]], label [[PRED_STORE_CONTINUE8:%.*]]
 ; CHECK:       pred.store.if7:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE8]]
 ; CHECK:       pred.store.continue8:
 ; CHECK-NEXT:    br i1 true, label [[PRED_STORE_IF9:%.*]], label [[PRED_STORE_CONTINUE10:%.*]]
 ; CHECK:       pred.store.if9:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE10]]
 ; CHECK:       pred.store.continue10:
 ; CHECK-NEXT:    br i1 true, label [[PRED_STORE_IF11:%.*]], label [[PRED_STORE_CONTINUE12:%.*]]
 ; CHECK:       pred.store.if11:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE12]]
 ; CHECK:       pred.store.continue12:
 ; CHECK-NEXT:    br i1 true, label [[PRED_STORE_IF13:%.*]], label [[PRED_STORE_CONTINUE14:%.*]]
 ; CHECK:       pred.store.if13:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE14]]
 ; CHECK:       pred.store.continue14:
 ; CHECK-NEXT:    br i1 true, label [[PRED_STORE_IF15:%.*]], label [[PRED_STORE_CONTINUE16:%.*]]
 ; CHECK:       pred.store.if15:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE16]]
 ; CHECK:       pred.store.continue16:
 ; CHECK-NEXT:    br i1 true, label [[PRED_STORE_IF17:%.*]], label [[PRED_STORE_CONTINUE18:%.*]]
 ; CHECK:       pred.store.if17:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE18]]
 ; CHECK:       pred.store.continue18:
 ; CHECK-NEXT:    br i1 false, label [[PRED_STORE_IF19:%.*]], label [[PRED_STORE_CONTINUE20:%.*]]
 ; CHECK:       pred.store.if19:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE20]]
 ; CHECK:       pred.store.continue20:
 ; CHECK-NEXT:    br i1 false, label [[PRED_STORE_IF21:%.*]], label [[PRED_STORE_CONTINUE22:%.*]]
 ; CHECK:       pred.store.if21:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE22]]
 ; CHECK:       pred.store.continue22:
 ; CHECK-NEXT:    br i1 false, label [[PRED_STORE_IF23:%.*]], label [[PRED_STORE_CONTINUE24:%.*]]
 ; CHECK:       pred.store.if23:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE24]]
 ; CHECK:       pred.store.continue24:
 ; CHECK-NEXT:    br i1 false, label [[PRED_STORE_IF25:%.*]], label [[PRED_STORE_CONTINUE26:%.*]]
 ; CHECK:       pred.store.if25:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE26]]
 ; CHECK:       pred.store.continue26:
 ; CHECK-NEXT:    br i1 false, label [[PRED_STORE_IF27:%.*]], label [[PRED_STORE_CONTINUE28:%.*]]
 ; CHECK:       pred.store.if27:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE28]]
 ; CHECK:       pred.store.continue28:
 ; CHECK-NEXT:    br i1 false, label [[PRED_STORE_IF29:%.*]], label [[PRED_STORE_CONTINUE30:%.*]]
 ; CHECK:       pred.store.if29:
-; CHECK-NEXT:    store i1 false, ptr [[P]], align 1
+; CHECK-NEXT:    store i1 [[TMP0]], ptr [[P]], align 1
 ; CHECK-NEXT:    br label [[PRED_STORE_CONTINUE30]]
 ; CHECK:       pred.store.continue30:
 ; CHECK-NEXT:    br label [[MIDDLE_BLOCK:%.*]]
diff --git a/llvm/test/Transforms/LoopVectorize/X86/cost-model.ll b/llvm/test/Transforms/LoopVectorize/X86/cost-model.ll
index f8b1cc2d775f5..7c42c3d9cd52e 100644
--- a/llvm/test/Transforms/LoopVectorize/X86/cost-model.ll
+++ b/llvm/test/Transforms/LoopVectorize/X86/cost-model.ll
@@ -890,9 +890,7 @@ define i64 @cost_assume(ptr %end, i64 %N) {
 ; CHECK:       vector.ph:
 ; CHECK-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[TMP2]], 8
 ; CHECK-NEXT:    [[N_VEC:%.*]] = sub i64 [[TMP2]], [[N_MOD_VF]]
-; CHECK-NEXT:    [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i64> poison, i64 [[N:%.*]], i64 0
-; CHECK-NEXT:    [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i64> [[BROADCAST_SPLATINSERT]], <2 x i64> poison, <2 x i32> zeroinitializer
-; CHECK-NEXT:    [[TMP3:%.*]] = icmp ne <2 x i64> [[BROADCAST_SPLAT]], zeroinitializer
+; CHECK-NEXT:    [[TMP11:%.*]] = icmp ne i64 [[N:%.*]], 0
 ; CHECK-NEXT:    br label [[VECTOR_BODY:%.*]]
 ; CHECK:       vector.body:
 ; CHECK-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
@@ -904,7 +902,6 @@ define i64 @cost_assume(ptr %end, i64 %N) {
 ; CHECK-NEXT:    [[TMP8]] = add <2 x i64> [[VEC_PHI2]], splat (i64 1)
 ; CHECK-NEXT:    [[TMP9]] = add <2 x i64> [[VEC_PHI3]], splat (i64 1)
 ; CHECK-NEXT:    [[TMP10]] = add <2 x i64> [[VEC_PHI4]], splat (i64 1)
-; CHECK-NEXT:    [[TMP11:%.*]] = extractelement <2 x i1> [[TMP3]], i32 0
 ; CHECK-NEXT:    tail call void @llvm.assume(i1 [[TMP11]])
 ; CHECK-NEXT:    tail call void @llvm.assume(i1 [[TMP11]])
 ; CHECK-NEXT:    tail call void @llvm.assume(i1 [[TMP11]])
diff --git a/llvm/test/Transforms/LoopVectorize/version-stride-with-integer-casts.ll b/llvm/test/Transforms/LoopVectorize/version-stride-with-integer-casts.ll
index fb84739881010..30e0acb4d7bf6 100644
--- a/llvm/test/Transforms/LoopVectorize/version-stride-with-integer-casts.ll
+++ b/llvm/test/Transforms/LoopVectorize/version-stride-with-integer-casts.ll
@@ -159,9 +159,6 @@ define void @versioned_sext_use_in_gep(i32 %scale, ptr %dst, i64 %scale.2) {
 ; CHECK-NEXT:    [[IDENT_CHECK:%.*]] = icmp ne i32 [[SCALE]], 1
 ; CHECK-NEXT:    br i1 [[IDENT_CHECK]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
 ; CHECK:       vector.ph:
-; CHECK-NEXT:    [[TMP8:%.*]] = getelementptr i8, ptr [[DST]], i64 [[SCALE_2]]
-; CHECK-NEXT:    [[TMP81:%.*]] = getelementptr i8, ptr [[DST]], i64 [[SCALE_2]]
-; CHECK-NEXT:    [[TMP82:%.*]] = getelementptr i8, ptr [[DST]], i64 [[SCALE_2]]
 ; CHECK-NEXT:    [[TMP83:%.*]] = getelementptr i8, ptr [[DST]], i64 [[SCALE_2]]
 ; CHECK-NEXT:    br label [[VECTOR_BODY:%.*]]
 ; CHECK:       vector.body:
@@ -174,10 +171,10 @@ define void @versioned_sext_use_in_gep(i32 %scale, ptr %dst, i64 %scale.2) {
 ; CHECK-NEXT:    [[TMP13:%.*]] = getelementptr i8, ptr [[DST]], i64 [[TMP12]]
 ; CHECK-NEXT:    [[TMP15:%.*]] = getelementptr i8, ptr [[DST]], i64 [[TMP14]]
 ; CHECK-NEXT:    [[TMP17:%.*]] = getelementptr i8, ptr [[DST]], i64 [[TMP16]]
-; CHECK-NEXT:    store ptr [[TMP8]], ptr [[TMP11]], align 8
-; CHECK-NEXT:    store ptr [[TMP8]], ptr [[TMP13]], align 8
-; CHECK-NEXT:    store ptr [[TMP8]], ptr [[TMP15]], align 8
-; CHECK-NEXT:    store ptr [[TMP8]], ptr [[TMP17]], align 8
+; CHECK-NEXT:    store ptr [[TMP83]], ptr [[TMP11]], align 8
+; CHECK-NEXT:    store ptr [[TMP83]], ptr [[TMP13]], align 8
+; CHECK-NEXT:    store ptr [[TMP83]], ptr [[TMP15]], align 8
+; CHECK-NEXT:    store ptr [[TMP83]], ptr [[TMP17]], align 8
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
 ; CHECK-NEXT:    [[TMP18:%.*]] = icmp eq i64 [[INDEX_NEXT]], 256
 ; CHECK-NEXT:    br i1 [[TMP18]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]

lukel97 · 2025-05-09T11:21:55Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+
+  for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
+           vp_depth_first_shallow(Plan.getVectorLoopRegion()->getEntry())))
+    TryToNarrow(VPBB);


Is there a particular reason we're using a lambda here?

As if this pass had a class of its own where its main runPass() called a TryToNarrow() method on each basic block, as in runOnBasicBlock(), independently (and conceptually in parallel).

I have some follow-up patches that may calls this on additional blocks, which was why I had the lambda originally. Inlined for now, thanks

Mel-Chen · 2025-05-14T06:59:17Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

  runPass(simplifyRecipes, Plan, *Plan.getCanonicalIV()->getScalarType());
  runPass(simplifyBlends, Plan);
  runPass(removeDeadRecipes, Plan);
+  runPass(convertToUniformRecipes, Plan);


Do we need to apply uniform analysis if VF=1? If not, could we skip it when VF=1?

Hmm, suspect so, given that in such case all replicate recipes should already be "uniform" and widen recipes are irrelevant.

Yep, added a check isScalarVFOnly() to the transform

ayalz · 2025-05-14T16:57:39Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+        continue;
+
+      // Skip recipes that aren't uniform and don't have only their scalar
+      // results used. In the later case, we would introduce extra broadcasts.


Suggested change

// results used. In the later case, we would introduce extra broadcasts.

// results used. In the latter case, we would introduce extra broadcasts.

Done thanks

ayalz · 2025-05-14T17:00:13Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+      auto *Def = dyn_cast<VPSingleDefRecipe>(&R);
+      if (!Def || !isa<VPReplicateRecipe, VPWidenRecipe>(Def) ||
+          !Def->getUnderlyingValue())
+        continue;
+


Suggested change

auto *Def = dyn_cast<VPSingleDefRecipe>(&R);

if (!Def || !isa<VPReplicateRecipe, VPWidenRecipe>(Def) ||

!Def->getUnderlyingValue())

continue;

Done thanks

ayalz · 2025-05-14T17:03:24Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+      auto *RepR = dyn_cast<VPReplicateRecipe>(&R);
+      if (RepR && RepR->isUniform())
+        continue;
+


Suggested change

auto *SingleDef = cast<VPSingleDefRecipe>(&R);

or RepOrWidenR.

Done thanks

ayalz · 2025-05-14T17:07:47Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+      auto *RepR = dyn_cast<VPReplicateRecipe>(&R);
+      if (RepR && RepR->isUniform())
+        continue;


Suggested change

auto *RepR = dyn_cast<VPReplicateRecipe>(&R);

if (RepR && RepR->isUniform())

continue;

auto *RepR = dyn_cast<VPReplicateRecipe>(&R);

if (!RepR && !isa<VPWidenRecipe>(&R))

continue;

if (RepR && RepR->isUniform())

continue;

Done thanks

ayalz · 2025-05-14T17:13:55Z

llvm/test/Transforms/LoopVectorize/SystemZ/pr47665.ll

 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
 ; CHECK:       vector.ph:
+; CHECK-NEXT:    [[TMP0:%.*]] = icmp sgt i1 true, false


Is this the said degradation?

Yep, trivial folding that at the moment happens in IRBuilder on VPWidenRecipe, but not on replicate recipes which clone the original instruction. Will be fixed by pending VP constant folder

ayalz · 2025-05-14T18:12:21Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+
+  for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
+           vp_depth_first_shallow(Plan.getVectorLoopRegion()->getEntry())))
+    TryToNarrow(VPBB);


As if this pass had a class of its own where its main runPass() called a TryToNarrow() method on each basic block, as in runOnBasicBlock(), independently (and conceptually in parallel).

ayalz · 2025-05-14T22:35:24Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+
+      // Skip recipes that aren't uniform and don't have only their scalar
+      // results used. In the later case, we would introduce extra broadcasts.
+      if (!vputils::isUniformAfterVectorization(Def) ||


The term "UniformAfterVectorization" and VPReplicateRecipe's "uniform" field should be renamed. Being uniform (having same value for all lanes) is independent of being before or after vectorization. The term stands for "singleLane" or "singleScalar", which is typically associated with the first lane (as in unit-stride, i.e., clearly non-uniform, GEPs whose first lane is the only one used), and with all lanes when the value is uniform.

Agreed, this is long overdue! #140134

ayalz · 2025-05-14T22:41:39Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+      // VPlan analysis.
+      auto *Def = dyn_cast<VPSingleDefRecipe>(&R);
+      if (!Def || !isa<VPReplicateRecipe, VPWidenRecipe>(Def) ||
+          !Def->getUnderlyingValue())


Suggested change

!Def->getUnderlyingValue())

can be asserted instead if desired - these recipes must have underlying values - in order to know what to replicate or widen.

It could be asserted for VPReplicateRecipe, but not for VPWidenRecipe which does not require a underlying value.

Hmm, if VPWidenRecipe uses the (optional) underlying value only for metadata and FMF, could it be replaced with VPInstruction?

Yep we should be able to do that soon. I'll give it a try, need to see if we rely on the encoded facts that certain recipes are widened in various places.

ayalz · 2025-05-14T22:45:26Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

  runPass(simplifyRecipes, Plan, *Plan.getCanonicalIV()->getScalarType());
  runPass(simplifyBlends, Plan);
  runPass(removeDeadRecipes, Plan);
+  runPass(convertToUniformRecipes, Plan);


Hmm, suspect so, given that in such case all replicate recipes should already be "uniform" and widen recipes are irrelevant.

…sforms

Update the naming in VPReplicateRecipe and vputils to the more accurate isSingleScalar, as the functions check for cases where only a single scalar is needed, either because it produces the same value for all lanes or has only their first lane used. Discussed in llvm#139150.

ayalz

LGTM, thanks! Couple of comments, if this lands after #140134.

ayalz · 2025-05-16T11:19:35Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

  }
 }

+static void convertToUniformRecipes(VPlan &Plan) {


Suggested change

static void convertToUniformRecipes(VPlan &Plan) {

static void convertToSingleScalarRecipes(VPlan &Plan) {

as this captures both uniformity and only-first-lane-used? Also affects title of patch.

Analogous to truncateToMinimalBitwidths() which aims to reduce each lane to fewer bits, this aims to reduce each part to fewest lanes - to one. Perhaps both should start with narrow, as used in the now inlined lambda.

Updated to narrowToSingleScalarRecipe, thanks

ayalz · 2025-05-16T12:14:01Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+    return;
+
+  for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
+           vp_depth_first_shallow(Plan.getVectorLoopRegion()->getEntry()))) {


Suffice to traverse shallow?

Yes, we cannot convert to uniform recipes in replicate regions at they moment, they need hoisting out first.

Worth a comment. This also prevents narrowing recipes in nested loop regions.

added thanks

ayalz · 2025-05-16T12:15:46Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+        continue;
+
+      auto *RepOrWidenR = cast<VPSingleDefRecipe>(&R);
+      // Skip recipes that aren't uniform and don't have only their scalar


Suggested change

// Skip recipes that aren't uniform and don't have only their scalar

// Skip recipes that aren't single scalar or don't have only their scalar

The and here should be accurate, it skips cases that have non-scalar uses, as this may require introducing broadcasts. This is something that will be generalized in the future.

The code has an ||?

Ah yes, was thinking about the recipes we process below, updated, thanks

ayalz · 2025-05-16T12:16:57Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+
+      auto *Clone =
+          new VPReplicateRecipe(RepOrWidenR->getUnderlyingInstr(),
+                                RepOrWidenR->operands(), /*IsUniform*/ true);


Suggested change

RepOrWidenR->operands(), /*IsUniform*/ true);

RepOrWidenR->operands(), /*IsSingleScalar*/ true);

Done thanks

david-arm · 2025-05-16T15:36:04Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+        continue;
+
+      auto *Clone =
+          new VPReplicateRecipe(RepOrWidenR->getUnderlyingInstr(),


If this is already a VPReplicateRecipe class can we avoid the clone and simply set the IsUniform flag to true on the existing object?

It would presumably also avoiding all the work to replace all uses, remove from parent, etc.

Currently we don't allow modifying most properties of existing recipes, other than operands and (IR) flags.

Such a change should probably done separately and we could relax it for some recipes, but creating new recipes here and having dead recipes removed separately later on means we don't really have to worry about invalidating any potential analyses in the future and it I think it mirrors LLVM IR, where creating new instructions is often perferred to modifying existing instructions as it can be less error-prone IIRC.

…#140134) Update the naming in VPReplicateRecipe and vputils to the more accurate isSingleScalar, as the functions check for cases where only a single scalar is needed, either because it produces the same value for all lanes or has only their first lane used. Discussed in #139150. PR: #140134

…sforms

…alar (NFC). (#140134) Update the naming in VPReplicateRecipe and vputils to the more accurate isSingleScalar, as the functions check for cases where only a single scalar is needed, either because it produces the same value for all lanes or has only their first lane used. Discussed in llvm/llvm-project#139150. PR: llvm/llvm-project#140134

…sforms

Add a new convertToUniformRecipes transform which uses VPlan-based uniformity analysis to determine if wide recipes and replicate recipes can be converted to uniform recipes. There are a few places where we ad-hoc convert recipes to uniform recipes, which this transform will eventually replace. There are a few more generalizations required to do so which I plan to do as follow-ups. By converting the recipes to uniform recipes, we effectively materialize the information from the VPlan-based analysis. Note that there is one regression at the moment in SystemZ/pr47665.ll due to trivial constant folding opportunities in the input IR. This will be fixed by VPlan-based constant folding (llvm/llvm-project#125365) PR: llvm/llvm-project#139150

mikaelholmen · 2025-05-21T09:23:16Z

Hi @fhahn

The following starts crashing with this patch:

opt -passes="loop-vectorize" bbi-107246.ll -o /dev/null -mtriple=aarch64

It crashes with:

opt: ../include/llvm/IR/User.h:236: void llvm::User::setOperand(unsigned int, Value *): Assertion `i < NumUserOperands && "setOperand() out of range!"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: build-all/bin/opt -passes=loop-vectorize bbi-107246.ll -o /dev/null -mtriple=aarch64
1.	Running pass "function(loop-vectorize<no-interleave-forced-only;no-vectorize-forced-only;>)" on module "bbi-107246.ll"
2.	Running pass "loop-vectorize<no-interleave-forced-only;no-vectorize-forced-only;>" on function "d"
 #0 0x0000563e62d41c46 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (build-all/bin/opt+0x46b0c46)
 #1 0x0000563e62d3f68e llvm::sys::RunSignalHandlers() (build-all/bin/opt+0x46ae68e)
 #2 0x0000563e62d42539 SignalHandler(int, siginfo_t*, void*) Signals.cpp:0:0
 #3 0x00007f8fbcee4d10 __restore_rt (/lib64/libpthread.so.0+0x12d10)
 #4 0x00007f8fba88452f raise (/lib64/libc.so.6+0x4e52f)
 #5 0x00007f8fba857e65 abort (/lib64/libc.so.6+0x21e65)
 #6 0x00007f8fba857d39 _nl_load_domain.cold.0 (/lib64/libc.so.6+0x21d39)
 #7 0x00007f8fba87ce86 (/lib64/libc.so.6+0x46e86)
 #8 0x0000563e62ecb9de llvm::User::setOperand(unsigned int, llvm::Value*) User.cpp:0:0
 #9 0x0000563e643a3cdd scalarizeInstruction(llvm::Instruction const*, llvm::VPReplicateRecipe*, llvm::VPLane const&, llvm::VPTransformState&) VPlanRecipes.cpp:0:0
#10 0x0000563e643a392b llvm::VPReplicateRecipe::execute(llvm::VPTransformState&) (build-all/bin/opt+0x5d1292b)
#11 0x0000563e6437f3ee llvm::VPBasicBlock::executeRecipes(llvm::VPTransformState*, llvm::BasicBlock*) (build-all/bin/opt+0x5cee3ee)
#12 0x0000563e6437efe6 llvm::VPIRBasicBlock::execute(llvm::VPTransformState*) (build-all/bin/opt+0x5cedfe6)
#13 0x0000563e64384bce llvm::VPlan::execute(llvm::VPTransformState*) (build-all/bin/opt+0x5cf3bce)
#14 0x0000563e6433ea96 llvm::LoopVectorizationPlanner::executePlan(llvm::ElementCount, unsigned int, llvm::VPlan&, llvm::InnerLoopVectorizer&, llvm::DominatorTree*, bool) (build-all/bin/opt+0x5cada96)
#15 0x0000563e64355042 llvm::LoopVectorizePass::processLoop(llvm::Loop*) (build-all/bin/opt+0x5cc4042)
#16 0x0000563e6435c0cb llvm::LoopVectorizePass::runImpl(llvm::Function&) (build-all/bin/opt+0x5ccb0cb)
#17 0x0000563e6435c986 llvm::LoopVectorizePass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (build-all/bin/opt+0x5ccb986)
#18 0x0000563e6420ad0d llvm::detail::PassModel<llvm::Function, llvm::LoopVectorizePass, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) PassBuilderPipelines.cpp:0:0
#19 0x0000563e62f33587 llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (build-all/bin/opt+0x48a2587)
#20 0x0000563e6420848d llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) PassBuilderPipelines.cpp:0:0
#21 0x0000563e62f3813e llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (build-all/bin/opt+0x48a713e)
#22 0x0000563e642044ad llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) PassBuilderPipelines.cpp:0:0
#23 0x0000563e62f32277 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (build-all/bin/opt+0x48a1277)
#24 0x0000563e64190b8c llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::PassPlugin>, llvm::ArrayRef<std::function<void (llvm::PassBuilder&)>>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool, bool, bool) (build-all/bin/opt+0x5affb8c)
#25 0x0000563e62d040fe optMain (build-all/bin/opt+0x46730fe)
#26 0x00007f8fba8707e5 __libc_start_main (/lib64/libc.so.6+0x3a7e5)
#27 0x0000563e62d01bee _start (build-all/bin/opt+0x4670bee)
Abort (core dumped)

bbi-107246.ll.gz

We cannot convert predicated recipes to uniform ones at the moment. This fixes a crash reported for #139150.

fhahn · 2025-05-21T21:14:46Z

Hi @fhahn

The following starts crashing with this patch:

opt -passes="loop-vectorize" bbi-107246.ll -o /dev/null -mtriple=aarch64

It crashes with:

opt: ../include/llvm/IR/User.h:236: void llvm::User::setOperand(unsigned int, Value *): Assertion `i < NumUserOperands && "setOperand() out of range!"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: build-all/bin/opt -passes=loop-vectorize bbi-107246.ll -o /dev/null -mtriple=aarch64
1.	Running pass "function(loop-vectorize<no-interleave-forced-only;no-vectorize-forced-only;>)" on module "bbi-107246.ll"
2.	Running pass "loop-vectorize<no-interleave-forced-only;no-vectorize-forced-only;>" on function "d"
 #0 0x0000563e62d41c46 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (build-all/bin/opt+0x46b0c46)
 #1 0x0000563e62d3f68e llvm::sys::RunSignalHandlers() (build-all/bin/opt+0x46ae68e)
 #2 0x0000563e62d42539 SignalHandler(int, siginfo_t*, void*) Signals.cpp:0:0
 #3 0x00007f8fbcee4d10 __restore_rt (/lib64/libpthread.so.0+0x12d10)
 #4 0x00007f8fba88452f raise (/lib64/libc.so.6+0x4e52f)
 #5 0x00007f8fba857e65 abort (/lib64/libc.so.6+0x21e65)
 #6 0x00007f8fba857d39 _nl_load_domain.cold.0 (/lib64/libc.so.6+0x21d39)
 #7 0x00007f8fba87ce86 (/lib64/libc.so.6+0x46e86)
 #8 0x0000563e62ecb9de llvm::User::setOperand(unsigned int, llvm::Value*) User.cpp:0:0
 #9 0x0000563e643a3cdd scalarizeInstruction(llvm::Instruction const*, llvm::VPReplicateRecipe*, llvm::VPLane const&, llvm::VPTransformState&) VPlanRecipes.cpp:0:0
#10 0x0000563e643a392b llvm::VPReplicateRecipe::execute(llvm::VPTransformState&) (build-all/bin/opt+0x5d1292b)
#11 0x0000563e6437f3ee llvm::VPBasicBlock::executeRecipes(llvm::VPTransformState*, llvm::BasicBlock*) (build-all/bin/opt+0x5cee3ee)
#12 0x0000563e6437efe6 llvm::VPIRBasicBlock::execute(llvm::VPTransformState*) (build-all/bin/opt+0x5cedfe6)
#13 0x0000563e64384bce llvm::VPlan::execute(llvm::VPTransformState*) (build-all/bin/opt+0x5cf3bce)
#14 0x0000563e6433ea96 llvm::LoopVectorizationPlanner::executePlan(llvm::ElementCount, unsigned int, llvm::VPlan&, llvm::InnerLoopVectorizer&, llvm::DominatorTree*, bool) (build-all/bin/opt+0x5cada96)
#15 0x0000563e64355042 llvm::LoopVectorizePass::processLoop(llvm::Loop*) (build-all/bin/opt+0x5cc4042)
#16 0x0000563e6435c0cb llvm::LoopVectorizePass::runImpl(llvm::Function&) (build-all/bin/opt+0x5ccb0cb)
#17 0x0000563e6435c986 llvm::LoopVectorizePass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (build-all/bin/opt+0x5ccb986)
#18 0x0000563e6420ad0d llvm::detail::PassModel<llvm::Function, llvm::LoopVectorizePass, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) PassBuilderPipelines.cpp:0:0
#19 0x0000563e62f33587 llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (build-all/bin/opt+0x48a2587)
#20 0x0000563e6420848d llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) PassBuilderPipelines.cpp:0:0
#21 0x0000563e62f3813e llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (build-all/bin/opt+0x48a713e)
#22 0x0000563e642044ad llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) PassBuilderPipelines.cpp:0:0
#23 0x0000563e62f32277 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (build-all/bin/opt+0x48a1277)
#24 0x0000563e64190b8c llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::PassPlugin>, llvm::ArrayRef<std::function<void (llvm::PassBuilder&)>>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool, bool, bool) (build-all/bin/opt+0x5affb8c)
#25 0x0000563e62d040fe optMain (build-all/bin/opt+0x46730fe)
#26 0x00007f8fba8707e5 __libc_start_main (/lib64/libc.so.6+0x3a7e5)
#27 0x0000563e62d01bee _start (build-all/bin/opt+0x4670bee)
Abort (core dumped)

bbi-107246.ll.gz

Thanks, should be fixed by bf15aad

We cannot convert predicated recipes to uniform ones at the moment. This fixes a crash reported for llvm/llvm-project#139150.

mikaelholmen · 2025-05-22T08:26:28Z

Thanks, should be fixed by bf15aad

Yep, thanks!

fhahn requested review from aniragil, ayalz, lukel97 and rengolin May 8, 2025 20:21

llvmbot added backend:SystemZ vectorizers llvm:transforms labels May 8, 2025

lukel97 reviewed May 9, 2025

View reviewed changes

fhahn mentioned this pull request May 13, 2025

[LV] Support strided load with a stride of -1 #128718

Open

Mel-Chen reviewed May 14, 2025

View reviewed changes

ayalz reviewed May 14, 2025

View reviewed changes

fhahn added 2 commits May 15, 2025 20:11

Merge remote-tracking branch 'origin/main' into vplan-to-uniform-tran…

df295ec

…sforms

!fixup address latest comments.

a8dda3f

fhahn mentioned this pull request May 15, 2025

[VPlan] Rename isUniform(AfterVectorization) to isSingleScalar (NFC). #140134

Merged

!fixup naming, later->latter

a7e9545

ayalz approved these changes May 16, 2025

View reviewed changes

david-arm reviewed May 16, 2025

View reviewed changes

Merge remote-tracking branch 'origin/main' into vplan-to-uniform-tran…

cd4b3bf

…sforms

Merge remote-tracking branch 'origin/main' into vplan-to-uniform-tran…

c0bf19d

…sforms

fhahn changed the title ~~[VPlan] Add convertToUniformRecipe transform.~~ [VPlan] Add narrowToSingleScalarRecipe transform. May 16, 2025

fhahn added 3 commits May 16, 2025 21:06

!fixup address comments, thanks

451d82a

Merge remote-tracking branch 'origin/main' into vplan-to-uniform-tran…

fbf5f9d

…sforms

!fixup update comment, add comment re regions.

5b15a9b

fhahn force-pushed the vplan-to-uniform-transforms branch from ba2702e to 5b15a9b Compare May 17, 2025 21:54

fhahn merged commit 07c085a into llvm:main May 18, 2025
11 checks passed

fhahn deleted the vplan-to-uniform-transforms branch May 18, 2025 08:32

fhahn added a commit that referenced this pull request May 21, 2025

[VPlan] Don't try to narrow predicated VPReplicateRecipe.

bf15aad

We cannot convert predicated recipes to uniform ones at the moment. This fixes a crash reported for #139150.

	// results used. In the later case, we would introduce extra broadcasts.
	// results used. In the latter case, we would introduce extra broadcasts.

	static void convertToUniformRecipes(VPlan &Plan) {
	static void convertToSingleScalarRecipes(VPlan &Plan) {

	// Skip recipes that aren't uniform and don't have only their scalar
	// Skip recipes that aren't single scalar or don't have only their scalar

	RepOrWidenR->operands(), /IsUniform/ true);
	RepOrWidenR->operands(), /IsSingleScalar/ true);

[VPlan] Add narrowToSingleScalarRecipe transform. #139150

[VPlan] Add narrowToSingleScalarRecipe transform. #139150

Uh oh!

Conversation

fhahn commented May 8, 2025

Uh oh!

llvmbot commented May 8, 2025

Uh oh!

llvmbot commented May 8, 2025

Uh oh!

llvmbot commented May 8, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ayalz left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment