[VPlan] Add narrowToSingleScalarRecipe transform. #139150
Add a new convertToUniformRecipes transform which uses VPlan-based uniformity analysis to determine whether wide recipes and replicate recipes can be converted to uniform recipes.

There are a few places where we convert recipes to uniform recipes ad hoc; this transform will eventually replace them. Doing so requires a few more generalizations, which I plan to do as follow-ups.

By converting the recipes to uniform recipes, we effectively materialize the information from the VPlan-based analysis.

Note that there is currently one regression in SystemZ/pr47665.ll, caused by trivial constant-folding opportunities in the input IR. This will be fixed by VPlan-based constant folding (llvm#125365).
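To make the narrowing criterion concrete, here is a minimal, self-contained sketch of the decision described above. It is a toy model only: `ToyRecipe`, `Kind`, `canNarrow`, and `narrowAll` are hypothetical names invented for illustration and are not LLVM types or APIs. The uniformity flag stands in for the result of the VPlan-based analysis, and the per-user flags stand in for whether each user only needs the scalar result.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for VPlan concepts; none of these exist in LLVM.
enum class Kind { Widen, Replicate, UniformReplicate };

struct ToyRecipe {
  std::string Name;
  Kind K;
  bool UniformAcrossLanes;              // assumed output of a uniformity analysis
  std::vector<bool> UserUsesScalarOnly; // one flag per user of the result
};

// Returns true when the toy recipe may be narrowed to a single scalar copy.
bool canNarrow(const ToyRecipe &R) {
  if (R.K == Kind::UniformReplicate)
    return false; // already uniform, nothing to do
  if (!R.UniformAcrossLanes)
    return false; // lanes may differ, so one scalar copy is not enough
  // A user that needs the full vector value would force a broadcast.
  return std::all_of(R.UserUsesScalarOnly.begin(), R.UserUsesScalarOnly.end(),
                     [](bool ScalarOnly) { return ScalarOnly; });
}

// Narrows every eligible recipe in place and reports how many changed.
int narrowAll(std::vector<ToyRecipe> &Recipes) {
  int Changed = 0;
  for (ToyRecipe &R : Recipes) {
    if (canNarrow(R)) {
      R.K = Kind::UniformReplicate;
      ++Changed;
    }
  }
  return Changed;
}
```

In this model a recipe is narrowed only when both conditions hold, which mirrors the two checks in the patch: uniformity of the value and scalar-only use by every user.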
@llvm/pr-subscribers-llvm-transforms
Author: Florian Hahn (fhahn)
Full diff: https://github.com/llvm/llvm-project/pull/139150.diff
4 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 79ddb8bf0b09b..50552c843cd59 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -1084,6 +1084,40 @@ void VPlanTransforms::simplifyRecipes(VPlan &Plan, Type &CanonicalIVTy) {
}
}
+static void convertToUniformRecipes(VPlan &Plan) {
+ auto TryToNarrow = [](VPBasicBlock *VPBB) {
+ for (VPRecipeBase &R : make_early_inc_range(reverse(*VPBB))) {
+ // Try to narrow wide and replicating recipes to uniform recipes, based on
+ // VPlan analysis.
+ auto *Def = dyn_cast<VPSingleDefRecipe>(&R);
+ if (!Def || !isa<VPReplicateRecipe, VPWidenRecipe>(Def) ||
+ !Def->getUnderlyingValue())
+ continue;
+
+ auto *RepR = dyn_cast<VPReplicateRecipe>(&R);
+ if (RepR && RepR->isUniform())
+ continue;
+
+ // Skip recipes that aren't uniform or that have users needing more than
+ // their scalar results. In the latter case, we would introduce extra broadcasts.
+ if (!vputils::isUniformAfterVectorization(Def) ||
+ any_of(Def->users(),
+ [Def](VPUser *U) { return !U->usesScalars(Def); }))
+ continue;
+
+ auto *Clone = new VPReplicateRecipe(Def->getUnderlyingInstr(),
+ Def->operands(), /*IsUniform*/ true);
+ Clone->insertBefore(Def);
+ Def->replaceAllUsesWith(Clone);
+ Def->eraseFromParent();
+ }
+ };
+
+ for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
+ vp_depth_first_shallow(Plan.getVectorLoopRegion()->getEntry())))
+ TryToNarrow(VPBB);
+}
+
/// Normalize and simplify VPBlendRecipes. Should be run after simplifyRecipes
/// to make sure the masks are simplified.
static void simplifyBlends(VPlan &Plan) {
@@ -1778,6 +1812,7 @@ void VPlanTransforms::optimize(VPlan &Plan) {
runPass(simplifyRecipes, Plan, *Plan.getCanonicalIV()->getScalarType());
runPass(simplifyBlends, Plan);
runPass(removeDeadRecipes, Plan);
+ runPass(convertToUniformRecipes, Plan);
runPass(legalizeAndOptimizeInductions, Plan);
runPass(removeRedundantExpandSCEVRecipes, Plan);
runPass(simplifyRecipes, Plan, *Plan.getCanonicalIV()->getScalarType());
diff --git a/llvm/test/Transforms/LoopVectorize/SystemZ/pr47665.ll b/llvm/test/Transforms/LoopVectorize/SystemZ/pr47665.ll
index 02a876a3fda67..bb96c166f894c 100644
--- a/llvm/test/Transforms/LoopVectorize/SystemZ/pr47665.ll
+++ b/llvm/test/Transforms/LoopVectorize/SystemZ/pr47665.ll
@@ -7,86 +7,87 @@ define void @test(ptr %p, i40 %a) {
; CHECK-NEXT: entry:
; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; CHECK: vector.ph:
+; CHECK-NEXT: [[TMP0:%.*]] = icmp sgt i1 true, false
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK: vector.body:
; CHECK-NEXT: br i1 true, label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]
; CHECK: pred.store.if:
-; CHECK-NEXT: store i1 false, ptr [[P]], align 1
+; CHECK-NEXT: store i1 [[TMP0]], ptr [[P]], align 1
; CHECK-NEXT: br label [[PRED_STORE_CONTINUE]]
; CHECK: pred.store.continue:
; CHECK-NEXT: br i1 true, label [[PRED_STORE_IF1:%.*]], label [[PRED_STORE_CONTINUE2:%.*]]
; CHECK: pred.store.if1:
-; CHECK-NEXT: store i1 false, ptr [[P]], align 1
+; CHECK-NEXT: store i1 [[TMP0]], ptr [[P]], align 1
; CHECK-NEXT: br label [[PRED_STORE_CONTINUE2]]
; CHECK: pred.store.continue2:
; CHECK-NEXT: br i1 true, label [[PRED_STORE_IF3:%.*]], label [[PRED_STORE_CONTINUE4:%.*]]
; CHECK: pred.store.if3:
-; CHECK-NEXT: store i1 false, ptr [[P]], align 1
+; CHECK-NEXT: store i1 [[TMP0]], ptr [[P]], align 1
; CHECK-NEXT: br label [[PRED_STORE_CONTINUE4]]
; CHECK: pred.store.continue4:
; CHECK-NEXT: br i1 true, label [[PRED_STORE_IF5:%.*]], label [[PRED_STORE_CONTINUE6:%.*]]
; CHECK: pred.store.if5:
-; CHECK-NEXT: store i1 false, ptr [[P]], align 1
+; CHECK-NEXT: store i1 [[TMP0]], ptr [[P]], align 1
; CHECK-NEXT: br label [[PRED_STORE_CONTINUE6]]
; CHECK: pred.store.continue6:
; CHECK-NEXT: br i1 true, label [[PRED_STORE_IF7:%.*]], label [[PRED_STORE_CONTINUE8:%.*]]
; CHECK: pred.store.if7:
-; CHECK-NEXT: store i1 false, ptr [[P]], align 1
+; CHECK-NEXT: store i1 [[TMP0]], ptr [[P]], align 1
; CHECK-NEXT: br label [[PRED_STORE_CONTINUE8]]
; CHECK: pred.store.continue8:
; CHECK-NEXT: br i1 true, label [[PRED_STORE_IF9:%.*]], label [[PRED_STORE_CONTINUE10:%.*]]
; CHECK: pred.store.if9:
-; CHECK-NEXT: store i1 false, ptr [[P]], align 1
+; CHECK-NEXT: store i1 [[TMP0]], ptr [[P]], align 1
; CHECK-NEXT: br label [[PRED_STORE_CONTINUE10]]
; CHECK: pred.store.continue10:
; CHECK-NEXT: br i1 true, label [[PRED_STORE_IF11:%.*]], label [[PRED_STORE_CONTINUE12:%.*]]
; CHECK: pred.store.if11:
-; CHECK-NEXT: store i1 false, ptr [[P]], align 1
+; CHECK-NEXT: store i1 [[TMP0]], ptr [[P]], align 1
; CHECK-NEXT: br label [[PRED_STORE_CONTINUE12]]
; CHECK: pred.store.continue12:
; CHECK-NEXT: br i1 true, label [[PRED_STORE_IF13:%.*]], label [[PRED_STORE_CONTINUE14:%.*]]
; CHECK: pred.store.if13:
-; CHECK-NEXT: store i1 false, ptr [[P]], align 1
+; CHECK-NEXT: store i1 [[TMP0]], ptr [[P]], align 1
; CHECK-NEXT: br label [[PRED_STORE_CONTINUE14]]
; CHECK: pred.store.continue14:
; CHECK-NEXT: br i1 true, label [[PRED_STORE_IF15:%.*]], label [[PRED_STORE_CONTINUE16:%.*]]
; CHECK: pred.store.if15:
-; CHECK-NEXT: store i1 false, ptr [[P]], align 1
+; CHECK-NEXT: store i1 [[TMP0]], ptr [[P]], align 1
; CHECK-NEXT: br label [[PRED_STORE_CONTINUE16]]
; CHECK: pred.store.continue16:
; CHECK-NEXT: br i1 true, label [[PRED_STORE_IF17:%.*]], label [[PRED_STORE_CONTINUE18:%.*]]
; CHECK: pred.store.if17:
-; CHECK-NEXT: store i1 false, ptr [[P]], align 1
+; CHECK-NEXT: store i1 [[TMP0]], ptr [[P]], align 1
; CHECK-NEXT: br label [[PRED_STORE_CONTINUE18]]
; CHECK: pred.store.continue18:
; CHECK-NEXT: br i1 false, label [[PRED_STORE_IF19:%.*]], label [[PRED_STORE_CONTINUE20:%.*]]
; CHECK: pred.store.if19:
-; CHECK-NEXT: store i1 false, ptr [[P]], align 1
+; CHECK-NEXT: store i1 [[TMP0]], ptr [[P]], align 1
; CHECK-NEXT: br label [[PRED_STORE_CONTINUE20]]
; CHECK: pred.store.continue20:
; CHECK-NEXT: br i1 false, label [[PRED_STORE_IF21:%.*]], label [[PRED_STORE_CONTINUE22:%.*]]
; CHECK: pred.store.if21:
-; CHECK-NEXT: store i1 false, ptr [[P]], align 1
+; CHECK-NEXT: store i1 [[TMP0]], ptr [[P]], align 1
; CHECK-NEXT: br label [[PRED_STORE_CONTINUE22]]
; CHECK: pred.store.continue22:
; CHECK-NEXT: br i1 false, label [[PRED_STORE_IF23:%.*]], label [[PRED_STORE_CONTINUE24:%.*]]
; CHECK: pred.store.if23:
-; CHECK-NEXT: store i1 false, ptr [[P]], align 1
+; CHECK-NEXT: store i1 [[TMP0]], ptr [[P]], align 1
; CHECK-NEXT: br label [[PRED_STORE_CONTINUE24]]
; CHECK: pred.store.continue24:
; CHECK-NEXT: br i1 false, label [[PRED_STORE_IF25:%.*]], label [[PRED_STORE_CONTINUE26:%.*]]
; CHECK: pred.store.if25:
-; CHECK-NEXT: store i1 false, ptr [[P]], align 1
+; CHECK-NEXT: store i1 [[TMP0]], ptr [[P]], align 1
; CHECK-NEXT: br label [[PRED_STORE_CONTINUE26]]
; CHECK: pred.store.continue26:
; CHECK-NEXT: br i1 false, label [[PRED_STORE_IF27:%.*]], label [[PRED_STORE_CONTINUE28:%.*]]
; CHECK: pred.store.if27:
-; CHECK-NEXT: store i1 false, ptr [[P]], align 1
+; CHECK-NEXT: store i1 [[TMP0]], ptr [[P]], align 1
; CHECK-NEXT: br label [[PRED_STORE_CONTINUE28]]
; CHECK: pred.store.continue28:
; CHECK-NEXT: br i1 false, label [[PRED_STORE_IF29:%.*]], label [[PRED_STORE_CONTINUE30:%.*]]
; CHECK: pred.store.if29:
-; CHECK-NEXT: store i1 false, ptr [[P]], align 1
+; CHECK-NEXT: store i1 [[TMP0]], ptr [[P]], align 1
; CHECK-NEXT: br label [[PRED_STORE_CONTINUE30]]
; CHECK: pred.store.continue30:
; CHECK-NEXT: br label [[MIDDLE_BLOCK:%.*]]
diff --git a/llvm/test/Transforms/LoopVectorize/X86/cost-model.ll b/llvm/test/Transforms/LoopVectorize/X86/cost-model.ll
index f8b1cc2d775f5..7c42c3d9cd52e 100644
--- a/llvm/test/Transforms/LoopVectorize/X86/cost-model.ll
+++ b/llvm/test/Transforms/LoopVectorize/X86/cost-model.ll
@@ -890,9 +890,7 @@ define i64 @cost_assume(ptr %end, i64 %N) {
; CHECK: vector.ph:
; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP2]], 8
; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP2]], [[N_MOD_VF]]
-; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i64> poison, i64 [[N:%.*]], i64 0
-; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i64> [[BROADCAST_SPLATINSERT]], <2 x i64> poison, <2 x i32> zeroinitializer
-; CHECK-NEXT: [[TMP3:%.*]] = icmp ne <2 x i64> [[BROADCAST_SPLAT]], zeroinitializer
+; CHECK-NEXT: [[TMP11:%.*]] = icmp ne i64 [[N:%.*]], 0
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK: vector.body:
; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
@@ -904,7 +902,6 @@ define i64 @cost_assume(ptr %end, i64 %N) {
; CHECK-NEXT: [[TMP8]] = add <2 x i64> [[VEC_PHI2]], splat (i64 1)
; CHECK-NEXT: [[TMP9]] = add <2 x i64> [[VEC_PHI3]], splat (i64 1)
; CHECK-NEXT: [[TMP10]] = add <2 x i64> [[VEC_PHI4]], splat (i64 1)
-; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x i1> [[TMP3]], i32 0
; CHECK-NEXT: tail call void @llvm.assume(i1 [[TMP11]])
; CHECK-NEXT: tail call void @llvm.assume(i1 [[TMP11]])
; CHECK-NEXT: tail call void @llvm.assume(i1 [[TMP11]])
diff --git a/llvm/test/Transforms/LoopVectorize/version-stride-with-integer-casts.ll b/llvm/test/Transforms/LoopVectorize/version-stride-with-integer-casts.ll
index fb84739881010..30e0acb4d7bf6 100644
--- a/llvm/test/Transforms/LoopVectorize/version-stride-with-integer-casts.ll
+++ b/llvm/test/Transforms/LoopVectorize/version-stride-with-integer-casts.ll
@@ -159,9 +159,6 @@ define void @versioned_sext_use_in_gep(i32 %scale, ptr %dst, i64 %scale.2) {
; CHECK-NEXT: [[IDENT_CHECK:%.*]] = icmp ne i32 [[SCALE]], 1
; CHECK-NEXT: br i1 [[IDENT_CHECK]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
; CHECK: vector.ph:
-; CHECK-NEXT: [[TMP8:%.*]] = getelementptr i8, ptr [[DST]], i64 [[SCALE_2]]
-; CHECK-NEXT: [[TMP81:%.*]] = getelementptr i8, ptr [[DST]], i64 [[SCALE_2]]
-; CHECK-NEXT: [[TMP82:%.*]] = getelementptr i8, ptr [[DST]], i64 [[SCALE_2]]
; CHECK-NEXT: [[TMP83:%.*]] = getelementptr i8, ptr [[DST]], i64 [[SCALE_2]]
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK: vector.body:
@@ -174,10 +171,10 @@ define void @versioned_sext_use_in_gep(i32 %scale, ptr %dst, i64 %scale.2) {
; CHECK-NEXT: [[TMP13:%.*]] = getelementptr i8, ptr [[DST]], i64 [[TMP12]]
; CHECK-NEXT: [[TMP15:%.*]] = getelementptr i8, ptr [[DST]], i64 [[TMP14]]
; CHECK-NEXT: [[TMP17:%.*]] = getelementptr i8, ptr [[DST]], i64 [[TMP16]]
-; CHECK-NEXT: store ptr [[TMP8]], ptr [[TMP11]], align 8
-; CHECK-NEXT: store ptr [[TMP8]], ptr [[TMP13]], align 8
-; CHECK-NEXT: store ptr [[TMP8]], ptr [[TMP15]], align 8
-; CHECK-NEXT: store ptr [[TMP8]], ptr [[TMP17]], align 8
+; CHECK-NEXT: store ptr [[TMP83]], ptr [[TMP11]], align 8
+; CHECK-NEXT: store ptr [[TMP83]], ptr [[TMP13]], align 8
+; CHECK-NEXT: store ptr [[TMP83]], ptr [[TMP15]], align 8
+; CHECK-NEXT: store ptr [[TMP83]], ptr [[TMP17]], align 8
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
; CHECK-NEXT: [[TMP18:%.*]] = icmp eq i64 [[INDEX_NEXT]], 256
; CHECK-NEXT: br i1 [[TMP18]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
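The cost-model.ll change above replaces a broadcast, a vector compare, and an extract of lane 0 with a single scalar compare. The following sketch models why the two forms agree for a uniform input. It is plain C++ rather than LLVM IR, and `broadcast`, `cmpNeZero`, `wideThenExtract`, and `scalarOnly` are hypothetical helper names chosen for illustration; the vector width of 2 matches the `<2 x i64>` type in the test.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t VF = 2; // matches <2 x i64> in the test above

// Models insertelement + shufflevector: every lane holds the same value.
std::array<std::int64_t, VF> broadcast(std::int64_t X) {
  std::array<std::int64_t, VF> V{};
  V.fill(X);
  return V;
}

// Models a lane-wise `icmp ne <VF x i64> %v, zeroinitializer`.
std::array<bool, VF> cmpNeZero(const std::array<std::int64_t, VF> &V) {
  std::array<bool, VF> R{};
  for (std::size_t I = 0; I < VF; ++I)
    R[I] = V[I] != 0;
  return R;
}

// Old form: broadcast, compare all lanes, then extract lane 0.
bool wideThenExtract(std::int64_t N) { return cmpNeZero(broadcast(N))[0]; }

// New form: one scalar compare.
bool scalarOnly(std::int64_t N) { return N != 0; }
```

Because every lane of the broadcast holds the same value, lane 0 of the wide compare always equals the scalar compare, so the narrowed form computes the same `i1` with fewer instructions.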
-; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i64> [[BROADCAST_SPLATINSERT]], <2 x i64> poison, <2 x i32> zeroinitializer
-; CHECK-NEXT: [[TMP3:%.*]] = icmp ne <2 x i64> [[BROADCAST_SPLAT]], zeroinitializer
+; CHECK-NEXT: [[TMP11:%.*]] = icmp ne i64 [[N:%.*]], 0
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK: vector.body:
; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
@@ -904,7 +902,6 @@ define i64 @cost_assume(ptr %end, i64 %N) {
; CHECK-NEXT: [[TMP8]] = add <2 x i64> [[VEC_PHI2]], splat (i64 1)
; CHECK-NEXT: [[TMP9]] = add <2 x i64> [[VEC_PHI3]], splat (i64 1)
; CHECK-NEXT: [[TMP10]] = add <2 x i64> [[VEC_PHI4]], splat (i64 1)
-; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x i1> [[TMP3]], i32 0
; CHECK-NEXT: tail call void @llvm.assume(i1 [[TMP11]])
; CHECK-NEXT: tail call void @llvm.assume(i1 [[TMP11]])
; CHECK-NEXT: tail call void @llvm.assume(i1 [[TMP11]])
diff --git a/llvm/test/Transforms/LoopVectorize/version-stride-with-integer-casts.ll b/llvm/test/Transforms/LoopVectorize/version-stride-with-integer-casts.ll
index fb84739881010..30e0acb4d7bf6 100644
--- a/llvm/test/Transforms/LoopVectorize/version-stride-with-integer-casts.ll
+++ b/llvm/test/Transforms/LoopVectorize/version-stride-with-integer-casts.ll
@@ -159,9 +159,6 @@ define void @versioned_sext_use_in_gep(i32 %scale, ptr %dst, i64 %scale.2) {
; CHECK-NEXT: [[IDENT_CHECK:%.*]] = icmp ne i32 [[SCALE]], 1
; CHECK-NEXT: br i1 [[IDENT_CHECK]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
; CHECK: vector.ph:
-; CHECK-NEXT: [[TMP8:%.*]] = getelementptr i8, ptr [[DST]], i64 [[SCALE_2]]
-; CHECK-NEXT: [[TMP81:%.*]] = getelementptr i8, ptr [[DST]], i64 [[SCALE_2]]
-; CHECK-NEXT: [[TMP82:%.*]] = getelementptr i8, ptr [[DST]], i64 [[SCALE_2]]
; CHECK-NEXT: [[TMP83:%.*]] = getelementptr i8, ptr [[DST]], i64 [[SCALE_2]]
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK: vector.body:
@@ -174,10 +171,10 @@ define void @versioned_sext_use_in_gep(i32 %scale, ptr %dst, i64 %scale.2) {
; CHECK-NEXT: [[TMP13:%.*]] = getelementptr i8, ptr [[DST]], i64 [[TMP12]]
; CHECK-NEXT: [[TMP15:%.*]] = getelementptr i8, ptr [[DST]], i64 [[TMP14]]
; CHECK-NEXT: [[TMP17:%.*]] = getelementptr i8, ptr [[DST]], i64 [[TMP16]]
-; CHECK-NEXT: store ptr [[TMP8]], ptr [[TMP11]], align 8
-; CHECK-NEXT: store ptr [[TMP8]], ptr [[TMP13]], align 8
-; CHECK-NEXT: store ptr [[TMP8]], ptr [[TMP15]], align 8
-; CHECK-NEXT: store ptr [[TMP8]], ptr [[TMP17]], align 8
+; CHECK-NEXT: store ptr [[TMP83]], ptr [[TMP11]], align 8
+; CHECK-NEXT: store ptr [[TMP83]], ptr [[TMP13]], align 8
+; CHECK-NEXT: store ptr [[TMP83]], ptr [[TMP15]], align 8
+; CHECK-NEXT: store ptr [[TMP83]], ptr [[TMP17]], align 8
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
; CHECK-NEXT: [[TMP18:%.*]] = icmp eq i64 [[INDEX_NEXT]], 256
; CHECK-NEXT: br i1 [[TMP18]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
|
|
|
||
| for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>( | ||
| vp_depth_first_shallow(Plan.getVectorLoopRegion()->getEntry()))) | ||
| TryToNarrow(VPBB); |
Is there a particular reason we're using a lambda here?
As if this pass had a class of its own where its main runPass() called a TryToNarrow() method on each basic block, as in runOnBasicBlock(), independently (and conceptually in parallel).
I have some follow-up patches that may call this on additional blocks, which was why I had the lambda originally. Inlined for now, thanks.
| runPass(simplifyRecipes, Plan, *Plan.getCanonicalIV()->getScalarType()); | ||
| runPass(simplifyBlends, Plan); | ||
| runPass(removeDeadRecipes, Plan); | ||
| runPass(convertToUniformRecipes, Plan); |
Do we need to apply uniform analysis if VF=1? If not, could we skip it when VF=1?
Hmm, suspect so, given that in such case all replicate recipes should already be "uniform" and widen recipes are irrelevant.
Yep, added a check isScalarVFOnly() to the transform
| continue; | ||
|
|
||
| // Skip recipes that aren't uniform and don't have only their scalar | ||
| // results used. In the later case, we would introduce extra broadcasts. |
| // results used. In the later case, we would introduce extra broadcasts. | |
| // results used. In the latter case, we would introduce extra broadcasts. |
Done thanks
| auto *Def = dyn_cast<VPSingleDefRecipe>(&R); | ||
| if (!Def || !isa<VPReplicateRecipe, VPWidenRecipe>(Def) || | ||
| !Def->getUnderlyingValue()) | ||
| continue; | ||
|
|
| auto *Def = dyn_cast<VPSingleDefRecipe>(&R); | |
| if (!Def || !isa<VPReplicateRecipe, VPWidenRecipe>(Def) || | |
| !Def->getUnderlyingValue()) | |
| continue; |
Done thanks
| auto *RepR = dyn_cast<VPReplicateRecipe>(&R); | ||
| if (RepR && RepR->isUniform()) | ||
| continue; | ||
|
|
| auto *SingleDef = cast<VPSingleDefRecipe>(&R); |
or RepOrWidenR.
Done thanks
| auto *RepR = dyn_cast<VPReplicateRecipe>(&R); | ||
| if (RepR && RepR->isUniform()) | ||
| continue; |
| auto *RepR = dyn_cast<VPReplicateRecipe>(&R); | |
| if (RepR && RepR->isUniform()) | |
| continue; | |
| auto *RepR = dyn_cast<VPReplicateRecipe>(&R); | |
| if (!RepR && !isa<VPWidenRecipe>(&R)) | |
| continue; | |
| if (RepR && RepR->isUniform()) | |
| continue; |
Done thanks
| ; CHECK-NEXT: entry: | ||
| ; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]] | ||
| ; CHECK: vector.ph: | ||
| ; CHECK-NEXT: [[TMP0:%.*]] = icmp sgt i1 true, false |
Is this the said degradation?
Yep, this is trivial folding that at the moment happens in IRBuilder for VPWidenRecipe, but not for replicate recipes, which clone the original instruction. It will be fixed by the pending VPlan-based constant folder.
|
|
||
| // Skip recipes that aren't uniform and don't have only their scalar | ||
| // results used. In the later case, we would introduce extra broadcasts. | ||
| if (!vputils::isUniformAfterVectorization(Def) || |
The term "UniformAfterVectorization" and VPReplicateRecipe's "uniform" field should be renamed. Being uniform (having same value for all lanes) is independent of being before or after vectorization. The term stands for "singleLane" or "singleScalar", which is typically associated with the first lane (as in unit-stride, i.e., clearly non-uniform, GEPs whose first lane is the only one used), and with all lanes when the value is uniform.
Agreed, this is long overdue! #140134
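To make the terminology point concrete, here is a minimal standalone sketch of the two situations the "single scalar" name is meant to cover. It is a toy model, not VPlan code: `isUniformAcrossLanes`, `onlyFirstLaneUsed` and `needsSingleScalar` are hypothetical helpers, and lanes are modelled as plain integers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// True if every lane holds the same value (the strict meaning of "uniform").
bool isUniformAcrossLanes(const std::vector<long> &Lanes) {
  for (std::size_t I = 1; I < Lanes.size(); ++I)
    if (Lanes[I] != Lanes[0])
      return false;
  return true;
}

// True if no lane other than lane 0 is ever read by a user.
bool onlyFirstLaneUsed(const std::vector<bool> &LaneUsed) {
  for (std::size_t I = 1; I < LaneUsed.size(); ++I)
    if (LaneUsed[I])
      return false;
  return true;
}

// A single scalar suffices if the value is uniform, or if only lane 0 is used
// (hypothetical combination of the two cases described in the review).
bool needsSingleScalar(const std::vector<long> &Lanes,
                       const std::vector<bool> &LaneUsed) {
  assert(Lanes.size() == LaneUsed.size() && "expected one use flag per lane");
  return isUniformAcrossLanes(Lanes) || onlyFirstLaneUsed(LaneUsed);
}
```

A unit-stride address such as `{100, 101, 102, 103}` is clearly not uniform, yet `needsSingleScalar` still holds for it when only lane 0 is read, which is the case the old "uniform" name obscured.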
| // VPlan analysis. | ||
| auto *Def = dyn_cast<VPSingleDefRecipe>(&R); | ||
| if (!Def || !isa<VPReplicateRecipe, VPWidenRecipe>(Def) || | ||
| !Def->getUnderlyingValue()) |
| !Def->getUnderlyingValue()) |
can be asserted instead if desired - these recipes must have underlying values - in order to know what to replicate or widen.
It could be asserted for VPReplicateRecipe, but not for VPWidenRecipe, which does not require an underlying value.
Hmm, if VPWidenRecipe uses the (optional) underlying value only for metadata and FMF, could it be replaced with VPInstruction?
Yep we should be able to do that soon. I'll give it a try, need to see if we rely on the encoded facts that certain recipes are widened in various places.
Update the naming in VPReplicateRecipe and vputils to the more accurate isSingleScalar, as the functions check for cases where only a single scalar is needed, either because it produces the same value for all lanes or has only their first lane used. Discussed in llvm#139150.
LGTM, thanks! Couple of comments, if this lands after #140134.
| } | ||
| } | ||
|
|
||
| static void convertToUniformRecipes(VPlan &Plan) { |
| static void convertToUniformRecipes(VPlan &Plan) { | |
| static void convertToSingleScalarRecipes(VPlan &Plan) { |
as this captures both uniformity and only-first-lane-used? Also affects title of patch.
Analogous to truncateToMinimalBitwidths() which aims to reduce each lane to fewer bits, this aims to reduce each part to fewest lanes - to one. Perhaps both should start with narrow, as used in the now inlined lambda.
Updated to narrowToSingleScalarRecipe, thanks
| return; | ||
|
|
||
| for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>( | ||
| vp_depth_first_shallow(Plan.getVectorLoopRegion()->getEntry()))) { |
Suffice to traverse shallow?
Yes, we cannot convert to uniform recipes in replicate regions at the moment; they need hoisting out first.
Worth a comment. This also prevents narrowing recipes in nested loop regions.
added thanks
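The effect of a shallow traversal can be sketched with a toy block graph. This is an illustration only, assuming that "shallow" means following successor edges at one nesting level without descending into regions; `ToyBlock`, `visit` and `plainBlocks` are hypothetical names and do not correspond to the VPlan classes.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy CFG node; NOT the real VPBlockBase. A node is either a plain block or a
// region that wraps a nested sub-graph starting at RegionEntry.
struct ToyBlock {
  std::string Name;
  ToyBlock *RegionEntry = nullptr;    // non-null only for regions
  std::vector<ToyBlock *> Successors; // edges at the same nesting level
};

static void visit(ToyBlock *B, bool EnterRegions, std::set<ToyBlock *> &Seen,
                  std::vector<std::string> &Out) {
  if (!B || !Seen.insert(B).second)
    return;
  if (!B->RegionEntry)
    Out.push_back(B->Name); // plain block: record it
  else if (EnterRegions)
    visit(B->RegionEntry, EnterRegions, Seen, Out); // deep: descend
  for (ToyBlock *S : B->Successors)
    visit(S, EnterRegions, Seen, Out);
}

// Returns the names of the plain blocks reached from Entry, in visit order.
std::vector<std::string> plainBlocks(ToyBlock *Entry, bool EnterRegions) {
  assert(Entry && "expected an entry block");
  std::set<ToyBlock *> Seen;
  std::vector<std::string> Out;
  visit(Entry, EnterRegions, Seen, Out);
  return Out;
}
```

With `EnterRegions` set to false, blocks inside a nested region (for example a replicate region or an inner loop region) are never returned, so a transform driven by this list cannot touch them.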
| continue; | ||
|
|
||
| auto *RepOrWidenR = cast<VPSingleDefRecipe>(&R); | ||
| // Skip recipes that aren't uniform and don't have only their scalar |
| // Skip recipes that aren't uniform and don't have only their scalar | |
| // Skip recipes that aren't single scalar or don't have only their scalar |
The "and" here should be accurate: it skips cases that have non-scalar uses, as this may require introducing broadcasts. This is something that will be generalized in the future.
The code has an ||?
Ah yes, was thinking about the recipes we process below, updated, thanks
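The "and" versus "or" confusion in this exchange is De Morgan's law: the recipes that get processed satisfy both requirements, so the skip guard fires when either one fails. A minimal sketch, assuming two boolean facts per recipe (`ToyFacts` and its fields are hypothetical and only mirror the shape of the guard, not the real VPlan queries):

```cpp
#include <cassert>
#include <initializer_list>

// Hypothetical per-recipe facts; the real checks live in VPlan analyses.
struct ToyFacts {
  bool IsSingleScalar;  // one scalar value suffices for all lanes
  bool OnlyScalarsUsed; // every user reads only scalar values
};

// Shape of the guard: bail out if either requirement fails.
bool shouldSkip(ToyFacts F) { return !F.IsSingleScalar || !F.OnlyScalarsUsed; }

// The recipes processed afterwards satisfy both requirements.
bool shouldNarrow(ToyFacts F) { return F.IsSingleScalar && F.OnlyScalarsUsed; }

// De Morgan: !(A && B) == (!A || !B), checked over all four combinations.
void checkGuardsAreComplementary() {
  for (bool A : {false, true})
    for (bool B : {false, true})
      assert(shouldSkip(ToyFacts{A, B}) != shouldNarrow(ToyFacts{A, B}));
}
```

So a comment describing what is skipped reads naturally with "or", while a comment describing what is narrowed reads naturally with "and".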
|
|
||
| auto *Clone = | ||
| new VPReplicateRecipe(RepOrWidenR->getUnderlyingInstr(), | ||
| RepOrWidenR->operands(), /*IsUniform*/ true); |
| RepOrWidenR->operands(), /*IsUniform*/ true); | |
| RepOrWidenR->operands(), /*IsSingleScalar*/ true); |
Done thanks
| continue; | ||
|
|
||
| auto *Clone = | ||
| new VPReplicateRecipe(RepOrWidenR->getUnderlyingInstr(), |
If this is already a VPReplicateRecipe class can we avoid the clone and simply set the IsUniform flag to true on the existing object?
It would presumably also avoid all the work to replace all uses, remove from parent, etc.
Currently we don't allow modifying most properties of existing recipes, other than operands and (IR) flags.
Such a change should probably be done separately, and we could relax the restriction for some recipes. But creating new recipes here and having dead recipes removed separately later on means we don't really have to worry about invalidating any potential analyses in the future, and I think it mirrors LLVM IR, where creating new instructions is often preferred to modifying existing instructions as it can be less error-prone IIRC.
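The replace-then-clean-up pattern described in this reply can be sketched with a toy recipe list. This is a simplified model under stated assumptions, not the VPlan implementation: `ToyRecipe`, `ToyPlan`, `narrow` and this `removeDeadRecipes` are hypothetical, and "dead" here just means "has no users and is not a store".

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a recipe; NOT the real VPReplicateRecipe.
struct ToyRecipe {
  std::string Opcode;
  bool IsSingleScalar;               // set at construction, never mutated
  std::vector<ToyRecipe *> Operands; // use-def edges
};

struct ToyPlan {
  std::vector<std::unique_ptr<ToyRecipe>> Recipes;

  ToyRecipe *add(std::string Opcode, bool IsSingleScalar,
                 std::vector<ToyRecipe *> Ops = {}) {
    Recipes.push_back(std::make_unique<ToyRecipe>(
        ToyRecipe{std::move(Opcode), IsSingleScalar, std::move(Ops)}));
    return Recipes.back().get();
  }

  bool hasUsers(const ToyRecipe *R) const {
    for (const auto &User : Recipes)
      for (const ToyRecipe *Op : User->Operands)
        if (Op == R)
          return true;
    return false;
  }

  // Create a single-scalar clone and redirect all users to it. The old
  // recipe is left in place, unmodified, for a later cleanup step.
  ToyRecipe *narrow(ToyRecipe *Old) {
    assert(Old && "expected a recipe to narrow");
    ToyRecipe *Clone = add(Old->Opcode, /*IsSingleScalar=*/true, Old->Operands);
    for (auto &User : Recipes)
      for (ToyRecipe *&Op : User->Operands)
        if (Op == Old)
          Op = Clone;
    return Clone;
  }

  // Separate cleanup: drop recipes without users, keeping stores as roots.
  std::size_t removeDeadRecipes() {
    std::vector<bool> Dead;
    for (const auto &R : Recipes)
      Dead.push_back(R->Opcode != "store" && !hasUsers(R.get()));
    std::vector<std::unique_ptr<ToyRecipe>> Live;
    for (std::size_t I = 0; I < Recipes.size(); ++I)
      if (!Dead[I])
        Live.push_back(std::move(Recipes[I]));
    std::size_t Removed = Recipes.size() - Live.size();
    Recipes = std::move(Live);
    return Removed;
  }
};
```

The design choice this illustrates is that `narrow` never flips a flag on an existing object: anything that cached facts about the old recipe stays valid until the cleanup step erases it.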
…alar (NFC). (#140134) Update the naming in VPReplicateRecipe and vputils to the more accurate isSingleScalar, as the functions check for cases where only a single scalar is needed, either because it produces the same value for all lanes or has only their first lane used. Discussed in llvm/llvm-project#139150. PR: llvm/llvm-project#140134
ba2702e to 5b15a9b
Add a new convertToUniformRecipes transform which uses VPlan-based uniformity analysis to determine if wide recipes and replicate recipes can be converted to uniform recipes. There are a few places where we ad-hoc convert recipes to uniform recipes, which this transform will eventually replace. There are a few more generalizations required to do so which I plan to do as follow-ups. By converting the recipes to uniform recipes, we effectively materialize the information from the VPlan-based analysis. Note that there is one regression at the moment in SystemZ/pr47665.ll due to trivial constant folding opportunities in the input IR. This will be fixed by VPlan-based constant folding (llvm/llvm-project#125365) PR: llvm/llvm-project#139150
|
Hi @fhahn The following starts crashing with this patch: It crashes with: |
We cannot convert predicated recipes to uniform ones at the moment. This fixes a crash reported for #139150.
Thanks, should be fixed by bf15aad |
We cannot convert predicated recipes to uniform ones at the moment. This fixes a crash reported for llvm/llvm-project#139150.
Yep, thanks! |