[VPlan] Use wide IV if scalar lanes > 0 are used with scalable vectors. #169796
Conversation
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-vectorizers

Author: Florian Hahn (fhahn)

Changes: For scalable vectors, VPScalarIVStepsRecipe cannot create all scalar step values. At the moment, it creates a vector in addition to the first lane. The only supported case for this is when only the last lane is used. A recipe should not set both scalar and vector values. Instead, we can simply use a vector induction. It would also be possible to preserve the current vector code-gen by creating VPInstructions based on the first lane of VPScalarIVStepsRecipe, but using a vector induction seems simpler.

Patch is 24.02 KiB, truncated to 20.00 KiB below; full version: https://github.com/llvm/llvm-project/pull/169796.diff

3 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 0ba8cec008555..7e21d17fc89e5 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -2367,15 +2367,6 @@ void VPScalarIVStepsRecipe::execute(VPTransformState &State) {
// Compute the scalar steps and save the results in State.
Type *IntStepTy =
IntegerType::get(BaseIVTy->getContext(), BaseIVTy->getScalarSizeInBits());
- Type *VecIVTy = nullptr;
- Value *UnitStepVec = nullptr, *SplatStep = nullptr, *SplatIV = nullptr;
- if (!FirstLaneOnly && State.VF.isScalable()) {
- VecIVTy = VectorType::get(BaseIVTy, State.VF);
- UnitStepVec =
- Builder.CreateStepVector(VectorType::get(IntStepTy, State.VF));
- SplatStep = Builder.CreateVectorSplat(State.VF, Step);
- SplatIV = Builder.CreateVectorSplat(State.VF, BaseIV);
- }
unsigned StartLane = 0;
unsigned EndLane = FirstLaneOnly ? 1 : State.VF.getKnownMinValue();
@@ -2396,19 +2387,6 @@ void VPScalarIVStepsRecipe::execute(VPTransformState &State) {
StartIdx0 = Builder.CreateSExtOrTrunc(StartIdx0, IntStepTy);
}
- if (!FirstLaneOnly && State.VF.isScalable()) {
- auto *SplatStartIdx = Builder.CreateVectorSplat(State.VF, StartIdx0);
- auto *InitVec = Builder.CreateAdd(SplatStartIdx, UnitStepVec);
- if (BaseIVTy->isFloatingPointTy())
- InitVec = Builder.CreateSIToFP(InitVec, VecIVTy);
- auto *Mul = Builder.CreateBinOp(MulOp, InitVec, SplatStep);
- auto *Add = Builder.CreateBinOp(AddOp, SplatIV, Mul);
- State.set(this, Add);
- // It's useful to record the lane values too for the known minimum number
- // of elements so we do those below. This improves the code quality when
- // trying to extract the first element, for example.
- }
-
if (BaseIVTy->isFloatingPointTy())
StartIdx0 = Builder.CreateSIToFP(StartIdx0, BaseIVTy);
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 4b7f90118374b..f3f422fdce933 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -785,12 +785,17 @@ static void legalizeAndOptimizeInductions(VPlan &Plan) {
WideIV->getDebugLoc(), Builder);
// Update scalar users of IV to use Step instead.
- if (!HasOnlyVectorVFs)
+ if (!HasOnlyVectorVFs) {
WideIV->replaceAllUsesWith(Steps);
- else
- WideIV->replaceUsesWithIf(Steps, [WideIV](VPUser &U, unsigned) {
- return U.usesScalars(WideIV);
- });
+ } else {
+ bool HasScalableVF = Plan.hasScalableVF();
+ WideIV->replaceUsesWithIf(Steps,
+ [WideIV, HasScalableVF](VPUser &U, unsigned) {
+ if (HasScalableVF)
+ return U.usesFirstLaneOnly(WideIV);
+ return U.usesScalars(WideIV);
+ });
+ }
}
}
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/pr60831-sve-inv-store-crash.ll b/llvm/test/Transforms/LoopVectorize/AArch64/pr60831-sve-inv-store-crash.ll
index 88e035ebf3be8..131b3d1b02727 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/pr60831-sve-inv-store-crash.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/pr60831-sve-inv-store-crash.ll
@@ -15,25 +15,22 @@ define void @test_invar_gep(ptr %dst) #0 {
; CHECK-NEXT: [[TMP3:%.*]] = mul nuw i64 [[TMP2]], 4
; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 100, [[TMP3]]
; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 100, [[N_MOD_VF]]
+; CHECK-NEXT: [[TMP5:%.*]] = call <vscale x 4 x i64> @llvm.stepvector.nxv4i64()
+; CHECK-NEXT: [[TMP4:%.*]] = mul nsw <vscale x 4 x i64> [[TMP5]], splat (i64 1)
+; CHECK-NEXT: [[INDUCTION:%.*]] = add nsw <vscale x 4 x i64> zeroinitializer, [[TMP4]]
+; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[TMP3]], i64 0
+; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK: vector.body:
; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
-; CHECK-NEXT: [[TMP5:%.*]] = call <vscale x 4 x i64> @llvm.stepvector.nxv4i64()
-; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[INDEX]], i64 0
-; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
-; CHECK-NEXT: [[TMP10:%.*]] = add <vscale x 4 x i64> zeroinitializer, [[TMP5]]
-; CHECK-NEXT: [[TMP4:%.*]] = mul <vscale x 4 x i64> [[TMP10]], splat (i64 1)
-; CHECK-NEXT: [[TMP9:%.*]] = add <vscale x 4 x i64> [[DOTSPLAT]], [[TMP4]]
-; CHECK-NEXT: [[TMP6:%.*]] = add i64 [[INDEX]], 0
-; CHECK-NEXT: [[TMP7:%.*]] = add i64 [[INDEX]], 1
-; CHECK-NEXT: [[TMP8:%.*]] = add i64 [[INDEX]], 2
-; CHECK-NEXT: [[TMP11:%.*]] = add i64 [[INDEX]], 3
+; CHECK-NEXT: [[TMP9:%.*]] = phi <vscale x 4 x i64> [ [[INDUCTION]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
; CHECK-NEXT: [[TMP15:%.*]] = call i32 @llvm.vscale.i32()
; CHECK-NEXT: [[TMP16:%.*]] = mul nuw i32 [[TMP15]], 4
; CHECK-NEXT: [[TMP17:%.*]] = sub i32 [[TMP16]], 1
; CHECK-NEXT: [[TMP18:%.*]] = extractelement <vscale x 4 x i64> [[TMP9]], i32 [[TMP17]]
; CHECK-NEXT: store i64 [[TMP18]], ptr [[TMP14:%.*]], align 1
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP3]]
+; CHECK-NEXT: [[VEC_IND_NEXT]] = add nsw <vscale x 4 x i64> [[TMP9]], [[DOTSPLAT]]
; CHECK-NEXT: [[TMP19:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
; CHECK-NEXT: br i1 [[TMP19]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
; CHECK: middle.block:
@@ -60,38 +57,26 @@ define void @test_invar_gep(ptr %dst) #0 {
; IC2: vector.ph:
; IC2-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
; IC2-NEXT: [[TMP11:%.*]] = mul nuw i64 [[TMP2]], 4
+; IC2-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[TMP11]], i64 0
+; IC2-NEXT: [[TMP21:%.*]] = shufflevector <vscale x 4 x i64> [[BROADCAST_SPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
; IC2-NEXT: [[TMP3:%.*]] = mul i64 [[TMP11]], 2
; IC2-NEXT: [[N_MOD_VF:%.*]] = urem i64 100, [[TMP3]]
; IC2-NEXT: [[N_VEC:%.*]] = sub i64 100, [[N_MOD_VF]]
+; IC2-NEXT: [[TMP5:%.*]] = call <vscale x 4 x i64> @llvm.stepvector.nxv4i64()
+; IC2-NEXT: [[TMP12:%.*]] = mul nsw <vscale x 4 x i64> [[TMP5]], splat (i64 1)
+; IC2-NEXT: [[INDUCTION:%.*]] = add nsw <vscale x 4 x i64> zeroinitializer, [[TMP12]]
; IC2-NEXT: br label [[VECTOR_BODY:%.*]]
; IC2: vector.body:
; IC2-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
-; IC2-NEXT: [[BROADCAST_SPLAT:%.*]] = call <vscale x 4 x i64> @llvm.stepvector.nxv4i64()
-; IC2-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[INDEX]], i64 0
-; IC2-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
-; IC2-NEXT: [[DOTSPLATINSERT1:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[TMP11]], i64 0
-; IC2-NEXT: [[VEC_IND:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT1]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
-; IC2-NEXT: [[TMP5:%.*]] = add <vscale x 4 x i64> [[VEC_IND]], [[BROADCAST_SPLAT]]
-; IC2-NEXT: [[TMP21:%.*]] = mul <vscale x 4 x i64> [[TMP5]], splat (i64 1)
+; IC2-NEXT: [[DOTSPLAT:%.*]] = phi <vscale x 4 x i64> [ [[INDUCTION]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
; IC2-NEXT: [[TMP22:%.*]] = add <vscale x 4 x i64> [[DOTSPLAT]], [[TMP21]]
-; IC2-NEXT: [[TMP23:%.*]] = add i64 [[TMP11]], 0
-; IC2-NEXT: [[TMP24:%.*]] = mul i64 [[TMP23]], 1
-; IC2-NEXT: [[TMP25:%.*]] = add i64 [[INDEX]], [[TMP24]]
-; IC2-NEXT: [[TMP12:%.*]] = add i64 [[TMP11]], 1
-; IC2-NEXT: [[TMP13:%.*]] = mul i64 [[TMP12]], 1
-; IC2-NEXT: [[TMP14:%.*]] = add i64 [[INDEX]], [[TMP13]]
-; IC2-NEXT: [[TMP15:%.*]] = add i64 [[TMP11]], 2
-; IC2-NEXT: [[TMP16:%.*]] = mul i64 [[TMP15]], 1
-; IC2-NEXT: [[TMP17:%.*]] = add i64 [[INDEX]], [[TMP16]]
-; IC2-NEXT: [[TMP18:%.*]] = add i64 [[TMP11]], 3
-; IC2-NEXT: [[TMP19:%.*]] = mul i64 [[TMP18]], 1
-; IC2-NEXT: [[TMP20:%.*]] = add i64 [[INDEX]], [[TMP19]]
; IC2-NEXT: [[TMP6:%.*]] = call i32 @llvm.vscale.i32()
; IC2-NEXT: [[TMP7:%.*]] = mul nuw i32 [[TMP6]], 4
; IC2-NEXT: [[TMP8:%.*]] = sub i32 [[TMP7]], 1
; IC2-NEXT: [[TMP9:%.*]] = extractelement <vscale x 4 x i64> [[TMP22]], i32 [[TMP8]]
; IC2-NEXT: store i64 [[TMP9]], ptr [[DST:%.*]], align 1
; IC2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP3]]
+; IC2-NEXT: [[VEC_IND_NEXT]] = add nsw <vscale x 4 x i64> [[TMP22]], [[TMP21]]
; IC2-NEXT: [[TMP10:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
; IC2-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
; IC2: middle.block:
@@ -139,26 +124,24 @@ define void @test_invar_gep_var_start(i64 %start, ptr %dst) #0 {
; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP0]], [[TMP4]]
; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP0]], [[N_MOD_VF]]
; CHECK-NEXT: [[TMP5:%.*]] = add i64 [[START]], [[N_VEC]]
+; CHECK-NEXT: [[TMP6:%.*]] = call <vscale x 4 x i64> @llvm.stepvector.nxv4i64()
+; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[START]], i64 0
+; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
+; CHECK-NEXT: [[TMP13:%.*]] = mul nsw <vscale x 4 x i64> [[TMP6]], splat (i64 1)
+; CHECK-NEXT: [[INDUCTION:%.*]] = add nsw <vscale x 4 x i64> [[DOTSPLAT]], [[TMP13]]
+; CHECK-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[TMP4]], i64 0
+; CHECK-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <vscale x 4 x i64> [[BROADCAST_SPLATINSERT1]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK: vector.body:
; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
-; CHECK-NEXT: [[OFFSET_IDX:%.*]] = add i64 [[START]], [[INDEX]]
-; CHECK-NEXT: [[TMP6:%.*]] = call <vscale x 4 x i64> @llvm.stepvector.nxv4i64()
-; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[OFFSET_IDX]], i64 0
-; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
-; CHECK-NEXT: [[TMP14:%.*]] = add <vscale x 4 x i64> zeroinitializer, [[TMP6]]
-; CHECK-NEXT: [[TMP15:%.*]] = mul <vscale x 4 x i64> [[TMP14]], splat (i64 1)
-; CHECK-NEXT: [[TMP7:%.*]] = add <vscale x 4 x i64> [[DOTSPLAT]], [[TMP15]]
-; CHECK-NEXT: [[TMP16:%.*]] = add i64 [[OFFSET_IDX]], 0
-; CHECK-NEXT: [[TMP17:%.*]] = add i64 [[OFFSET_IDX]], 1
-; CHECK-NEXT: [[TMP18:%.*]] = add i64 [[OFFSET_IDX]], 2
-; CHECK-NEXT: [[TMP13:%.*]] = add i64 [[OFFSET_IDX]], 3
+; CHECK-NEXT: [[TMP7:%.*]] = phi <vscale x 4 x i64> [ [[INDUCTION]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
; CHECK-NEXT: [[TMP8:%.*]] = call i32 @llvm.vscale.i32()
; CHECK-NEXT: [[TMP9:%.*]] = mul nuw i32 [[TMP8]], 4
; CHECK-NEXT: [[TMP10:%.*]] = sub i32 [[TMP9]], 1
; CHECK-NEXT: [[TMP11:%.*]] = extractelement <vscale x 4 x i64> [[TMP7]], i32 [[TMP10]]
; CHECK-NEXT: store i64 [[TMP11]], ptr [[DST:%.*]], align 1
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP4]]
+; CHECK-NEXT: [[VEC_IND_NEXT]] = add nsw <vscale x 4 x i64> [[TMP7]], [[BROADCAST_SPLAT2]]
; CHECK-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
; CHECK-NEXT: br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
; CHECK: middle.block:
@@ -187,40 +170,29 @@ define void @test_invar_gep_var_start(i64 %start, ptr %dst) #0 {
; IC2: vector.ph:
; IC2-NEXT: [[TMP3:%.*]] = call i64 @llvm.vscale.i64()
; IC2-NEXT: [[TMP4:%.*]] = mul nuw i64 [[TMP3]], 4
+; IC2-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[TMP4]], i64 0
+; IC2-NEXT: [[TMP9:%.*]] = shufflevector <vscale x 4 x i64> [[BROADCAST_SPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
; IC2-NEXT: [[TMP5:%.*]] = mul i64 [[TMP4]], 2
; IC2-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP0]], [[TMP5]]
; IC2-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP0]], [[N_MOD_VF]]
; IC2-NEXT: [[TMP6:%.*]] = add i64 [[START]], [[N_VEC]]
+; IC2-NEXT: [[BROADCAST_SPLAT:%.*]] = call <vscale x 4 x i64> @llvm.stepvector.nxv4i64()
+; IC2-NEXT: [[DOTSPLATINSERT1:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[START]], i64 0
+; IC2-NEXT: [[VEC_IND:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT1]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
+; IC2-NEXT: [[TMP8:%.*]] = mul nsw <vscale x 4 x i64> [[BROADCAST_SPLAT]], splat (i64 1)
+; IC2-NEXT: [[INDUCTION:%.*]] = add nsw <vscale x 4 x i64> [[VEC_IND]], [[TMP8]]
; IC2-NEXT: br label [[VECTOR_BODY:%.*]]
; IC2: vector.body:
; IC2-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
-; IC2-NEXT: [[OFFSET_IDX:%.*]] = add i64 [[START]], [[INDEX]]
-; IC2-NEXT: [[BROADCAST_SPLAT:%.*]] = call <vscale x 4 x i64> @llvm.stepvector.nxv4i64()
-; IC2-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[OFFSET_IDX]], i64 0
-; IC2-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
-; IC2-NEXT: [[DOTSPLATINSERT1:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[TMP4]], i64 0
-; IC2-NEXT: [[VEC_IND:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT1]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
-; IC2-NEXT: [[TMP11:%.*]] = add <vscale x 4 x i64> [[VEC_IND]], [[BROADCAST_SPLAT]]
-; IC2-NEXT: [[TMP9:%.*]] = mul <vscale x 4 x i64> [[TMP11]], splat (i64 1)
+; IC2-NEXT: [[DOTSPLAT:%.*]] = phi <vscale x 4 x i64> [ [[INDUCTION]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
; IC2-NEXT: [[TMP10:%.*]] = add <vscale x 4 x i64> [[DOTSPLAT]], [[TMP9]]
-; IC2-NEXT: [[TMP23:%.*]] = add i64 [[TMP4]], 0
-; IC2-NEXT: [[TMP24:%.*]] = mul i64 [[TMP23]], 1
-; IC2-NEXT: [[TMP25:%.*]] = add i64 [[OFFSET_IDX]], [[TMP24]]
-; IC2-NEXT: [[TMP26:%.*]] = add i64 [[TMP4]], 1
-; IC2-NEXT: [[TMP27:%.*]] = mul i64 [[TMP26]], 1
-; IC2-NEXT: [[TMP28:%.*]] = add i64 [[OFFSET_IDX]], [[TMP27]]
-; IC2-NEXT: [[TMP17:%.*]] = add i64 [[TMP4]], 2
-; IC2-NEXT: [[TMP18:%.*]] = mul i64 [[TMP17]], 1
-; IC2-NEXT: [[TMP19:%.*]] = add i64 [[OFFSET_IDX]], [[TMP18]]
-; IC2-NEXT: [[TMP20:%.*]] = add i64 [[TMP4]], 3
-; IC2-NEXT: [[TMP21:%.*]] = mul i64 [[TMP20]], 1
-; IC2-NEXT: [[TMP22:%.*]] = add i64 [[OFFSET_IDX]], [[TMP21]]
; IC2-NEXT: [[TMP12:%.*]] = call i32 @llvm.vscale.i32()
; IC2-NEXT: [[TMP13:%.*]] = mul nuw i32 [[TMP12]], 4
; IC2-NEXT: [[TMP14:%.*]] = sub i32 [[TMP13]], 1
; IC2-NEXT: [[TMP15:%.*]] = extractelement <vscale x 4 x i64> [[TMP10]], i32 [[TMP14]]
; IC2-NEXT: store i64 [[TMP15]], ptr [[DST:%.*]], align 1
; IC2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP5]]
+; IC2-NEXT: [[VEC_IND_NEXT]] = add nsw <vscale x 4 x i64> [[TMP10]], [[TMP9]]
; IC2-NEXT: [[TMP16:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
; IC2-NEXT: br i1 [[TMP16]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
; IC2: middle.block:
@@ -269,36 +241,34 @@ define void @test_invar_gep_var_start_step_2(i64 %start, ptr %dst) #0 {
; CHECK-NEXT: [[TMP6:%.*]] = mul nuw i64 [[TMP5]], 4
; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP2]], [[TMP6]]
; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP2]], [[N_MOD_VF]]
-; CHECK-NEXT: [[TMP7:%.*]] = mul i64 [[N_VEC]], 2
-; CHECK-NEXT: [[TMP8:%.*]] = add i64 [[START]], [[TMP7]]
-; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
-; CHECK: vector.body:
-; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
-; CHECK-NEXT: [[TMP9:%.*]] = mul i64 [[INDEX]], 2
+; CHECK-NEXT: [[TMP9:%.*]] = mul i64 [[N_VEC]], 2
; CHECK-NEXT: [[OFFSET_IDX:%.*]] = add i64 [[START]], [[TMP9]]
; CHECK-NEXT: [[TMP10:%.*]] = call <vscale x 4 x i64> @llvm.stepvector.nxv4i64()
-; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[OFFSET_IDX]], i64 0
+; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[START]], i64 0
; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
-; CHECK-NEXT: [[TMP11:%.*]] = add <vscale x 4 x i64> zeroinitializer, [[TMP10]]
-; CHECK-NEXT: [[TMP18:%.*]] = mul <vscale x 4 x i64> [[TMP11]], splat (i64 2)
-; CHECK-NEXT: [[TMP12:%.*]] = add <vscale x 4 x i64> [[DOTSPLAT]], [[TMP18]]
-; CHECK-NEXT: [[TMP19:%.*]] = add i64 [[OFFSET_IDX]], 0
-; CHECK-NEXT: [[TMP20:%.*]] = add i64 [[OFFSET_IDX]], 2
-; CHECK-NEXT: [[TMP21:%.*]] = add i64 [[OFFSET_IDX]], 4
-; CHECK-NEXT: [[TMP22:%.*]] = add i64 [[OFFSET_IDX]], 6
+; CHECK-NEXT: [[TMP18:%.*]] = mul nsw <vscale x 4 x i64> [[TMP10]], splat (i64 2)
+; CHECK-NEXT: [[INDUCTION:%.*]] = add nsw <vscale x 4 x i64> [[DOTSPLAT]], [[TMP18]]
+; CHECK-NEXT: [[TMP11:%.*]] = mul nsw i64 2, [[TMP6]]
+; CHECK-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[TMP11]], i64 0
+; CHECK-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <vscale x 4 x i64> [[BROADCAST_SPLATINSERT1]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
+; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
+; CHECK: vector.body:
+; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
+; CHECK-NEXT: [[TMP12:%.*]] = phi <vscale x 4 x i64> [ [[INDUCTION]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
; CHECK-NEXT: [[TMP13:%.*]] = call i32 @llvm.vscale.i32()
; CHECK-NEXT: [[TMP14:%.*]] = mul nuw i32 [[TMP13]], 4
; CHECK-NEXT: [[TMP15:%.*]] = sub i32 [[TMP14]], 1
; CHECK-NEXT: [[TMP16:%.*]] = extractelement <vscale x 4 x i64> [[TMP12]], i32 [[TMP15]]
; CHECK-NEXT: store i64 [[TMP16]], ptr [[DST:%.*]], align 1
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP6]]
+; CHECK-NEXT: [[VEC_IND_NEXT]] = add nsw <vscale x 4 x i64> [[TMP12]], [[BROADCAST_SPLAT2]]
; CHECK-NEXT: [[TMP17:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
; CHECK-NEXT: br i1 [[TMP17]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]
; CHECK: middle.block:
; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP2]], [[N_VEC]]
; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
; CHECK: scalar.ph:
-; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[TMP8]], [[MIDDLE_BLOCK]] ], [ [[START]], [[ENTRY:%.*]] ]
+; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[OFFSET_IDX]], [[MIDDLE_BLOCK]] ], [ [[START]], [[ENTRY:%.*]] ]
; CHECK-NEXT: br label [[LOOP:%.*]]
; CHECK: loop:
; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
@@ -322,49 +292,38 @@ define void @test_invar_gep_var_start_step_2(i64 %start, ptr %dst) #0 {
; IC2: vector.ph:
; IC2-NEXT: [[TMP5:%.*]] = call i64 @llvm.vscale.i64()
; IC2-NEXT: [[TMP6:%.*]] = mul nuw i64 [[TMP5]], 4
+; IC2-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[TMP6]], i64 0
+; IC2-NEXT: [[BROADCAST_SPLAT1:%.*]] = shufflevector <vscale x 4 x i64> [[BROADCAST_SPLATINSERT]], <vscale x 4 x i64> poison,...
[truncated]
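To make the contrast in the description concrete, here is a minimal stand-alone C++ model, not LLVM code: the concrete vscale value, VF, step, and trip count are made up for illustration. Because VF = vscale * 4 is only known at run time for a scalable VF, the per-lane scalar steps cannot be enumerated when the plan is built, so the new codegen keeps a single vector induction, steps it by VF * Step each iteration, and extracts the last lane where a scalar value is needed.

#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
  const uint64_t Vscale = 2;      // runtime-only value for scalable vectors
  const uint64_t VF = Vscale * 4; // lane count of <vscale x 4 x i64>
  const int64_t Step = 2, Start = 0, TripCount = 32;

  // INDUCTION = splat(Start) + stepvector * Step, built once in the preheader.
  std::vector<int64_t> VecInd(VF);
  for (uint64_t L = 0; L < VF; ++L)
    VecInd[L] = Start + static_cast<int64_t>(L) * Step;

  for (int64_t Index = 0; Index + static_cast<int64_t>(VF) <= TripCount;
       Index += static_cast<int64_t>(VF)) {
    // The only scalar use the tests exercise: extract the last lane.
    std::printf("last lane = %lld\n", (long long)VecInd[VF - 1]);
    // VEC_IND_NEXT = VEC_IND + splat(VF * Step), as in the new CHECK lines.
    for (uint64_t L = 0; L < VF; ++L)
      VecInd[L] += static_cast<int64_t>(VF) * Step;
  }
}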
Extend replicateByVF to also handle VPScalarIVStepsRecipe. To do so, the patch adds a new lane operand to VPScalarIVStepsRecipe, which is only added when replicating. This enables removing a number of lane 0 computations. The lane operand will also be used to explicitly replicate replicate regions in a follow-up. Depends on llvm#169796 (included in PR).
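A rough stand-alone model of what that lane operand buys; the scalarSteps helper and the concrete values here are hypothetical stand-ins, not the patch's code. With an explicit lane operand, each replica computes its own lane's value directly, so no separate lane-0 computation is needed.

#include <cstdio>

// Hypothetical helper mirroring a scalar-steps computation with an explicit
// lane operand: value = Start + (Index + Lane) * Step.
long scalarSteps(long Start, long Index, long Step, long Lane) {
  return Start + (Index + Lane) * Step;
}

int main() {
  const int VF = 4;                      // replication by a fixed, known VF
  const long Start = 0, Step = 2, Index = 8;
  for (long Lane = 0; Lane < VF; ++Lane) // one replica per lane
    std::printf("lane %ld -> %ld\n", Lane,
                scalarSteps(Start, Index, Step, Lane));
}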
Force-pushed from 0ec3c3b to 6b2f03e.
; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 100, [[TMP3]]
; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 100, [[N_MOD_VF]]
; CHECK-NEXT: [[TMP5:%.*]] = call <vscale x 4 x i64> @llvm.stepvector.nxv4i64()
; CHECK-NEXT: [[TMP4:%.*]] = mul nsw <vscale x 4 x i64> [[TMP5]], splat (i64 1)
Just an observation, but looks like we're missing a simple fold here - this should be a nop. I might take a look and see what's going on ...
I think all the loops in this test will get folded away by LICM. Were we seeing something in the wild where we needed to generate the last lane of a widened induction variable with a scalable VF?
Yes, I think there are cases where we currently may not be able to sink. The tests cover (the only?) code path where we extract the last lane of an IV in the loop for scalable vectors.
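Regarding the missing fold noted above, a minimal stand-alone sketch of the rewrite; the Val type and simplifyMul are hypothetical stand-ins, not VPlan's pattern-match API. The point is that `mul %stepvec, splat (i64 1)` in the CHECK lines should simplify to the step vector itself.

#include <cassert>

// Hypothetical, simplified value representation -- not VPlan's classes.
struct Val {
  enum Kind { StepVector, Splat, Mul } K;
  long SplatImm = 0;        // used when K == Splat
  const Val *LHS = nullptr; // used when K == Mul
  const Val *RHS = nullptr;
};

// The fold in question: mul X, splat(1) --> X.
const Val *simplifyMul(const Val &M) {
  assert(M.K == Val::Mul && "expected a multiply");
  if (M.RHS->K == Val::Splat && M.RHS->SplatImm == 1)
    return M.LHS; // multiplying by splat(1) is a no-op
  return &M;      // otherwise keep the multiply
}

int main() {
  Val Step{Val::StepVector};
  Val One{Val::Splat, 1};
  Val Mul{Val::Mul, 0, &Step, &One};
  assert(simplifyMul(Mul) == &Step); // folds to the step vector itself
}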
bool HasScalableVF = Plan.hasScalableVF();
WideIV->replaceUsesWithIf(Steps,
                          [WideIV, HasScalableVF](VPUser &U, unsigned) {
                            if (HasScalableVF)
Are plans containing scalable VFs always guaranteed to never include a fixed-width VF? I think that's probably true in practice, but this might give strange behaviour if not.
Yes, currently we build the plans for scalable/fixed vectors completely separately, so they should never mix.
  // Update scalar users of IV to use Step instead.
- if (!HasOnlyVectorVFs)
+ if (!HasOnlyVectorVFs) {
Is it worth adding an assert here that the plan doesn't have a scalable VF? i.e.
if (!HasOnlyVectorVFs) {
  assert(!Plan.hasScalableVF() && ...);
  ...
Added, thanks. If the plan contains the scalar VF, it should never include scalable VFs.
Force-pushed from 6b2f03e to 87cbef7.
david-arm left a comment:
LGTM!