-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[VPlan] Be more careful with CSE in replicate regions. #162110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Recipes in replicate regions implicitly depend on the region's predicate. Limit CSE to recipes in the same block, when either recipe is in a replicate region. This allows handling VPPredInstPHIRecipe during CSE. If we perform CSE on recipes inside a replicate region, we may end up with 2 VPPredInstPHIRecipes sharing the same operand. This is incompatible with current VPPredInstPHIRecipe codegen, which re-sets the current value of its operand in VPTransformState. This can cause crashes in the added test cases. Note that this patch only modifies ::isEqual to check for replicating regions and not getHash, as CSE across replicating regions should be uncommon. Fixes llvm#157314. Fixes llvm#161974.
@llvm/pr-subscribers-vectorizers @llvm/pr-subscribers-llvm-transforms Author: Florian Hahn (fhahn) ChangesRecipes in replicate regions implicitly depend on the region's predicate. Limit CSE to recipes in the same block, when either recipe is in a replicate region. This allows handling VPPredInstPHIRecipe during CSE. If we perform CSE on recipes inside a replicate region, we may end up with 2 VPPredInstPHIRecipes sharing the same operand. This is incompatible with current VPPredInstPHIRecipe codegen, which re-sets the current value of its operand in VPTransformState. This can cause crashes in the added test cases. Note that this patch only modifies ::isEqual to check for replicating regions and not getHash, as CSE across replicating regions should be uncommon. Full diff: https://github.com/llvm/llvm-project/pull/162110.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index ebf833e2b7e88..f7dbecabff1f7 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -1979,6 +1979,12 @@ struct VPCSEDenseMapInfo : public DenseMapInfo<VPSingleDefRecipe *> {
.Case<VPInstruction, VPWidenRecipe, VPWidenCastRecipe,
VPWidenSelectRecipe, VPWidenGEPRecipe, VPReplicateRecipe>(
[](auto *I) { return std::make_pair(false, I->getOpcode()); })
+ .Case<VPPredInstPHIRecipe>([](auto *I) {
+ // Treat VPPredInstPHIRecipe as Instruction::PHI for CSE. This is only
+ // safe, if they are in the same block and hence share the same
+ // predicate.
+ return std::make_pair(false, Instruction::PHI);
+ })
.Case<VPWidenIntrinsicRecipe>([](auto *I) {
return std::make_pair(true, I->getVectorIntrinsicID());
})
@@ -2053,6 +2059,15 @@ struct VPCSEDenseMapInfo : public DenseMapInfo<VPSingleDefRecipe *> {
LFlags->getPredicate() !=
cast<VPRecipeWithIRFlags>(R)->getPredicate())
return false;
+ // Recipes in replicate regions implicitly depend on predicate. If either
+ // recipe is in a replicate region, only consider them equal if both have
+ // the same parent.
+ const VPRegionBlock *RegionL = L->getParent()->getParent();
+ const VPRegionBlock *RegionR = R->getParent()->getParent();
+ if (((RegionL && RegionL->isReplicator()) ||
+ (RegionR && RegionR->isReplicator())) &&
+ L->getParent() != R->getParent())
+ return false;
const VPlan *Plan = L->getParent()->getPlan();
VPTypeAnalysis TypeInfo(*Plan);
return TypeInfo.inferScalarType(L) == TypeInfo.inferScalarType(R);
diff --git a/llvm/test/Transforms/LoopVectorize/cse-replicate-regions.ll b/llvm/test/Transforms/LoopVectorize/cse-replicate-regions.ll
new file mode 100644
index 0000000000000..c0692f3231e89
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/cse-replicate-regions.ll
@@ -0,0 +1,163 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals none --version 6
+; RUN: opt -p loop-vectorize -force-vector-width=2 -force-widen-divrem-via-safe-divisor=false -S %s | FileCheck %s
+
+define void @multiple_vppredinstphi_with_same_predicate(ptr %A, i32 %d) {
+; CHECK-LABEL: define void @multiple_vppredinstphi_with_same_predicate(
+; CHECK-SAME: ptr [[A:%.*]], i32 [[D:%.*]]) {
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: br label %[[VECTOR_PH:.*]]
+; CHECK: [[VECTOR_PH]]:
+; CHECK-NEXT: br label %[[VECTOR_BODY:.*]]
+; CHECK: [[VECTOR_BODY]]:
+; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[PRED_SDIV_CONTINUE2:.*]] ]
+; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds i32, ptr [[A]], i32 [[INDEX]]
+; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <2 x i32>, ptr [[TMP0]], align 4
+; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt <2 x i32> [[WIDE_LOAD]], zeroinitializer
+; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x i1> [[TMP1]], i32 0
+; CHECK-NEXT: br i1 [[TMP2]], label %[[PRED_SDIV_IF:.*]], label %[[PRED_SDIV_CONTINUE:.*]]
+; CHECK: [[PRED_SDIV_IF]]:
+; CHECK-NEXT: [[TMP3:%.*]] = sdiv i32 -10, [[D]]
+; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x i32> poison, i32 [[TMP3]], i32 0
+; CHECK-NEXT: br label %[[PRED_SDIV_CONTINUE]]
+; CHECK: [[PRED_SDIV_CONTINUE]]:
+; CHECK-NEXT: [[TMP5:%.*]] = phi <2 x i32> [ poison, %[[VECTOR_BODY]] ], [ [[TMP4]], %[[PRED_SDIV_IF]] ]
+; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x i1> [[TMP1]], i32 1
+; CHECK-NEXT: br i1 [[TMP6]], label %[[PRED_SDIV_IF1:.*]], label %[[PRED_SDIV_CONTINUE2]]
+; CHECK: [[PRED_SDIV_IF1]]:
+; CHECK-NEXT: [[TMP7:%.*]] = sdiv i32 -10, [[D]]
+; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x i32> [[TMP5]], i32 [[TMP7]], i32 1
+; CHECK-NEXT: br label %[[PRED_SDIV_CONTINUE2]]
+; CHECK: [[PRED_SDIV_CONTINUE2]]:
+; CHECK-NEXT: [[TMP9:%.*]] = phi <2 x i32> [ [[TMP5]], %[[PRED_SDIV_CONTINUE]] ], [ [[TMP8]], %[[PRED_SDIV_IF1]] ]
+; CHECK-NEXT: [[TMP10:%.*]] = add <2 x i32> [[TMP9]], [[TMP9]]
+; CHECK-NEXT: [[PREDPHI:%.*]] = select <2 x i1> [[TMP1]], <2 x i32> [[TMP10]], <2 x i32> zeroinitializer
+; CHECK-NEXT: store <2 x i32> [[PREDPHI]], ptr [[TMP0]], align 4
+; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 2
+; CHECK-NEXT: [[TMP11:%.*]] = icmp eq i32 [[INDEX_NEXT]], 100
+; CHECK-NEXT: br i1 [[TMP11]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK: [[MIDDLE_BLOCK]]:
+; CHECK-NEXT: br label %[[EXIT:.*]]
+; CHECK: [[EXIT]]:
+; CHECK-NEXT: ret void
+;
+entry:
+ br label %loop.header
+
+loop.header:
+ %iv = phi i32 [ 0, %entry ], [ %iv.next, %loop.latch ]
+ %gep.A = getelementptr inbounds i32, ptr %A, i32 %iv
+ %l = load i32, ptr %gep.A
+ %c = icmp sgt i32 %l, 0
+ br i1 %c, label %then, label %loop.latch
+
+then:
+ %div.0 = sdiv i32 -10, %d
+ %div.1 = sdiv i32 -10, %d
+ %add = add i32 %div.1, %div.0
+ br label %loop.latch
+
+loop.latch:
+ %merge = phi i32 [ %add, %then ], [ 0, %loop.header ]
+ store i32 %merge, ptr %gep.A
+ %iv.next = add i32 %iv, 1
+ %ec = icmp eq i32 %iv.next, 100
+ br i1 %ec, label %exit, label %loop.header
+
+exit:
+ ret void
+}
+
+define void @multiple_vppredinstphi_with_different_predicate(ptr %A, i32 %d) {
+; CHECK-LABEL: define void @multiple_vppredinstphi_with_different_predicate(
+; CHECK-SAME: ptr [[A:%.*]], i32 [[D:%.*]]) {
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: br label %[[VECTOR_PH:.*]]
+; CHECK: [[VECTOR_PH]]:
+; CHECK-NEXT: br label %[[VECTOR_BODY:.*]]
+; CHECK: [[VECTOR_BODY]]:
+; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[PRED_SDIV_CONTINUE6:.*]] ]
+; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds i32, ptr [[A]], i32 [[INDEX]]
+; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <2 x i32>, ptr [[TMP0]], align 4
+; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt <2 x i32> [[WIDE_LOAD]], zeroinitializer
+; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x i1> [[TMP1]], i32 0
+; CHECK-NEXT: br i1 [[TMP2]], label %[[PRED_SDIV_IF:.*]], label %[[PRED_SDIV_CONTINUE:.*]]
+; CHECK: [[PRED_SDIV_IF]]:
+; CHECK-NEXT: [[TMP3:%.*]] = sdiv i32 -10, [[D]]
+; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x i32> poison, i32 [[TMP3]], i32 0
+; CHECK-NEXT: br label %[[PRED_SDIV_CONTINUE]]
+; CHECK: [[PRED_SDIV_CONTINUE]]:
+; CHECK-NEXT: [[TMP5:%.*]] = phi <2 x i32> [ poison, %[[VECTOR_BODY]] ], [ [[TMP4]], %[[PRED_SDIV_IF]] ]
+; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x i1> [[TMP1]], i32 1
+; CHECK-NEXT: br i1 [[TMP6]], label %[[PRED_SDIV_IF1:.*]], label %[[PRED_SDIV_CONTINUE2:.*]]
+; CHECK: [[PRED_SDIV_IF1]]:
+; CHECK-NEXT: [[TMP7:%.*]] = sdiv i32 -10, [[D]]
+; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x i32> [[TMP5]], i32 [[TMP7]], i32 1
+; CHECK-NEXT: br label %[[PRED_SDIV_CONTINUE2]]
+; CHECK: [[PRED_SDIV_CONTINUE2]]:
+; CHECK-NEXT: [[TMP9:%.*]] = phi <2 x i32> [ [[TMP5]], %[[PRED_SDIV_CONTINUE]] ], [ [[TMP8]], %[[PRED_SDIV_IF1]] ]
+; CHECK-NEXT: [[TMP10:%.*]] = xor <2 x i1> [[TMP1]], splat (i1 true)
+; CHECK-NEXT: [[TMP11:%.*]] = or <2 x i1> [[TMP1]], [[TMP10]]
+; CHECK-NEXT: [[PREDPHI:%.*]] = select <2 x i1> [[TMP1]], <2 x i32> [[TMP9]], <2 x i32> zeroinitializer
+; CHECK-NEXT: [[TMP12:%.*]] = icmp sgt <2 x i32> [[WIDE_LOAD]], splat (i32 20)
+; CHECK-NEXT: [[TMP13:%.*]] = select <2 x i1> [[TMP11]], <2 x i1> [[TMP12]], <2 x i1> zeroinitializer
+; CHECK-NEXT: [[TMP14:%.*]] = extractelement <2 x i1> [[TMP13]], i32 0
+; CHECK-NEXT: br i1 [[TMP14]], label %[[PRED_SDIV_IF3:.*]], label %[[PRED_SDIV_CONTINUE4:.*]]
+; CHECK: [[PRED_SDIV_IF3]]:
+; CHECK-NEXT: [[TMP15:%.*]] = sdiv i32 -10, [[D]]
+; CHECK-NEXT: [[TMP16:%.*]] = insertelement <2 x i32> poison, i32 [[TMP15]], i32 0
+; CHECK-NEXT: br label %[[PRED_SDIV_CONTINUE4]]
+; CHECK: [[PRED_SDIV_CONTINUE4]]:
+; CHECK-NEXT: [[TMP17:%.*]] = phi <2 x i32> [ poison, %[[PRED_SDIV_CONTINUE2]] ], [ [[TMP16]], %[[PRED_SDIV_IF3]] ]
+; CHECK-NEXT: [[TMP18:%.*]] = extractelement <2 x i1> [[TMP13]], i32 1
+; CHECK-NEXT: br i1 [[TMP18]], label %[[PRED_SDIV_IF5:.*]], label %[[PRED_SDIV_CONTINUE6]]
+; CHECK: [[PRED_SDIV_IF5]]:
+; CHECK-NEXT: [[TMP19:%.*]] = sdiv i32 -10, [[D]]
+; CHECK-NEXT: [[TMP20:%.*]] = insertelement <2 x i32> [[TMP17]], i32 [[TMP19]], i32 1
+; CHECK-NEXT: br label %[[PRED_SDIV_CONTINUE6]]
+; CHECK: [[PRED_SDIV_CONTINUE6]]:
+; CHECK-NEXT: [[TMP21:%.*]] = phi <2 x i32> [ [[TMP17]], %[[PRED_SDIV_CONTINUE4]] ], [ [[TMP20]], %[[PRED_SDIV_IF5]] ]
+; CHECK-NEXT: [[PREDPHI7:%.*]] = select <2 x i1> [[TMP12]], <2 x i32> [[TMP21]], <2 x i32> zeroinitializer
+; CHECK-NEXT: [[TMP22:%.*]] = add <2 x i32> [[PREDPHI]], [[PREDPHI7]]
+; CHECK-NEXT: store <2 x i32> [[TMP22]], ptr [[TMP0]], align 4
+; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 2
+; CHECK-NEXT: [[TMP23:%.*]] = icmp eq i32 [[INDEX_NEXT]], 100
+; CHECK-NEXT: br i1 [[TMP23]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK: [[MIDDLE_BLOCK]]:
+; CHECK-NEXT: br label %[[EXIT:.*]]
+; CHECK: [[EXIT]]:
+; CHECK-NEXT: ret void
+;
+entry:
+ br label %loop.header
+
+loop.header:
+ %iv = phi i32 [ 0, %entry ], [ %iv.next, %loop.latch ]
+ %gep.A = getelementptr inbounds i32, ptr %A, i32 %iv
+ %l = load i32, ptr %gep.A
+ %c.0 = icmp sgt i32 %l, 0
+ br i1 %c.0, label %then.0, label %continue
+
+then.0:
+ %div.0 = sdiv i32 -10, %d
+ br label %continue
+
+continue:
+ %merge.0 = phi i32 [ %div.0, %then.0 ], [ 0, %loop.header ]
+ %c.1 = icmp sgt i32 %l, 20
+ br i1 %c.1, label %then.1, label %loop.latch
+
+then.1:
+ %div.1 = sdiv i32 -10, %d
+ br label %loop.latch
+
+loop.latch:
+ %merge.1 = phi i32 [ %div.1, %then.1 ], [ 0, %continue ]
+ %add = add i32 %merge.0, %merge.1
+ store i32 %add, ptr %gep.A
+ %iv.next = add i32 %iv, 1
+ %ec = icmp eq i32 %iv.next, 100
+ br i1 %ec, label %exit, label %loop.header
+
+exit:
+ ret void
+}
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this patch only modifies ::isEqual to check for replicating regions and not getHash, as CSE across replicating regions should be uncommon.
Is there some issue with hashing the parent region in the case of replicates? Would be safer to do that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we perform CSE on recipes inside a replicate region, we may end up with 2 VPPredInstPHIRecipes sharing the same operand.
Hm, this reminds me of our fix to the const-folder. I guess the cse and const-folder are more similar than I thought (also because I have a proposed patch to share getOpcode).
// provided we account for the data embedded in them while checking for | ||
// equality or hashing. We assign VPVectorEndPointerRecipe the GEP opcode, | ||
// as it is essentially a GEP with different semantics. | ||
auto C = isa<VPVectorPointerRecipe>(Def) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be cleaner to handle PredPHI here, as I think getOpcodeOrIntrinsicID can be shared?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that's going to work, as we need to return a proper opcode from getOpcodeOrIntrinsicID
. Currently we return std::nullopt
for VPVectorPointerRecipe, which works for now because there is only a single recipe we return that for, but needs to be fixed to support more recipes that don't map directly to llvm::Instruction opcodes: #162267
vputils::isSingleScalar(Def), hash_combine_range(Def->operands())); | ||
if (auto *RFlags = dyn_cast<VPRecipeWithIRFlags>(Def)) | ||
if (RFlags->hasPredicate()) | ||
return hash_combine(Result, RFlags->getPredicate()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Result = hash_combine(Result, RFlags->getPredicate()); | |
const VPRegionBlock *RepR = Def->getParent()->getParent(); | |
if (RepR && RepR->isReplicator()) | |
Result = hash_combine(Result, Def->getParent()); |
We can perform CSE on recipes that do not directly map to Instruction opcodes. One example is VPVectorPointerRecipe. Currently this is handled by supporting them in ::canHandle, but currently that means that we return std::nullopt from getOpcodeOrIntrinsicID() for it. This currently only works, because the only case we return std::nullopt and perform CSE is VPVectorPointerRecipe. But that does not work if we support more such recipes, like VPPredInstPHIRecipe (llvm#162110). To fix this, return a custom opcode from getOpcodeOrIntrinsicID for recipes like VPVectorPointerRecipe, using the VPDefID after all regular instruction opcodes.
We can perform CSE on recipes that do not directly map to Instruction opcodes. One example is VPVectorPointerRecipe. Currently this is handled by supporting them in ::canHandle, but currently that means that we return std::nullopt from getOpcodeOrIntrinsicID() for it. This currently only works, because the only case we return std::nullopt and perform CSE is VPVectorPointerRecipe. But that does not work if we support more such recipes, like VPPredInstPHIRecipe (llvm#162110). To fix this, return a custom opcode from getOpcodeOrIntrinsicID for recipes like VPVectorPointerRecipe, using the VPDefID after all regular instruction opcodes.
We can perform CSE on recipes that do not directly map to Instruction opcodes. One example is VPVectorPointerRecipe. Currently this is handled by supporting them in ::canHandle, but currently that means that we return std::nullopt from getOpcodeOrIntrinsicID() for it. This currently only works, because the only case we return std::nullopt and perform CSE is VPVectorPointerRecipe. But that does not work if we support more such recipes, like VPPredInstPHIRecipe (llvm#162110). To fix this, return a custom opcode from getOpcodeOrIntrinsicID for recipes like VPVectorPointerRecipe, using the VPDefID after all regular instruction opcodes.
Recipes in replicate regions implicitly depend on the region's predicate. Limit CSE to recipes in the same block, when either recipe is in a replicate region.
This allows handling VPPredInstPHIRecipe during CSE. If we perform CSE on recipes inside a replicate region, we may end up with 2 VPPredInstPHIRecipes sharing the same operand. This is incompatible with current VPPredInstPHIRecipe codegen, which re-sets the current value of its operand in VPTransformState. This can cause crashes in the added test cases.
Note that this patch only modifies ::isEqual to check for replicating regions and not getHash, as CSE across replicating regions should be uncommon.
Fixes #157314.
Fixes #161974.