-
Notifications
You must be signed in to change notification settings - Fork 15.3k
[SROA] Rewrite invariant group intrinsics after splitting alloca #107557
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[SROA] Rewrite invariant group intrinsics after splitting alloca #107557
Conversation
|
@llvm/pr-subscribers-llvm-transforms Author: Antonio Frighetto (antoniofrighetto) ChangesA miscompilation issue has been addressed with improved checking. Fixes: #105537. Full diff: https://github.com/llvm/llvm-project/pull/107557.diff 3 Files Affected:
diff --git a/llvm/lib/Transforms/Scalar/SROA.cpp b/llvm/lib/Transforms/Scalar/SROA.cpp
index d0186da1bc5e22..169b34181399a5 100644
--- a/llvm/lib/Transforms/Scalar/SROA.cpp
+++ b/llvm/lib/Transforms/Scalar/SROA.cpp
@@ -3522,6 +3522,10 @@ class AllocaSliceRewriter : public InstVisitor<AllocaSliceRewriter, bool> {
"Unexpected intrinsic!");
LLVM_DEBUG(dbgs() << " original: " << II << "\n");
+ // Do not record invariant group intrinsics for deletion.
+ if (II.isLaunderOrStripInvariantGroup())
+ return true;
+
// Record this instruction for deletion.
Pass.DeadInsts.push_back(&II);
@@ -3532,9 +3536,6 @@ class AllocaSliceRewriter : public InstVisitor<AllocaSliceRewriter, bool> {
return true;
}
- if (II.isLaunderOrStripInvariantGroup())
- return true;
-
assert(II.getArgOperand(1) == OldPtr);
// Lifetime intrinsics are only promotable if they cover the whole alloca.
// Therefore, we drop lifetime intrinsics which don't cover the whole
diff --git a/llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp b/llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp
index 1b7912fdf5e304..95a4178b276177 100644
--- a/llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp
+++ b/llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp
@@ -81,7 +81,8 @@ bool llvm::isAllocaPromotable(const AllocaInst *AI) {
return false;
} else if (const IntrinsicInst *II = dyn_cast<IntrinsicInst>(U)) {
if (!II->isLifetimeStartOrEnd() && !II->isDroppable() &&
- II->getIntrinsicID() != Intrinsic::fake_use)
+ II->getIntrinsicID() != Intrinsic::fake_use &&
+ !II->isLaunderOrStripInvariantGroup())
return false;
} else if (const BitCastInst *BCI = dyn_cast<BitCastInst>(U)) {
if (!onlyUsedByLifetimeMarkersOrDroppableInsts(BCI))
diff --git a/llvm/test/Transforms/SROA/invariant-group.ll b/llvm/test/Transforms/SROA/invariant-group.ll
index 1be6f6e2fc32bb..08f2d08e58cee7 100644
--- a/llvm/test/Transforms/SROA/invariant-group.ll
+++ b/llvm/test/Transforms/SROA/invariant-group.ll
@@ -138,10 +138,13 @@ end:
ret void
}
+; Conservatively do not split the alloca.
define void @partial_promotion_of_alloca() {
; CHECK-LABEL: @partial_promotion_of_alloca(
; CHECK-NEXT: [[STRUCT_PTR_SROA_2:%.*]] = alloca i32, align 4
+; CHECK-NEXT: [[STRUCT_PTR:%.*]] = alloca [[T:%.*]], align 4
; CHECK-NEXT: store volatile i32 0, ptr [[STRUCT_PTR_SROA_2]], align 4
+; CHECK-NEXT: [[BARR:%.*]] = call ptr @llvm.launder.invariant.group.p0(ptr [[STRUCT_PTR]])
; CHECK-NEXT: [[STRUCT_PTR_SROA_2_0_STRUCT_PTR_SROA_2_4_LOAD_VAL:%.*]] = load volatile i32, ptr [[STRUCT_PTR_SROA_2]], align 4
; CHECK-NEXT: ret void
;
@@ -155,6 +158,19 @@ define void @partial_promotion_of_alloca() {
ret void
}
+define void @memcpy_after_laundering_alloca(ptr %ptr) {
+; CHECK-LABEL: @memcpy_after_laundering_alloca(
+; CHECK-NEXT: [[ALLOCA:%.*]] = alloca { i64, i64 }, align 8
+; CHECK-NEXT: [[LAUNDER:%.*]] = call ptr @llvm.launder.invariant.group.p0(ptr [[ALLOCA]])
+; CHECK-NEXT: call void @llvm.memcpy.p0.p0.i64(ptr [[LAUNDER]], ptr [[PTR:%.*]], i64 16, i1 false)
+; CHECK-NEXT: ret void
+;
+ %alloca = alloca { i64, i64 }, align 8
+ %launder = call ptr @llvm.launder.invariant.group.p0(ptr %alloca)
+ call void @llvm.memcpy.p0.p0.i64(ptr %launder, ptr %ptr, i64 16, i1 false)
+ ret void
+}
+
declare void @use(ptr)
!0 = !{}
|
| define void @partial_promotion_of_alloca() { | ||
| ; CHECK-LABEL: @partial_promotion_of_alloca( | ||
| ; CHECK-NEXT: [[STRUCT_PTR_SROA_2:%.*]] = alloca i32, align 4 | ||
| ; CHECK-NEXT: [[STRUCT_PTR:%.*]] = alloca [[T:%.*]], align 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uhm, I'm not entirely sure if it's fine in this specific case to have a field promoted to a scalar, and have the whole alloca still there.
llvm/lib/Transforms/Scalar/SROA.cpp
Outdated
|
|
||
| // Do not record invariant group intrinsics for deletion. | ||
| if (II.isLaunderOrStripInvariantGroup()) | ||
| return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, shouldn't we be creating a new intrinsic for the split pointer instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should, I accidentally thought that reusing it would suffice.
| if (!II->isLifetimeStartOrEnd() && !II->isDroppable() && | ||
| II->getIntrinsicID() != Intrinsic::fake_use) | ||
| II->getIntrinsicID() != Intrinsic::fake_use && | ||
| !II->isLaunderOrStripInvariantGroup()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand how this is supposed to work. I don't think we have logic to handle these in mem2reg? Also wouldn't we have to scan their uses here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be wrong, but we don't seem to have logic to handle the others here, do we actually need any specific logic in mem2reg? Why should we scan their uses? The change seems to be needed as the alloca is marked as promotable, as otherwise we would assert in PromoteMemoryToRegister. I don't think we want to mark the alloca as non-promotable when visiting the intrinsic and prevent mem2reg from having the chance to simplify the alloca further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I think this gets handled here:
llvm-project/llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp
Lines 495 to 507 in 05284a0
| // The only users of this bitcast/GEP instruction are lifetime intrinsics. | |
| // Follow the use/def chain to erase them now instead of leaving it for | |
| // dead code elimination later. | |
| for (Use &UU : llvm::make_early_inc_range(I->uses())) { | |
| Instruction *Inst = cast<Instruction>(UU.getUser()); | |
| // Drop the use of I in droppable instructions. | |
| if (Inst->isDroppable()) { | |
| Inst->dropDroppableUse(UU); | |
| continue; | |
| } | |
| Inst->eraseFromParent(); | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment updated, thanks.
3a10f2b to
05284a0
Compare
llvm/lib/Transforms/Scalar/SROA.cpp
Outdated
| New = IRB.CreateStripInvariantGroup(AdjustedPtr); | ||
|
|
||
| New->takeName(&II); | ||
| II.replaceAllUsesWith(New); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this RAUW correct? Can't the alloca be split into two parts, both of which need their own intrinsic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this already being done as part of visiting each slice in:
llvm-project/llvm/lib/Transforms/Scalar/SROA.cpp
Lines 4921 to 4924 in e03f427
| for (Slice &S : P) { | |
| Promotable &= Rewriter.visit(&S); | |
| ++NumUses; | |
| } |
Or am I mistaking it? It looks like the intrinsic has been rewritten for each part of the alloca (this is before mem2reg for the test case):
define void @partial_promotion_of_alloca() {
%struct_ptr.sroa.0 = alloca i32, align 4
%struct_ptr.sroa.2 = alloca i32, align 4
store i32 0, ptr %struct_ptr.sroa.0, align 4
store volatile i32 0, ptr %struct_ptr.sroa.2, align 4
%barr = call ptr @llvm.launder.invariant.group.p0(ptr %struct_ptr.sroa.0)
%1 = call ptr @llvm.launder.invariant.group.p0(ptr %struct_ptr.sroa.2)
%struct_ptr.sroa.2.0.struct_ptr.sroa.2.4.load_val = load volatile i32, ptr %struct_ptr.sroa.2, align 4
ret void
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The case I have in mind is something like memcpy_after_laundering_alloca but where we end up with two memcpys rather than one. I expect you'll get incorrect pointers in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind showing an example that could lead to get incorrect pointers? I had in mind something as follows, where we memcpy each field separately, but I suspect this is not what you were referring to:
define void @memcpy_after_laundering_allocas_field(ptr %ptr) {
%alloca = alloca { i64, i64 }, align 8
%field1_ptr = getelementptr inbounds { i64, i64 }, ptr %alloca, i32 0, i32 0
%field2_ptr = getelementptr inbounds { i64, i64 }, ptr %alloca, i32 0, i32 1
%launder_field1 = call ptr @llvm.launder.invariant.group.p0(ptr %field1_ptr)
%launder_field2 = call ptr @llvm.launder.invariant.group.p0(ptr %field2_ptr)
call void @llvm.memcpy.p0.p0.i64(ptr %launder_field1, ptr %ptr, i64 8, i1 false)
call void @llvm.memcpy.p0.p0.i64(ptr %launder_field2, ptr %ptr, i64 8, i1 false)
ret void
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, you mean the case where we launder only a field of the alloca (not sure if this makes sense though?)
define void @memcpy_after_laundering_alloca_2(ptr %ptr) {
%alloca = alloca { i64, i64 }, align 8
%field2_ptr = getelementptr inbounds { i64, i64 }, ptr %alloca, i32 0, i32 1
%launder = call ptr @llvm.launder.invariant.group.p0(ptr %field2_ptr)
call void @llvm.memcpy.p0.p0.i64(ptr %launder, ptr %ptr, i64 8, i1 false)
ret void
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
define void @test(ptr %ptr) {
%alloca = alloca { [16 x i8], i64, [16 x i8] }, align 8
%launder = call ptr @llvm.launder.invariant.group.p0(ptr %alloca)
%gep = getelementptr i8, ptr %launder, i64 16
store i64 0, ptr %gep
call void @llvm.memcpy.p0.p0.i64(ptr %launder, ptr %ptr, i64 40, i1 false)
ret void
}With your patch this produces:
define void @test(ptr %ptr) {
%alloca.sroa.0 = alloca [16 x i8], align 8
%alloca.sroa.3 = alloca [16 x i8], align 8
%1 = call ptr @llvm.launder.invariant.group.p0(ptr %alloca.sroa.3)
call void @llvm.memcpy.p0.p0.i64(ptr align 8 %alloca.sroa.0, ptr align 1 %ptr, i64 16, i1 false)
%alloca.sroa.2.0.ptr.sroa_idx = getelementptr inbounds i8, ptr %ptr, i64 16
%alloca.sroa.2.0.copyload = load i64, ptr %alloca.sroa.2.0.ptr.sroa_idx, align 1
%alloca.sroa.3.0.ptr.sroa_idx = getelementptr inbounds i8, ptr %ptr, i64 24
call void @llvm.memcpy.p0.p0.i64(ptr align 8 %alloca.sroa.3, ptr align 1 %alloca.sroa.3.0.ptr.sroa_idx, i64 16, i1 false)
ret void
}Note that the launder.invariant intrinsic here is unused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated this by considering whether we need to rewrite memory intrinsics as well when rewriting the invariant group one. Tried to keep everything self-contained in visitIntrinsicInst, but code was getting duplicated fast, and the memcpys were later being visited anyways. Thus, as the alloca may get split at each AllocaSliceRewriter iteration, if some of the users of the invariant group are memory intrinsics, we track the launder so as to use this one as ptr destination, instead of the new alloca.
A miscompilation issue has been addressed with improved handling. Fixes: llvm#105537.
2a70bbf to
1927a19
Compare
|
Kind ping. |
|
Ping :) |
|
Does this possibly look on track? Not sure if a different approach can be taken instead of having the two wrappers. |
|
Kind ping for possible alternative directions, would be nice to have this fixed. |
|
This might be a stupid question... but why are we trying to preserve the calls to |
A miscompilation issue has been addressed with improved handling.
Fixes: #105537.