Conversation

antoniofrighetto (Contributor) commented Sep 6, 2024

A miscompilation with -fstrict-vtable-pointers has been addressed by improving how SROA handles invariant group intrinsics when splitting an alloca.

Fixes: #105537.

llvmbot (Member) commented Sep 6, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Antonio Frighetto (antoniofrighetto)

Changes

A miscompilation with -fstrict-vtable-pointers has been addressed by improving how SROA handles invariant group intrinsics when splitting an alloca.

Fixes: #105537.


Full diff: https://github.com/llvm/llvm-project/pull/107557.diff

3 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/SROA.cpp (+4-3)
  • (modified) llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp (+2-1)
  • (modified) llvm/test/Transforms/SROA/invariant-group.ll (+16)
diff --git a/llvm/lib/Transforms/Scalar/SROA.cpp b/llvm/lib/Transforms/Scalar/SROA.cpp
index d0186da1bc5e22..169b34181399a5 100644
--- a/llvm/lib/Transforms/Scalar/SROA.cpp
+++ b/llvm/lib/Transforms/Scalar/SROA.cpp
@@ -3522,6 +3522,10 @@ class AllocaSliceRewriter : public InstVisitor<AllocaSliceRewriter, bool> {
            "Unexpected intrinsic!");
     LLVM_DEBUG(dbgs() << "    original: " << II << "\n");
 
+    // Do not record invariant group intrinsics for deletion.
+    if (II.isLaunderOrStripInvariantGroup())
+      return true;
+
     // Record this instruction for deletion.
     Pass.DeadInsts.push_back(&II);
 
@@ -3532,9 +3536,6 @@ class AllocaSliceRewriter : public InstVisitor<AllocaSliceRewriter, bool> {
       return true;
     }
 
-    if (II.isLaunderOrStripInvariantGroup())
-      return true;
-
     assert(II.getArgOperand(1) == OldPtr);
     // Lifetime intrinsics are only promotable if they cover the whole alloca.
     // Therefore, we drop lifetime intrinsics which don't cover the whole
diff --git a/llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp b/llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp
index 1b7912fdf5e304..95a4178b276177 100644
--- a/llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp
+++ b/llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp
@@ -81,7 +81,8 @@ bool llvm::isAllocaPromotable(const AllocaInst *AI) {
         return false;
     } else if (const IntrinsicInst *II = dyn_cast<IntrinsicInst>(U)) {
       if (!II->isLifetimeStartOrEnd() && !II->isDroppable() &&
-          II->getIntrinsicID() != Intrinsic::fake_use)
+          II->getIntrinsicID() != Intrinsic::fake_use &&
+          !II->isLaunderOrStripInvariantGroup())
         return false;
     } else if (const BitCastInst *BCI = dyn_cast<BitCastInst>(U)) {
       if (!onlyUsedByLifetimeMarkersOrDroppableInsts(BCI))
diff --git a/llvm/test/Transforms/SROA/invariant-group.ll b/llvm/test/Transforms/SROA/invariant-group.ll
index 1be6f6e2fc32bb..08f2d08e58cee7 100644
--- a/llvm/test/Transforms/SROA/invariant-group.ll
+++ b/llvm/test/Transforms/SROA/invariant-group.ll
@@ -138,10 +138,13 @@ end:
   ret void
 }
 
+; Conservatively do not split the alloca. 
 define void @partial_promotion_of_alloca() {
 ; CHECK-LABEL: @partial_promotion_of_alloca(
 ; CHECK-NEXT:    [[STRUCT_PTR_SROA_2:%.*]] = alloca i32, align 4
+; CHECK-NEXT:    [[STRUCT_PTR:%.*]] = alloca [[T:%.*]], align 4
 ; CHECK-NEXT:    store volatile i32 0, ptr [[STRUCT_PTR_SROA_2]], align 4
+; CHECK-NEXT:    [[BARR:%.*]] = call ptr @llvm.launder.invariant.group.p0(ptr [[STRUCT_PTR]])
 ; CHECK-NEXT:    [[STRUCT_PTR_SROA_2_0_STRUCT_PTR_SROA_2_4_LOAD_VAL:%.*]] = load volatile i32, ptr [[STRUCT_PTR_SROA_2]], align 4
 ; CHECK-NEXT:    ret void
 ;
@@ -155,6 +158,19 @@ define void @partial_promotion_of_alloca() {
   ret void
 }
 
+define void @memcpy_after_laundering_alloca(ptr %ptr) {
+; CHECK-LABEL: @memcpy_after_laundering_alloca(
+; CHECK-NEXT:    [[ALLOCA:%.*]] = alloca { i64, i64 }, align 8
+; CHECK-NEXT:    [[LAUNDER:%.*]] = call ptr @llvm.launder.invariant.group.p0(ptr [[ALLOCA]])
+; CHECK-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr [[LAUNDER]], ptr [[PTR:%.*]], i64 16, i1 false)
+; CHECK-NEXT:    ret void
+;
+  %alloca = alloca { i64, i64 }, align 8
+  %launder = call ptr @llvm.launder.invariant.group.p0(ptr %alloca)
+  call void @llvm.memcpy.p0.p0.i64(ptr %launder, ptr %ptr, i64 16, i1 false)
+  ret void
+}
+
 declare void @use(ptr)
 
 !0 = !{}

define void @partial_promotion_of_alloca() {
; CHECK-LABEL: @partial_promotion_of_alloca(
; CHECK-NEXT:    [[STRUCT_PTR_SROA_2:%.*]] = alloca i32, align 4
; CHECK-NEXT:    [[STRUCT_PTR:%.*]] = alloca [[T:%.*]], align 4

antoniofrighetto (author) commented Sep 6, 2024:

Uhm, I'm not entirely sure it's fine in this specific case to have a field promoted to a scalar while the whole alloca is still there.


    // Do not record invariant group intrinsics for deletion.
    if (II.isLaunderOrStripInvariantGroup())
      return true;

Contributor commented:

Hm, shouldn't we be creating a new intrinsic for the split pointer instead?

antoniofrighetto (author) replied:

We should, I accidentally thought that reusing it would suffice.
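
Concretely, that would mean each split pointer gets its own intrinsic; a hand-written sketch with hypothetical value names, assuming the alloca is split into two slices:

  %a.sroa.0 = alloca i32, align 4
  %a.sroa.1 = alloca i32, align 4
  ; rather than reusing the original call, a fresh launder is created
  ; for each split pointer:
  %a.sroa.0.laundered = call ptr @llvm.launder.invariant.group.p0(ptr %a.sroa.0)
  %a.sroa.1.laundered = call ptr @llvm.launder.invariant.group.p0(ptr %a.sroa.1)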

      if (!II->isLifetimeStartOrEnd() && !II->isDroppable() &&
-          II->getIntrinsicID() != Intrinsic::fake_use)
+          II->getIntrinsicID() != Intrinsic::fake_use &&
+          !II->isLaunderOrStripInvariantGroup())

Contributor commented:

I don't understand how this is supposed to work. I don't think we have logic to handle these in mem2reg? Also wouldn't we have to scan their uses here?

antoniofrighetto (author) replied:

I might be wrong, but we don't seem to have logic to handle the others here either; do we actually need any specific logic in mem2reg? Why should we scan their uses? The change seems to be needed because the alloca is marked as promotable, as otherwise we would assert in PromoteMemoryToRegister. I don't think we want to mark the alloca as non-promotable when visiting the intrinsic and prevent mem2reg from having the chance to simplify the alloca further.

Contributor commented:

Okay, I think this gets handled here:

  // The only users of this bitcast/GEP instruction are lifetime intrinsics.
  // Follow the use/def chain to erase them now instead of leaving it for
  // dead code elimination later.
  for (Use &UU : llvm::make_early_inc_range(I->uses())) {
    Instruction *Inst = cast<Instruction>(UU.getUser());
    // Drop the use of I in droppable instructions.
    if (Inst->isDroppable()) {
      Inst->dropDroppableUse(UU);
      continue;
    }
    Inst->eraseFromParent();
  }
I think the handling is ultimately correct, but the comment there could use an update.

antoniofrighetto (author) replied:

Comment updated, thanks.

antoniofrighetto force-pushed the feature/sroa-launder-invariant-misc branch from 3a10f2b to 05284a0 on September 24, 2024 16:03
antoniofrighetto changed the title from "[SROA] Conservatively do not split the alloca if ptr is laundered" to "[SROA] Rewrite invariant group intrinsics after splitting alloca" on Sep 24, 2024
    New = IRB.CreateStripInvariantGroup(AdjustedPtr);

    New->takeName(&II);
    II.replaceAllUsesWith(New);

Contributor commented:

Is this RAUW correct? Can't the alloca be split into two parts, both of which need their own intrinsic?

antoniofrighetto (author) commented Oct 1, 2024:

Isn't this already being done as part of visiting each slice in:

  for (Slice &S : P) {
    Promotable &= Rewriter.visit(&S);
    ++NumUses;
  }

Or am I mistaken? It looks like the intrinsic has been rewritten for each part of the alloca (this is before mem2reg for the test case):

define void @partial_promotion_of_alloca() {
  %struct_ptr.sroa.0 = alloca i32, align 4
  %struct_ptr.sroa.2 = alloca i32, align 4
  store i32 0, ptr %struct_ptr.sroa.0, align 4
  store volatile i32 0, ptr %struct_ptr.sroa.2, align 4
  %barr = call ptr @llvm.launder.invariant.group.p0(ptr %struct_ptr.sroa.0)
  %1 = call ptr @llvm.launder.invariant.group.p0(ptr %struct_ptr.sroa.2)
  %struct_ptr.sroa.2.0.struct_ptr.sroa.2.4.load_val = load volatile i32, ptr %struct_ptr.sroa.2, align 4
  ret void
}

Contributor commented:

The case I have in mind is something like memcpy_after_laundering_alloca but where we end up with two memcpys rather than one. I expect you'll get incorrect pointers in that case.

antoniofrighetto (author) replied:

Would you mind showing an example that could lead to incorrect pointers? I had in mind something like the following, where we memcpy each field separately, but I suspect this is not what you were referring to:

define void @memcpy_after_laundering_allocas_field(ptr %ptr) {
  %alloca = alloca { i64, i64 }, align 8
  %field1_ptr = getelementptr inbounds { i64, i64 }, ptr %alloca, i32 0, i32 0
  %field2_ptr = getelementptr inbounds { i64, i64 }, ptr %alloca, i32 0, i32 1
  %launder_field1 = call ptr @llvm.launder.invariant.group.p0(ptr %field1_ptr)
  %launder_field2 = call ptr @llvm.launder.invariant.group.p0(ptr %field2_ptr)
  call void @llvm.memcpy.p0.p0.i64(ptr %launder_field1, ptr %ptr, i64 8, i1 false)
  call void @llvm.memcpy.p0.p0.i64(ptr %launder_field2, ptr %ptr, i64 8, i1 false)
  ret void
}

antoniofrighetto (author) added:

Oh, you mean the case where we launder only a field of the alloca (not sure if this makes sense though?)

define void @memcpy_after_laundering_alloca_2(ptr %ptr) {
  %alloca = alloca { i64, i64 }, align 8
  %field2_ptr = getelementptr inbounds { i64, i64 }, ptr %alloca, i32 0, i32 1
  %launder = call ptr @llvm.launder.invariant.group.p0(ptr %field2_ptr)
  call void @llvm.memcpy.p0.p0.i64(ptr %launder, ptr %ptr, i64 8, i1 false)
  ret void
}

nikic (Contributor) commented Oct 7, 2024:

define void @test(ptr %ptr) {
  %alloca = alloca { [16 x i8], i64, [16 x i8] }, align 8
  %launder = call ptr @llvm.launder.invariant.group.p0(ptr %alloca)
  %gep = getelementptr i8, ptr %launder, i64 16
  store i64 0, ptr %gep
  call void @llvm.memcpy.p0.p0.i64(ptr %launder, ptr %ptr, i64 40, i1 false)
  ret void
}

With your patch this produces:

define void @test(ptr %ptr) {
  %alloca.sroa.0 = alloca [16 x i8], align 8
  %alloca.sroa.3 = alloca [16 x i8], align 8
  %1 = call ptr @llvm.launder.invariant.group.p0(ptr %alloca.sroa.3)
  call void @llvm.memcpy.p0.p0.i64(ptr align 8 %alloca.sroa.0, ptr align 1 %ptr, i64 16, i1 false)
  %alloca.sroa.2.0.ptr.sroa_idx = getelementptr inbounds i8, ptr %ptr, i64 16
  %alloca.sroa.2.0.copyload = load i64, ptr %alloca.sroa.2.0.ptr.sroa_idx, align 1
  %alloca.sroa.3.0.ptr.sroa_idx = getelementptr inbounds i8, ptr %ptr, i64 24
  call void @llvm.memcpy.p0.p0.i64(ptr align 8 %alloca.sroa.3, ptr align 1 %alloca.sroa.3.0.ptr.sroa_idx, i64 16, i1 false)
  ret void
}

Note that the launder.invariant intrinsic here is unused.

antoniofrighetto (author) replied:

Updated this by considering whether we need to rewrite memory intrinsics as well when rewriting the invariant group one. I tried to keep everything self-contained in visitIntrinsicInst, but code was getting duplicated fast, and the memcpys were later being visited anyway. Thus, as the alloca may get split at each AllocaSliceRewriter iteration, if some of the users of the invariant group intrinsic are memory intrinsics, we track the launder so as to use it as the pointer destination instead of the new alloca.
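
In other words, the intended output is that the rewritten memcpy goes through the rewritten launder of its slice rather than the bare slice alloca; a minimal sketch with hypothetical names (not verified compiler output):

  %alloca.sroa.0 = alloca [16 x i8], align 8
  ; the launder is rewritten against the new slice...
  %launder.sroa.0 = call ptr @llvm.launder.invariant.group.p0(ptr %alloca.sroa.0)
  ; ...and the tracked launder, not the slice alloca itself, is used as
  ; the memcpy destination
  call void @llvm.memcpy.p0.p0.i64(ptr %launder.sroa.0, ptr %ptr, i64 16, i1 false)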

antoniofrighetto requested a review from nikic on October 7, 2024 07:54
A miscompilation issue has been addressed with improved handling.

Fixes: llvm#105537.
antoniofrighetto force-pushed the feature/sroa-launder-invariant-misc branch from 2a70bbf to 1927a19 on October 29, 2024 18:35

antoniofrighetto (author) commented:

Kind ping.

antoniofrighetto (author) commented:

Ping :)

antoniofrighetto (author) commented:

Does this possibly look on track? Not sure if a different approach can be taken instead of having the two wrappers.

antoniofrighetto (author) commented:

Kind ping for possible alternative directions; it would be nice to have this fixed.

efriedma-quic (Collaborator) commented:

This might be a stupid question... but why are we trying to preserve the calls to llvm.launder.invariant.group? SROA strips all the relevant invariant.group metadata, and llvm.launder.invariant.group is only defined to have an effect if the returned pointer is used by load/store operations with invariant.group metadata. So the calls are just no-ops; why do we want to preserve them?
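
For illustration (a hand-written example, not from the patch): per the LangRef, launder.invariant.group only has a defined effect when the returned pointer is used by accesses carrying !invariant.group metadata, so once SROA has stripped that metadata the call is inert:

  %p = call ptr @llvm.launder.invariant.group.p0(ptr %alloca)
  ; no !invariant.group metadata on this store, so the launder above
  ; has no observable effect on it
  store i32 0, ptr %p, align 4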

tommymcm added a commit to tommymcm/llvm-project that referenced this pull request Aug 1, 2025