
Conversation

@WanderingAura
Contributor

Adds support for hoisting writeonly calls in LICM.

This patch adds a missing optimization that allows writeonly function calls to be hoisted out of loops when it is safe to do so. Previously, such calls were conservatively kept inside the loop body, and the redundant calls could only be reduced through unrolling, which relies on target-dependent heuristics. A small illustrative example is sketched after the testing notes below.

Closes #143267

Testing:

  • Modified previously negative tests for hoisting writeonly calls so that they are now positive
  • Added test cases for hoisting two writeonly calls where the pointers do/do not alias
  • Added a test case for non-argmemonly writeonly calls.
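
For illustration, here is a minimal sketch (not taken from the patch; the function name @write_val and the constants are made up) of the kind of loop-invariant call to a writeonly, argmemonly function that LICM can now hoist into the preheader, assuming nothing else in the loop may clobber the pointer and the call is otherwise safe to execute unconditionally:

declare void @write_val(i32, ptr) argmemonly writeonly nounwind

define void @example(ptr %p) {
entry:
  br label %loop

loop:
  %iv = phi i32 [ 0, %entry ], [ %iv.next, %loop ]
  ; Same value and pointer on every iteration, and no other memory access in
  ; the loop can clobber %p, so the call can be executed once before the loop.
  call void @write_val(i32 42, ptr %p)
  %iv.next = add i32 %iv, 1
  %cmp = icmp slt i32 %iv, 100
  br i1 %cmp, label %loop, label %exit

exit:
  ret void
}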

@github-actions

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository; in that case, you can instead tag reviewers by name in a comment, using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "pinging" the PR: add a comment that says "Ping". The common courtesy ping rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Member

llvmbot commented Jun 11, 2025

@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-llvm-transforms

Author: Jiachen (Yangyang) Wang (WanderingAura)

Changes

Adds support for hoisting writeonly calls in LICM.

This patch adds a missing optimization that allows writeonly function calls to be hoisted out of loops when it is safe to do so. Previously, such calls were conservatively kept inside the loop body, and the redundant calls could only be reduced through unrolling, which relies on target-dependent heuristics.

Closes #143267

Testing:

  • Modified previously negative tests for hoisting writeonly calls so that they are now positive
  • Added test cases for hoisting two writeonly calls where the pointers do/do not alias
  • Added a test case for non-argmemonly writeonly calls.

Full diff: https://github.com/llvm/llvm-project/pull/143799.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/LICM.cpp (+37-5)
  • (modified) llvm/test/Transforms/LICM/call-hoisting.ll (+80-7)
diff --git a/llvm/lib/Transforms/Scalar/LICM.cpp b/llvm/lib/Transforms/Scalar/LICM.cpp
index 7965ed76a81b7..372a79edeb593 100644
--- a/llvm/lib/Transforms/Scalar/LICM.cpp
+++ b/llvm/lib/Transforms/Scalar/LICM.cpp
@@ -186,6 +186,9 @@ static bool isSafeToExecuteUnconditionally(
     const Loop *CurLoop, const LoopSafetyInfo *SafetyInfo,
     OptimizationRemarkEmitter *ORE, const Instruction *CtxI,
     AssumptionCache *AC, bool AllowSpeculation);
+static bool memoryDefInvalidatedByLoop(MemorySSA *MSSA, MemoryDef *MD,
+                                       Loop *CurLoop,
+                                       SinkAndHoistLICMFlags &Flags);
 static bool pointerInvalidatedByLoop(MemorySSA *MSSA, MemoryUse *MU,
                                      Loop *CurLoop, Instruction &I,
                                      SinkAndHoistLICMFlags &Flags,
@@ -1032,8 +1035,8 @@ bool llvm::hoistRegion(DomTreeNode *N, AAResults *AA, LoopInfo *LI,
   if (VerifyMemorySSA)
     MSSAU.getMemorySSA()->verifyMemorySSA();
 
-    // Now that we've finished hoisting make sure that LI and DT are still
-    // valid.
+  // Now that we've finished hoisting make sure that LI and DT are still
+  // valid.
 #ifdef EXPENSIVE_CHECKS
   if (Changed) {
     assert(DT->verify(DominatorTree::VerificationLevel::Fast) &&
@@ -1258,6 +1261,21 @@ bool llvm::canSinkOrHoistInst(Instruction &I, AAResults *AA, DominatorTree *DT,
         return true;
     }
 
+    if (Behavior.onlyWritesMemory()) {
+      // If it's the only memory access then there is nothing
+      // stopping us from hoisting it.
+      if (isOnlyMemoryAccess(CI, CurLoop, MSSAU))
+        return true;
+
+      if (Behavior.onlyAccessesArgPointees()) {
+        if (memoryDefInvalidatedByLoop(
+                MSSA, cast<MemoryDef>(MSSA->getMemoryAccess(CI)), CurLoop,
+                Flags))
+          return false;
+        return true;
+      }
+    }
+
     // FIXME: This should use mod/ref information to see if we can hoist or
     // sink the call.
 
@@ -1287,7 +1305,7 @@ bool llvm::canSinkOrHoistInst(Instruction &I, AAResults *AA, DominatorTree *DT,
     auto *Source = getClobberingMemoryAccess(*MSSA, BAA, Flags, SIMD);
     // Make sure there are no clobbers inside the loop.
     if (!MSSA->isLiveOnEntryDef(Source) &&
-           CurLoop->contains(Source->getBlock()))
+        CurLoop->contains(Source->getBlock()))
       return false;
 
     // If there are interfering Uses (i.e. their defining access is in the
@@ -1300,7 +1318,7 @@ bool llvm::canSinkOrHoistInst(Instruction &I, AAResults *AA, DominatorTree *DT,
         for (const auto &MA : *Accesses)
           if (const auto *MU = dyn_cast<MemoryUse>(&MA)) {
             auto *MD = getClobberingMemoryAccess(*MSSA, BAA, Flags,
-                const_cast<MemoryUse *>(MU));
+                                                 const_cast<MemoryUse *>(MU));
             if (!MSSA->isLiveOnEntryDef(MD) &&
                 CurLoop->contains(MD->getBlock()))
               return false;
@@ -1352,7 +1370,7 @@ static bool isTriviallyReplaceablePHI(const PHINode &PN, const Instruction &I) {
 
 /// Return true if the instruction is foldable in the loop.
 static bool isFoldableInLoop(const Instruction &I, const Loop *CurLoop,
-                         const TargetTransformInfo *TTI) {
+                             const TargetTransformInfo *TTI) {
   if (auto *GEP = dyn_cast<GetElementPtrInst>(&I)) {
     InstructionCost CostI =
         TTI->getInstructionCost(&I, TargetTransformInfo::TCK_SizeAndLatency);
@@ -2354,6 +2372,20 @@ collectPromotionCandidates(MemorySSA *MSSA, AliasAnalysis *AA, Loop *L) {
   return Result;
 }
 
+// This function checks if a given MemoryDef gets clobbered by
+// any memory accesses within the loop.
+static bool memoryDefInvalidatedByLoop(MemorySSA *MSSA, MemoryDef *MD,
+                                       Loop *CurLoop,
+                                       SinkAndHoistLICMFlags &Flags) {
+  if (Flags.tooManyMemoryAccesses()) {
+    return true;
+  }
+  BatchAAResults BAA(MSSA->getAA());
+  MemoryAccess *Source = getClobberingMemoryAccess(*MSSA, BAA, Flags, MD);
+  return !MSSA->isLiveOnEntryDef(Source) &&
+         CurLoop->contains(Source->getBlock());
+}
+
 static bool pointerInvalidatedByLoop(MemorySSA *MSSA, MemoryUse *MU,
                                      Loop *CurLoop, Instruction &I,
                                      SinkAndHoistLICMFlags &Flags,
diff --git a/llvm/test/Transforms/LICM/call-hoisting.ll b/llvm/test/Transforms/LICM/call-hoisting.ll
index e6d2e42e34e81..e9b19db4861d4 100644
--- a/llvm/test/Transforms/LICM/call-hoisting.ll
+++ b/llvm/test/Transforms/LICM/call-hoisting.ll
@@ -64,10 +64,13 @@ exit:
 
 declare void @store(i32 %val, ptr %p) argmemonly writeonly nounwind
 
+; loop invariant calls to writeonly functions such as the above
+; should be hoisted
 define void @test(ptr %loc) {
 ; CHECK-LABEL: @test
-; CHECK-LABEL: loop:
+; CHECK-LABEL: entry:
 ; CHECK: call void @store
+; CHECK-LABEL: loop:
 ; CHECK-LABEL: exit:
 entry:
   br label %loop
@@ -85,8 +88,9 @@ exit:
 
 define void @test_multiexit(ptr %loc, i1 %earlycnd) {
 ; CHECK-LABEL: @test_multiexit
-; CHECK-LABEL: loop:
+; CHECK-LABEL: entry:
 ; CHECK: call void @store
+; CHECK-LABEL: loop:
 ; CHECK-LABEL: backedge:
 entry:
   br label %loop
@@ -107,6 +111,50 @@ exit2:
   ret void
 }
 
+; cannot be hoisted because the two pointers can alias one another
+define void @neg_two_pointer(ptr %loc, ptr %otherloc) {
+; CHECK-LABEL: @neg_two_pointer
+; CHECK-LABEL: entry:
+; CHECK-LABEL: loop:
+; CHECK: call void @store
+; CHECK: call void @store
+; CHECK-LABEL: exit:
+entry:
+  br label %loop
+
+loop:
+  %iv = phi i32 [0, %entry], [%iv.next, %loop]
+  call void @store(i32 0, ptr %loc)
+  call void @store(i32 1, ptr %otherloc)
+  %iv.next = add i32 %iv, 1
+  %cmp = icmp slt i32 %iv, 200
+  br i1 %cmp, label %loop, label %exit
+exit:
+  ret void
+}
+
+; hoisted due to pointers not aliasing
+define void @two_pointer_noalias(ptr noalias %loc, ptr %otherloc) {
+; CHECK-LABEL: @two_pointer_noalias
+; CHECK-LABEL: entry:
+; CHECK: call void @store
+; CHECK: call void @store
+; CHECK-LABEL: loop:
+; CHECK-LABEL: exit:
+entry:
+  br label %loop
+
+loop:
+  %iv = phi i32 [0, %entry], [%iv.next, %loop]
+  call void @store(i32 0, ptr %loc)
+  call void @store(i32 1, ptr %otherloc)
+  %iv.next = add i32 %iv, 1
+  %cmp = icmp slt i32 %iv, 200
+  br i1 %cmp, label %loop, label %exit
+exit:
+  ret void
+}
+
 define void @neg_lv_value(ptr %loc) {
 ; CHECK-LABEL: @neg_lv_value
 ; CHECK-LABEL: loop:
@@ -166,10 +214,11 @@ exit:
   ret void
 }
 
-define void @neg_ref(ptr %loc) {
-; CHECK-LABEL: @neg_ref
-; CHECK-LABEL: loop:
+define void @ref(ptr %loc) {
+; CHECK-LABEL: @ref
+; CHECK-LABEL: entry:
 ; CHECK: call void @store
+; CHECK-LABEL: loop:
 ; CHECK-LABEL: exit1:
 entry:
   br label %loop
@@ -257,8 +306,31 @@ exit:
   ret void
 }
 
-define void @neg_not_argmemonly(ptr %loc) {
-; CHECK-LABEL: @neg_not_argmemonly
+; not argmemonly calls can be hoisted if it's the only memory access in loop
+define void @not_argmemonly_only_access(ptr %loc) {
+; CHECK-LABEL: @not_argmemonly_only_access
+; CHECK-LABEL: entry:
+; CHECK: call void @not_argmemonly
+; CHECK-LABEL: loop:
+; CHECK-LABEL: exit:
+entry:
+  br label %loop
+
+loop:
+  %iv = phi i32 [0, %entry], [%iv.next, %loop]
+  call void @not_argmemonly(i32 0, ptr %loc)
+  %iv.next = add i32 %iv, 1
+  %cmp = icmp slt i32 %iv, 200
+  br i1 %cmp, label %loop, label %exit
+
+exit:
+  ret void
+}
+
+; not argmemonly calls cannot be hoisted if there are other memory accesses
+define void @neg_not_argmemonly_multiple_access(ptr noalias %loc, ptr noalias %loc2) {
+; CHECK-LABEL: @neg_not_argmemonly_multiple_access
+; CHECK-LABEL: entry:
 ; CHECK-LABEL: loop:
 ; CHECK: call void @not_argmemonly
 ; CHECK-LABEL: exit:
@@ -268,6 +340,7 @@ entry:
 loop:
   %iv = phi i32 [0, %entry], [%iv.next, %loop]
   call void @not_argmemonly(i32 0, ptr %loc)
+  call void @load(i32 0, ptr %loc2)
   %iv.next = add i32 %iv, 1
   %cmp = icmp slt i32 %iv, 200
   br i1 %cmp, label %loop, label %exit

@WanderingAura
Contributor Author

Hi, I'm new to contributing to LLVM, so please let me know if there's anything I should do differently, or if any part of this patch could be improved.

@tgymnich tgymnich requested a review from nikic on June 12, 2025 at 09:37
@nikic nikic changed the title from "[llvm] Hoisting writeonly calls" to "[LICM] Hoisting writeonly calls" on Jun 12, 2025
@WanderingAura WanderingAura force-pushed the wanderingaura/hoist_writeonly_calls branch 3 times, most recently from d87eb45 to 83b0d30, on June 15, 2025 at 16:57
@WanderingAura
Contributor Author

It looks like a writeonly function is getting hoisted in CodeGen/AMDGPU/loop_exit_with_xor.ll and I'm not sure whether this is incorrect behaviour.

; Where the mask of lanes wanting to exit the loop on this iteration is
; obviously already masked by exec (a V_CMP), then lower control flow can omit
; the S_AND_B64 to avoid an unnecessary instruction.
define void @doesnt_need_and(i32 %arg) {
; GCN-LABEL: doesnt_need_and:
; GCN: ; %bb.0: ; %entry
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: s_mov_b32 s6, 0
; GCN-NEXT: s_mov_b64 s[4:5], 0
; GCN-NEXT: .LBB1_1: ; %loop
; GCN-NEXT: ; =>This Inner Loop Header: Depth=1
; GCN-NEXT: s_add_i32 s6, s6, 1
; GCN-NEXT: v_cmp_le_u32_e32 vcc, s6, v0
; GCN-NEXT: s_or_b64 s[4:5], vcc, s[4:5]
; GCN-NEXT: buffer_store_dword v0, off, s[4:7], s4
; GCN-NEXT: s_andn2_b64 exec, exec, s[4:5]
; GCN-NEXT: s_cbranch_execnz .LBB1_1
; GCN-NEXT: ; %bb.2: ; %loopexit
; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64 s[30:31]
entry:
  br label %loop

loop:
  %tmp23phi = phi i32 [ %tmp23, %loop ], [ 0, %entry ]
  %tmp23 = add nuw i32 %tmp23phi, 1
  %tmp27 = icmp ult i32 %arg, %tmp23
  call void @llvm.amdgcn.raw.ptr.buffer.store.f32(float poison, ptr addrspace(8) poison, i32 0, i32 poison, i32 0)
  br i1 %tmp27, label %loop, label %loopexit

loopexit:
  ret void
}

The call @llvm.amdgcn.raw.ptr.buffer.store.f32(float poison, ptr addrspace(8) poison, i32 0, i32 poison, i32 0) seems to be getting hoisted. Does anyone know if this is legal behaviour?

Below are the failed CI logs:

2025-06-16T08:20:15.7502444Z Check file: /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/CodeGen/AMDGPU/loop_exit_with_xor.ll
2025-06-16T08:20:15.7503881Z 
2025-06-16T08:20:15.7504165Z -dump-input=help explains the following input dump.
2025-06-16T08:20:15.7504657Z 
2025-06-16T08:20:15.7504821Z Input was:
2025-06-16T08:20:15.7505247Z <<<<<<
2025-06-16T08:20:15.7505624Z          .
2025-06-16T08:20:15.7506041Z          .
2025-06-16T08:20:15.7506423Z          .
2025-06-16T08:20:15.7506833Z         57:  .type doesnt_need_and,@function 
2025-06-16T08:20:15.7507493Z         58: doesnt_need_and: ; @doesnt_need_and 
2025-06-16T08:20:15.7508167Z         59: ; %bb.0: ; %entry 
2025-06-16T08:20:15.7509005Z         60:  s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) 
2025-06-16T08:20:15.7509683Z         61:  buffer_store_dword v0, off, s[4:7], s4 
2025-06-16T08:20:15.7510378Z         62:  s_mov_b32 s6, 0 
2025-06-16T08:20:15.7511008Z next:67      !~~~~~~~~~~~~~~  error: match on wrong line
2025-06-16T08:20:15.7511636Z         63:  s_mov_b64 s[4:5], 0 
2025-06-16T08:20:15.7512474Z         64: .LBB1_1: ; %loop 
2025-06-16T08:20:15.7513130Z         65:  ; =>This Inner Loop Header: Depth=1 
2025-06-16T08:20:15.7513783Z         66:  s_add_i32 s6, s6, 1 
2025-06-16T08:20:15.7514405Z         67:  v_cmp_le_u32_e32 vcc, s6, v0 
2025-06-16T08:20:15.7515002Z          .
2025-06-16T08:20:15.7515394Z          .
2025-06-16T08:20:15.7515781Z          .
2025-06-16T08:20:15.7516173Z >>>>>>

@WanderingAura WanderingAura force-pushed the wanderingaura/hoist_writeonly_calls branch from 827fa57 to e22d4ab on June 16, 2025 at 18:15
@WanderingAura
Contributor Author

WanderingAura commented Jun 16, 2025

It looks like a writeonly function is getting hoisted in CodeGen/AMDGPU/loop_exit_with_xor.ll and I'm not sure whether this is incorrect behaviour.

I've assumed that it's correct behaviour and changed the test accordingly. I'd appreciate it if someone knowledgeable about the AMDGPU backend could take a look. Thanks in advance!

Contributor

The limitation to argmemonly should not be necessary. I'm dropping the corresponding check for readonly calls in #144497.

Contributor

Could you please also add a test with a non-argmemonly call that is hoisted? I guess like neg_not_argmemonly, just without the @load.

Contributor Author

Sure, done.

Contributor

Suggested change
} else if (CallInst *SCI = dyn_cast<CallInst>(I)) {
} else {
  auto *SCI = cast<CallInst>(I);

So this asserts if an unexpected instruction type is passed.
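
A minimal self-contained sketch of that distinction (the helper function and its surrounding logic are hypothetical, not code from this patch; only the cast/dyn_cast behaviour is the point):

#include "llvm/IR/Instructions.h"
#include "llvm/Support/Casting.h"
using namespace llvm;

// Hypothetical helper for illustration only.
static void handleStoreOrCall(Instruction *I) {
  if (auto *SI = dyn_cast<StoreInst>(I)) {
    // dyn_cast returns nullptr on a type mismatch, so any other instruction
    // kind would silently skip this branch.
    (void)SI;
  } else {
    // Only calls are expected here; cast<> asserts (in builds with assertions
    // enabled) if that assumption is violated, rather than quietly yielding
    // null the way a second dyn_cast would.
    auto *SCI = cast<CallInst>(I);
    (void)SCI;
  }
}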

Contributor

Right, but we still shouldn't use dyn_cast if there is only one possibility.

Contributor Author

Okay noted, I've changed it.

@tgymnich
Member

It looks like a writeonly function is getting hoisted in CodeGen/AMDGPU/loop_exit_with_xor.ll and I'm not sure whether this is incorrect behaviour.

I've assumed that it's correct behaviour and changed the test accordingly. I'd appreciate it if someone knowledgeable about the AMDGPU backend could take a look. Thanks in advance!

@WanderingAura llvm.amdgcn.raw.ptr.buffer.store.f32 behaves like a store at a given pointer + offset. So this should be fine.

@WanderingAura WanderingAura force-pushed the wanderingaura/hoist_writeonly_calls branch from e22d4ab to fbe0463 on June 18, 2025 at 18:44
@WanderingAura WanderingAura force-pushed the wanderingaura/hoist_writeonly_calls branch from fbe0463 to 152336d on June 18, 2025 at 20:54
Contributor

@nikic nikic left a comment

LGTM

@nikic nikic merged commit 1ab0e7d into llvm:main Jun 19, 2025
7 checks passed
@github-actions

@WanderingAura Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!


Development

Successfully merging this pull request may close these issues:

  • Missing optimization: loop invariant memset should be hoisted if it's the only memory access in loop
