
Conversation

@WanderingAura
Contributor

Adds support for hoisting writeonly calls in LICM.

This patch adds a missing optimization that allows writeonly function calls to be hoisted out of loops when it is safe to do so. Previously, such calls were conservatively kept inside the loop body, and the redundant calls could only be reduced through unrolling, which relies on target-dependent heuristics. A small illustrative example is sketched after the testing notes below.

Closes #143267

Testing:

  • Modified previously negative tests for hoisting writeonly calls so that they are now positive
  • Added test cases for hoisting two writeonly calls where the pointers do/do not alias
  • Added a test case for non-argmemonly writeonly calls.
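
For illustration, here is a minimal sketch (not taken from the patch; the function name @write_val and the constants are made up) of the kind of loop-invariant call to a writeonly, argmemonly function that LICM can now hoist into the preheader, assuming nothing else in the loop may clobber the pointer and the call is otherwise safe to execute unconditionally:

declare void @write_val(i32, ptr) argmemonly writeonly nounwind

define void @example(ptr %p) {
entry:
  br label %loop

loop:
  %iv = phi i32 [ 0, %entry ], [ %iv.next, %loop ]
  ; Same value and pointer on every iteration, and no other memory access in
  ; the loop can clobber %p, so the call can be executed once before the loop.
  call void @write_val(i32 42, ptr %p)
  %iv.next = add i32 %iv, 1
  %cmp = icmp slt i32 %iv, 100
  br i1 %cmp, label %loop, label %exit

exit:
  ret void
}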

@github-actions

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository; in that case, you can instead tag reviewers by name in a comment, using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "pinging" the PR: add a comment that says "Ping". The common courtesy ping rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Member

llvmbot commented Jun 11, 2025

@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-llvm-transforms

Author: Jiachen (Yangyang) Wang (WanderingAura)

Changes

Adds support for hoisting writeonly calls in LICM.

This patch adds a missing optimization that allows writeonly function calls to be hoisted out of loops when it is safe to do so. Previously, such calls were conservatively kept inside the loop body, and the redundant calls could only be reduced through unrolling, which relies on target-dependent heuristics.

Closes #143267

Testing:

  • Modified previously negative tests for hoisting writeonly calls so that they are now positive
  • Added test cases for hoisting two writeonly calls where the pointers do/do not alias
  • Added a test case for non-argmemonly writeonly calls.

Full diff: https://github.com/llvm/llvm-project/pull/143799.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/LICM.cpp (+37-5)
  • (modified) llvm/test/Transforms/LICM/call-hoisting.ll (+80-7)
diff --git a/llvm/lib/Transforms/Scalar/LICM.cpp b/llvm/lib/Transforms/Scalar/LICM.cpp
index 7965ed76a81b7..372a79edeb593 100644
--- a/llvm/lib/Transforms/Scalar/LICM.cpp
+++ b/llvm/lib/Transforms/Scalar/LICM.cpp
@@ -186,6 +186,9 @@ static bool isSafeToExecuteUnconditionally(
     const Loop *CurLoop, const LoopSafetyInfo *SafetyInfo,
     OptimizationRemarkEmitter *ORE, const Instruction *CtxI,
     AssumptionCache *AC, bool AllowSpeculation);
+static bool memoryDefInvalidatedByLoop(MemorySSA *MSSA, MemoryDef *MD,
+                                       Loop *CurLoop,
+                                       SinkAndHoistLICMFlags &Flags);
 static bool pointerInvalidatedByLoop(MemorySSA *MSSA, MemoryUse *MU,
                                      Loop *CurLoop, Instruction &I,
                                      SinkAndHoistLICMFlags &Flags,
@@ -1032,8 +1035,8 @@ bool llvm::hoistRegion(DomTreeNode *N, AAResults *AA, LoopInfo *LI,
   if (VerifyMemorySSA)
     MSSAU.getMemorySSA()->verifyMemorySSA();
 
-    // Now that we've finished hoisting make sure that LI and DT are still
-    // valid.
+  // Now that we've finished hoisting make sure that LI and DT are still
+  // valid.
 #ifdef EXPENSIVE_CHECKS
   if (Changed) {
     assert(DT->verify(DominatorTree::VerificationLevel::Fast) &&
@@ -1258,6 +1261,21 @@ bool llvm::canSinkOrHoistInst(Instruction &I, AAResults *AA, DominatorTree *DT,
         return true;
     }
 
+    if (Behavior.onlyWritesMemory()) {
+      // If it's the only memory access then there is nothing
+      // stopping us from hoisting it.
+      if (isOnlyMemoryAccess(CI, CurLoop, MSSAU))
+        return true;
+
+      if (Behavior.onlyAccessesArgPointees()) {
+        if (memoryDefInvalidatedByLoop(
+                MSSA, cast<MemoryDef>(MSSA->getMemoryAccess(CI)), CurLoop,
+                Flags))
+          return false;
+        return true;
+      }
+    }
+
     // FIXME: This should use mod/ref information to see if we can hoist or
     // sink the call.
 
@@ -1287,7 +1305,7 @@ bool llvm::canSinkOrHoistInst(Instruction &I, AAResults *AA, DominatorTree *DT,
     auto *Source = getClobberingMemoryAccess(*MSSA, BAA, Flags, SIMD);
     // Make sure there are no clobbers inside the loop.
     if (!MSSA->isLiveOnEntryDef(Source) &&
-           CurLoop->contains(Source->getBlock()))
+        CurLoop->contains(Source->getBlock()))
       return false;
 
     // If there are interfering Uses (i.e. their defining access is in the
@@ -1300,7 +1318,7 @@ bool llvm::canSinkOrHoistInst(Instruction &I, AAResults *AA, DominatorTree *DT,
         for (const auto &MA : *Accesses)
           if (const auto *MU = dyn_cast<MemoryUse>(&MA)) {
             auto *MD = getClobberingMemoryAccess(*MSSA, BAA, Flags,
-                const_cast<MemoryUse *>(MU));
+                                                 const_cast<MemoryUse *>(MU));
             if (!MSSA->isLiveOnEntryDef(MD) &&
                 CurLoop->contains(MD->getBlock()))
               return false;
@@ -1352,7 +1370,7 @@ static bool isTriviallyReplaceablePHI(const PHINode &PN, const Instruction &I) {
 
 /// Return true if the instruction is foldable in the loop.
 static bool isFoldableInLoop(const Instruction &I, const Loop *CurLoop,
-                         const TargetTransformInfo *TTI) {
+                             const TargetTransformInfo *TTI) {
   if (auto *GEP = dyn_cast<GetElementPtrInst>(&I)) {
     InstructionCost CostI =
         TTI->getInstructionCost(&I, TargetTransformInfo::TCK_SizeAndLatency);
@@ -2354,6 +2372,20 @@ collectPromotionCandidates(MemorySSA *MSSA, AliasAnalysis *AA, Loop *L) {
   return Result;
 }
 
+// This function checks if a given MemoryDef gets clobbered by
+// any memory accesses within the loop.
+static bool memoryDefInvalidatedByLoop(MemorySSA *MSSA, MemoryDef *MD,
+                                       Loop *CurLoop,
+                                       SinkAndHoistLICMFlags &Flags) {
+  if (Flags.tooManyMemoryAccesses()) {
+    return true;
+  }
+  BatchAAResults BAA(MSSA->getAA());
+  MemoryAccess *Source = getClobberingMemoryAccess(*MSSA, BAA, Flags, MD);
+  return !MSSA->isLiveOnEntryDef(Source) &&
+         CurLoop->contains(Source->getBlock());
+}
+
 static bool pointerInvalidatedByLoop(MemorySSA *MSSA, MemoryUse *MU,
                                      Loop *CurLoop, Instruction &I,
                                      SinkAndHoistLICMFlags &Flags,
diff --git a/llvm/test/Transforms/LICM/call-hoisting.ll b/llvm/test/Transforms/LICM/call-hoisting.ll
index e6d2e42e34e81..e9b19db4861d4 100644
--- a/llvm/test/Transforms/LICM/call-hoisting.ll
+++ b/llvm/test/Transforms/LICM/call-hoisting.ll
@@ -64,10 +64,13 @@ exit:
 
 declare void @store(i32 %val, ptr %p) argmemonly writeonly nounwind
 
+; loop invariant calls to writeonly functions such as the above
+; should be hoisted
 define void @test(ptr %loc) {
 ; CHECK-LABEL: @test
-; CHECK-LABEL: loop:
+; CHECK-LABEL: entry:
 ; CHECK: call void @store
+; CHECK-LABEL: loop:
 ; CHECK-LABEL: exit:
 entry:
   br label %loop
@@ -85,8 +88,9 @@ exit:
 
 define void @test_multiexit(ptr %loc, i1 %earlycnd) {
 ; CHECK-LABEL: @test_multiexit
-; CHECK-LABEL: loop:
+; CHECK-LABEL: entry:
 ; CHECK: call void @store
+; CHECK-LABEL: loop:
 ; CHECK-LABEL: backedge:
 entry:
   br label %loop
@@ -107,6 +111,50 @@ exit2:
   ret void
 }
 
+; cannot be hoisted because the two pointers can alias one another
+define void @neg_two_pointer(ptr %loc, ptr %otherloc) {
+; CHECK-LABEL: @neg_two_pointer
+; CHECK-LABEL: entry:
+; CHECK-LABEL: loop:
+; CHECK: call void @store
+; CHECK: call void @store
+; CHECK-LABEL: exit:
+entry:
+  br label %loop
+
+loop:
+  %iv = phi i32 [0, %entry], [%iv.next, %loop]
+  call void @store(i32 0, ptr %loc)
+  call void @store(i32 1, ptr %otherloc)
+  %iv.next = add i32 %iv, 1
+  %cmp = icmp slt i32 %iv, 200
+  br i1 %cmp, label %loop, label %exit
+exit:
+  ret void
+}
+
+; hoisted due to pointers not aliasing
+define void @two_pointer_noalias(ptr noalias %loc, ptr %otherloc) {
+; CHECK-LABEL: @two_pointer_noalias
+; CHECK-LABEL: entry:
+; CHECK: call void @store
+; CHECK: call void @store
+; CHECK-LABEL: loop:
+; CHECK-LABEL: exit:
+entry:
+  br label %loop
+
+loop:
+  %iv = phi i32 [0, %entry], [%iv.next, %loop]
+  call void @store(i32 0, ptr %loc)
+  call void @store(i32 1, ptr %otherloc)
+  %iv.next = add i32 %iv, 1
+  %cmp = icmp slt i32 %iv, 200
+  br i1 %cmp, label %loop, label %exit
+exit:
+  ret void
+}
+
 define void @neg_lv_value(ptr %loc) {
 ; CHECK-LABEL: @neg_lv_value
 ; CHECK-LABEL: loop:
@@ -166,10 +214,11 @@ exit:
   ret void
 }
 
-define void @neg_ref(ptr %loc) {
-; CHECK-LABEL: @neg_ref
-; CHECK-LABEL: loop:
+define void @ref(ptr %loc) {
+; CHECK-LABEL: @ref
+; CHECK-LABEL: entry:
 ; CHECK: call void @store
+; CHECK-LABEL: loop:
 ; CHECK-LABEL: exit1:
 entry:
   br label %loop
@@ -257,8 +306,31 @@ exit:
   ret void
 }
 
-define void @neg_not_argmemonly(ptr %loc) {
-; CHECK-LABEL: @neg_not_argmemonly
+; not argmemonly calls can be hoisted if it's the only memory access in loop
+define void @not_argmemonly_only_access(ptr %loc) {
+; CHECK-LABEL: @not_argmemonly_only_access
+; CHECK-LABEL: entry:
+; CHECK: call void @not_argmemonly
+; CHECK-LABEL: loop:
+; CHECK-LABEL: exit:
+entry:
+  br label %loop
+
+loop:
+  %iv = phi i32 [0, %entry], [%iv.next, %loop]
+  call void @not_argmemonly(i32 0, ptr %loc)
+  %iv.next = add i32 %iv, 1
+  %cmp = icmp slt i32 %iv, 200
+  br i1 %cmp, label %loop, label %exit
+
+exit:
+  ret void
+}
+
+; not argmemonly calls cannot be hoisted if there are other memory accesses
+define void @neg_not_argmemonly_multiple_access(ptr noalias %loc, ptr noalias %loc2) {
+; CHECK-LABEL: @neg_not_argmemonly_multiple_access
+; CHECK-LABEL: entry:
 ; CHECK-LABEL: loop:
 ; CHECK: call void @not_argmemonly
 ; CHECK-LABEL: exit:
@@ -268,6 +340,7 @@ entry:
 loop:
   %iv = phi i32 [0, %entry], [%iv.next, %loop]
   call void @not_argmemonly(i32 0, ptr %loc)
+  call void @load(i32 0, ptr %loc2)
   %iv.next = add i32 %iv, 1
   %cmp = icmp slt i32 %iv, 200
   br i1 %cmp, label %loop, label %exit

@WanderingAura
Contributor Author

Hi, I'm new to contributing to LLVM, so please let me know if there's anything I should do differently, or if any part of this patch could be improved.

@tgymnich tgymnich requested a review from nikic on June 12, 2025 at 09:37
@nikic nikic changed the title from "[llvm] Hoisting writeonly calls" to "[LICM] Hoisting writeonly calls" on Jun 12, 2025
@WanderingAura WanderingAura force-pushed the wanderingaura/hoist_writeonly_calls branch 3 times, most recently from d87eb45 to 83b0d30, on June 15, 2025 at 16:57
@WanderingAura
Contributor Author

It looks like a writeonly function is getting hoisted in CodeGen/AMDGPU/loop_exit_with_xor.ll and I'm not sure whether this is incorrect behaviour.

; Where the mask of lanes wanting to exit the loop on this iteration is
; obviously already masked by exec (a V_CMP), then lower control flow can omit
; the S_AND_B64 to avoid an unnecessary instruction.
define void @doesnt_need_and(i32 %arg) {
; GCN-LABEL: doesnt_need_and:
; GCN: ; %bb.0: ; %entry
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: s_mov_b32 s6, 0
; GCN-NEXT: s_mov_b64 s[4:5], 0
; GCN-NEXT: .LBB1_1: ; %loop
; GCN-NEXT: ; =>This Inner Loop Header: Depth=1
; GCN-NEXT: s_add_i32 s6, s6, 1
; GCN-NEXT: v_cmp_le_u32_e32 vcc, s6, v0
; GCN-NEXT: s_or_b64 s[4:5], vcc, s[4:5]
; GCN-NEXT: buffer_store_dword v0, off, s[4:7], s4
; GCN-NEXT: s_andn2_b64 exec, exec, s[4:5]
; GCN-NEXT: s_cbranch_execnz .LBB1_1
; GCN-NEXT: ; %bb.2: ; %loopexit
; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64 s[30:31]
entry:
  br label %loop

loop:
  %tmp23phi = phi i32 [ %tmp23, %loop ], [ 0, %entry ]
  %tmp23 = add nuw i32 %tmp23phi, 1
  %tmp27 = icmp ult i32 %arg, %tmp23
  call void @llvm.amdgcn.raw.ptr.buffer.store.f32(float poison, ptr addrspace(8) poison, i32 0, i32 poison, i32 0)
  br i1 %tmp27, label %loop, label %loopexit

loopexit:
  ret void
}

The call @llvm.amdgcn.raw.ptr.buffer.store.f32(float poison, ptr addrspace(8) poison, i32 0, i32 poison, i32 0) seems to be getting hoisted. Does anyone know if this is legal behaviour?

Below are the failed CI logs:

2025-06-16T08:20:15.7502444Z Check file: /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/CodeGen/AMDGPU/loop_exit_with_xor.ll
2025-06-16T08:20:15.7503881Z 
2025-06-16T08:20:15.7504165Z -dump-input=help explains the following input dump.
2025-06-16T08:20:15.7504657Z 
2025-06-16T08:20:15.7504821Z Input was:
2025-06-16T08:20:15.7505247Z <<<<<<
2025-06-16T08:20:15.7505624Z          .
2025-06-16T08:20:15.7506041Z          .
2025-06-16T08:20:15.7506423Z          .
2025-06-16T08:20:15.7506833Z         57:  .type doesnt_need_and,@function 
2025-06-16T08:20:15.7507493Z         58: doesnt_need_and: ; @doesnt_need_and 
2025-06-16T08:20:15.7508167Z         59: ; %bb.0: ; %entry 
2025-06-16T08:20:15.7509005Z         60:  s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) 
2025-06-16T08:20:15.7509683Z         61:  buffer_store_dword v0, off, s[4:7], s4 
2025-06-16T08:20:15.7510378Z         62:  s_mov_b32 s6, 0 
2025-06-16T08:20:15.7511008Z next:67      !~~~~~~~~~~~~~~  error: match on wrong line
2025-06-16T08:20:15.7511636Z         63:  s_mov_b64 s[4:5], 0 
2025-06-16T08:20:15.7512474Z         64: .LBB1_1: ; %loop 
2025-06-16T08:20:15.7513130Z         65:  ; =>This Inner Loop Header: Depth=1 
2025-06-16T08:20:15.7513783Z         66:  s_add_i32 s6, s6, 1 
2025-06-16T08:20:15.7514405Z         67:  v_cmp_le_u32_e32 vcc, s6, v0 
2025-06-16T08:20:15.7515002Z          .
2025-06-16T08:20:15.7515394Z          .
2025-06-16T08:20:15.7515781Z          .
2025-06-16T08:20:15.7516173Z >>>>>>

@WanderingAura WanderingAura force-pushed the wanderingaura/hoist_writeonly_calls branch from 827fa57 to e22d4ab on June 16, 2025 at 18:15
@WanderingAura
Contributor Author

WanderingAura commented Jun 16, 2025

It looks like a writeonly function is getting hoisted in CodeGen/AMDGPU/loop_exit_with_xor.ll and I'm not sure whether this is incorrect behaviour.

I've assumed that it's correct behaviour and changed the test accordingly. I'd appreciate it if someone knowledgeable about the AMDGPU backend could take a look. Thanks in advance!

Contributor

The limitation to argmemonly should not be necessary. I'm dropping the corresponding check for readonly calls in #144497.

Contributor

Could you please also add a test with a non-argmemonly call that is hoisted? I guess like neg_not_argmemonly, just without the @load.

Contributor Author

Sure, done.

Contributor

Suggested change
} else if (CallInst *SCI = dyn_cast<CallInst>(I)) {
} else {
  auto *SCI = cast<CallInst>(I);

So this asserts if an unexpected instruction type is passed.
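
A minimal self-contained sketch of that distinction (the helper function and its surrounding logic are hypothetical, not code from this patch; only the cast/dyn_cast behaviour is the point):

#include "llvm/IR/Instructions.h"
#include "llvm/Support/Casting.h"
using namespace llvm;

// Hypothetical helper for illustration only.
static void handleStoreOrCall(Instruction *I) {
  if (auto *SI = dyn_cast<StoreInst>(I)) {
    // dyn_cast returns nullptr on a type mismatch, so any other instruction
    // kind would silently skip this branch.
    (void)SI;
  } else {
    // Only calls are expected here; cast<> asserts (in builds with assertions
    // enabled) if that assumption is violated, rather than quietly yielding
    // null the way a second dyn_cast would.
    auto *SCI = cast<CallInst>(I);
    (void)SCI;
  }
}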

Contributor

Right, but we still shouldn't use dyn_cast if there is only one possibility.

Contributor Author

Okay noted, I've changed it.

@tgymnich
Member

It looks like a writeonly function is getting hoisted in CodeGen/AMDGPU/loop_exit_with_xor.ll and I'm not sure whether this is incorrect behaviour.

I've assumed that it's correct behaviour and changed the test accordingly. I'd appreciate it if someone knowledgeable about the AMDGPU backend could take a look. Thanks in advance!

@WanderingAura llvm.amdgcn.raw.ptr.buffer.store.f32 behaves like a store at a given pointer + offset. So this should be fine.

@WanderingAura WanderingAura force-pushed the wanderingaura/hoist_writeonly_calls branch from e22d4ab to fbe0463 on June 18, 2025 at 18:44
@WanderingAura WanderingAura force-pushed the wanderingaura/hoist_writeonly_calls branch from fbe0463 to 152336d on June 18, 2025 at 20:54
Contributor

@nikic nikic left a comment

LGTM

@nikic nikic merged commit 1ab0e7d into llvm:main Jun 19, 2025
7 checks passed
@github-actions

@WanderingAura Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!


Development

Successfully merging this pull request may close these issues:

  • Missing optimization: loop invariant memset should be hoisted if it's the only memory access in loop
