- 
                Notifications
    
You must be signed in to change notification settings  - Fork 15.1k
 
[LICM] Hoisting writeonly calls #143799
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[LICM] Hoisting writeonly calls #143799
Conversation
| 
          
 Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using  If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.  | 
    
| 
          
 @llvm/pr-subscribers-backend-amdgpu @llvm/pr-subscribers-llvm-transforms Author: Jiachen (Yangyang) Wang (WanderingAura) ChangesAdds support for hoisting  This patch adds a missing optimization that allows hoisting of  Closes #143267 Testing: 
 Full diff: https://github.com/llvm/llvm-project/pull/143799.diff 2 Files Affected: 
 diff --git a/llvm/lib/Transforms/Scalar/LICM.cpp b/llvm/lib/Transforms/Scalar/LICM.cpp
index 7965ed76a81b7..372a79edeb593 100644
--- a/llvm/lib/Transforms/Scalar/LICM.cpp
+++ b/llvm/lib/Transforms/Scalar/LICM.cpp
@@ -186,6 +186,9 @@ static bool isSafeToExecuteUnconditionally(
     const Loop *CurLoop, const LoopSafetyInfo *SafetyInfo,
     OptimizationRemarkEmitter *ORE, const Instruction *CtxI,
     AssumptionCache *AC, bool AllowSpeculation);
+static bool memoryDefInvalidatedByLoop(MemorySSA *MSSA, MemoryDef *MD,
+                                       Loop *CurLoop,
+                                       SinkAndHoistLICMFlags &Flags);
 static bool pointerInvalidatedByLoop(MemorySSA *MSSA, MemoryUse *MU,
                                      Loop *CurLoop, Instruction &I,
                                      SinkAndHoistLICMFlags &Flags,
@@ -1032,8 +1035,8 @@ bool llvm::hoistRegion(DomTreeNode *N, AAResults *AA, LoopInfo *LI,
   if (VerifyMemorySSA)
     MSSAU.getMemorySSA()->verifyMemorySSA();
 
-    // Now that we've finished hoisting make sure that LI and DT are still
-    // valid.
+  // Now that we've finished hoisting make sure that LI and DT are still
+  // valid.
 #ifdef EXPENSIVE_CHECKS
   if (Changed) {
     assert(DT->verify(DominatorTree::VerificationLevel::Fast) &&
@@ -1258,6 +1261,21 @@ bool llvm::canSinkOrHoistInst(Instruction &I, AAResults *AA, DominatorTree *DT,
         return true;
     }
 
+    if (Behavior.onlyWritesMemory()) {
+      // If it's the only memory access then there is nothing
+      // stopping us from hoisting it.
+      if (isOnlyMemoryAccess(CI, CurLoop, MSSAU))
+        return true;
+
+      if (Behavior.onlyAccessesArgPointees()) {
+        if (memoryDefInvalidatedByLoop(
+                MSSA, cast<MemoryDef>(MSSA->getMemoryAccess(CI)), CurLoop,
+                Flags))
+          return false;
+        return true;
+      }
+    }
+
     // FIXME: This should use mod/ref information to see if we can hoist or
     // sink the call.
 
@@ -1287,7 +1305,7 @@ bool llvm::canSinkOrHoistInst(Instruction &I, AAResults *AA, DominatorTree *DT,
     auto *Source = getClobberingMemoryAccess(*MSSA, BAA, Flags, SIMD);
     // Make sure there are no clobbers inside the loop.
     if (!MSSA->isLiveOnEntryDef(Source) &&
-           CurLoop->contains(Source->getBlock()))
+        CurLoop->contains(Source->getBlock()))
       return false;
 
     // If there are interfering Uses (i.e. their defining access is in the
@@ -1300,7 +1318,7 @@ bool llvm::canSinkOrHoistInst(Instruction &I, AAResults *AA, DominatorTree *DT,
         for (const auto &MA : *Accesses)
           if (const auto *MU = dyn_cast<MemoryUse>(&MA)) {
             auto *MD = getClobberingMemoryAccess(*MSSA, BAA, Flags,
-                const_cast<MemoryUse *>(MU));
+                                                 const_cast<MemoryUse *>(MU));
             if (!MSSA->isLiveOnEntryDef(MD) &&
                 CurLoop->contains(MD->getBlock()))
               return false;
@@ -1352,7 +1370,7 @@ static bool isTriviallyReplaceablePHI(const PHINode &PN, const Instruction &I) {
 
 /// Return true if the instruction is foldable in the loop.
 static bool isFoldableInLoop(const Instruction &I, const Loop *CurLoop,
-                         const TargetTransformInfo *TTI) {
+                             const TargetTransformInfo *TTI) {
   if (auto *GEP = dyn_cast<GetElementPtrInst>(&I)) {
     InstructionCost CostI =
         TTI->getInstructionCost(&I, TargetTransformInfo::TCK_SizeAndLatency);
@@ -2354,6 +2372,20 @@ collectPromotionCandidates(MemorySSA *MSSA, AliasAnalysis *AA, Loop *L) {
   return Result;
 }
 
+// This function checks if a given MemoryDef gets clobbered by
+// any memory accesses within the loop.
+static bool memoryDefInvalidatedByLoop(MemorySSA *MSSA, MemoryDef *MD,
+                                       Loop *CurLoop,
+                                       SinkAndHoistLICMFlags &Flags) {
+  if (Flags.tooManyMemoryAccesses()) {
+    return true;
+  }
+  BatchAAResults BAA(MSSA->getAA());
+  MemoryAccess *Source = getClobberingMemoryAccess(*MSSA, BAA, Flags, MD);
+  return !MSSA->isLiveOnEntryDef(Source) &&
+         CurLoop->contains(Source->getBlock());
+}
+
 static bool pointerInvalidatedByLoop(MemorySSA *MSSA, MemoryUse *MU,
                                      Loop *CurLoop, Instruction &I,
                                      SinkAndHoistLICMFlags &Flags,
diff --git a/llvm/test/Transforms/LICM/call-hoisting.ll b/llvm/test/Transforms/LICM/call-hoisting.ll
index e6d2e42e34e81..e9b19db4861d4 100644
--- a/llvm/test/Transforms/LICM/call-hoisting.ll
+++ b/llvm/test/Transforms/LICM/call-hoisting.ll
@@ -64,10 +64,13 @@ exit:
 
 declare void @store(i32 %val, ptr %p) argmemonly writeonly nounwind
 
+; loop invariant calls to writeonly functions such as the above
+; should be hoisted
 define void @test(ptr %loc) {
 ; CHECK-LABEL: @test
-; CHECK-LABEL: loop:
+; CHECK-LABEL: entry:
 ; CHECK: call void @store
+; CHECK-LABEL: loop:
 ; CHECK-LABEL: exit:
 entry:
   br label %loop
@@ -85,8 +88,9 @@ exit:
 
 define void @test_multiexit(ptr %loc, i1 %earlycnd) {
 ; CHECK-LABEL: @test_multiexit
-; CHECK-LABEL: loop:
+; CHECK-LABEL: entry:
 ; CHECK: call void @store
+; CHECK-LABEL: loop:
 ; CHECK-LABEL: backedge:
 entry:
   br label %loop
@@ -107,6 +111,50 @@ exit2:
   ret void
 }
 
+; cannot be hoisted because the two pointers can alias one another
+define void @neg_two_pointer(ptr %loc, ptr %otherloc) {
+; CHECK-LABEL: @neg_two_pointer
+; CHECK-LABEL: entry:
+; CHECK-LABEL: loop:
+; CHECK: call void @store
+; CHECK: call void @store
+; CHECK-LABEL: exit:
+entry:
+  br label %loop
+
+loop:
+  %iv = phi i32 [0, %entry], [%iv.next, %loop]
+  call void @store(i32 0, ptr %loc)
+  call void @store(i32 1, ptr %otherloc)
+  %iv.next = add i32 %iv, 1
+  %cmp = icmp slt i32 %iv, 200
+  br i1 %cmp, label %loop, label %exit
+exit:
+  ret void
+}
+
+; hoisted due to pointers not aliasing
+define void @two_pointer_noalias(ptr noalias %loc, ptr %otherloc) {
+; CHECK-LABEL: @two_pointer_noalias
+; CHECK-LABEL: entry:
+; CHECK: call void @store
+; CHECK: call void @store
+; CHECK-LABEL: loop:
+; CHECK-LABEL: exit:
+entry:
+  br label %loop
+
+loop:
+  %iv = phi i32 [0, %entry], [%iv.next, %loop]
+  call void @store(i32 0, ptr %loc)
+  call void @store(i32 1, ptr %otherloc)
+  %iv.next = add i32 %iv, 1
+  %cmp = icmp slt i32 %iv, 200
+  br i1 %cmp, label %loop, label %exit
+exit:
+  ret void
+}
+
 define void @neg_lv_value(ptr %loc) {
 ; CHECK-LABEL: @neg_lv_value
 ; CHECK-LABEL: loop:
@@ -166,10 +214,11 @@ exit:
   ret void
 }
 
-define void @neg_ref(ptr %loc) {
-; CHECK-LABEL: @neg_ref
-; CHECK-LABEL: loop:
+define void @ref(ptr %loc) {
+; CHECK-LABEL: @ref
+; CHECK-LABEL: entry:
 ; CHECK: call void @store
+; CHECK-LABEL: loop:
 ; CHECK-LABEL: exit1:
 entry:
   br label %loop
@@ -257,8 +306,31 @@ exit:
   ret void
 }
 
-define void @neg_not_argmemonly(ptr %loc) {
-; CHECK-LABEL: @neg_not_argmemonly
+; not argmemonly calls can be hoisted if it's the only memory access in loop
+define void @not_argmemonly_only_access(ptr %loc) {
+; CHECK-LABEL: @not_argmemonly_only_access
+; CHECK-LABEL: entry:
+; CHECK: call void @not_argmemonly
+; CHECK-LABEL: loop:
+; CHECK-LABEL: exit:
+entry:
+  br label %loop
+
+loop:
+  %iv = phi i32 [0, %entry], [%iv.next, %loop]
+  call void @not_argmemonly(i32 0, ptr %loc)
+  %iv.next = add i32 %iv, 1
+  %cmp = icmp slt i32 %iv, 200
+  br i1 %cmp, label %loop, label %exit
+
+exit:
+  ret void
+}
+
+; not argmemonly calls cannot be hoisted if there are other memory accesses
+define void @neg_not_argmemonly_multiple_access(ptr noalias %loc, ptr noalias %loc2) {
+; CHECK-LABEL: @neg_not_argmemonly_multiple_access
+; CHECK-LABEL: entry:
 ; CHECK-LABEL: loop:
 ; CHECK: call void @not_argmemonly
 ; CHECK-LABEL: exit:
@@ -268,6 +340,7 @@ entry:
 loop:
   %iv = phi i32 [0, %entry], [%iv.next, %loop]
   call void @not_argmemonly(i32 0, ptr %loc)
+  call void @load(i32 0, ptr %loc2)
   %iv.next = add i32 %iv, 1
   %cmp = icmp slt i32 %iv, 200
   br i1 %cmp, label %loop, label %exit
 | 
    
| 
           Hi, I'm new to contributing to LLVM, so please let me know if there's anything I should do differently, or if any part of this patch could be improved.  | 
    
d87eb45    to
    83b0d30      
    Compare
  
    | 
           It looks like a writeonly function is getting hoisting in CodeGen/AMDGPU/loop_exit_with_xor.ll and I'm not sure if it's incorrect behaviour. llvm-project/llvm/test/CodeGen/AMDGPU/loop_exit_with_xor.ll Lines 59 to 93 in 267b859 
 The call @llvm.amdgcn.raw.ptr.buffer.store.f32(float poison, ptr addrspace(8) poison, i32 0, i32 poison, i32 0) seems to be getting hoisted. Does anyone know if this is legal behaviour? Below is the failed CI logs:  | 
    
827fa57    to
    e22d4ab      
    Compare
  
    
          
 I've assumed that it's correct behaviour and changed the test accordingly. I'd appreciate it if someone knowledgeable about the AMDGPU backend can have a look to check. Thanks in advance!  | 
    
        
          
                llvm/lib/Transforms/Scalar/LICM.cpp
              
                Outdated
          
        
      There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The limitation to argmemonly should not be necessary. I'm dropping the corresponding check for readonly calls in #144497.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please also add a test with a non-argmemonly call that is hoisted? I guess like neg_not_argmemonly, just without the @load.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, done.
        
          
                llvm/lib/Transforms/Scalar/LICM.cpp
              
                Outdated
          
        
      There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| } else if (CallInst *SCI = dyn_cast<CallInst>(I)) { | |
| } else { | |
| auto *SCI = cast<CallInst>(I)); | 
So this asserts if an unexpected instruction type is passed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is currently asserted at the start of the function:
https://github.com/llvm/llvm-project/pull/143799/files/e22d4abf8faf85453ba098b44f460216e20f8ac0#diff-0c00cf873bac7f969a29a53b732c8ff3968a20d6fb1713130929da1044503ebbR2324
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but we still shouldn't use dyn_cast if there is only one possibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay noted, I've changed it.
          
 @WanderingAura   | 
    
e22d4ab    to
    fbe0463      
    Compare
  
    fbe0463    to
    152336d      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| 
          
 @WanderingAura Congratulations on having your first Pull Request (PR) merged into the LLVM Project! Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR. Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues. How to do this, and the rest of the post-merge process, is covered in detail here. If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again. If you don't get any reports, no action is required from you. Your changes are working as expected, well done!  | 
    
Adds support for hoisting
writeonlycalls in LICM.This patch adds a missing optimization that allows hoisting of
writeonlyfunction calls out of loops when it is safe to do so. Previously, such calls were conservatively retained inside the loop body, and the redundant calls were only reduced through unrolling, relying on target-dependent heuristics.Closes #143267
Testing: