
Conversation

omern1 (Member) commented May 18, 2025

This patch makes BranchFolding take branch frequency information into
account when creating conditional tailcalls.

It also enables folding fallthrough blocks into conditional tailcalls
when that's profitable.

This should fix #126363.
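
In outline, the profitability check consults MachineBranchProbabilityInfo: when profile (or equivalent) information is available, a predecessor's branch is folded into a conditional tail call only if the edge into the tail-call block is cold, with the fallthrough case handled by reversing the branch condition first; without profile information, the previous behaviour of only folding the taken block is kept. The following is a condensed paraphrase of the new checks, not the exact code from the diff below:

// Condensed paraphrase of the profitability check added to
// BranchFolder::OptimizeBlock (see the full diff below for the real code
// and its surrounding context).
bool IsPGOInfoAvailable = false;
for (MachineBasicBlock *PredSucc : Pred->successors())
  IsPGOInfoAvailable |= MBPI.isEdgeHot(Pred, PredSucc);

bool IsEdgeCold = !MBPI.isEdgeHot(Pred, MBB);

// With profile data, fold the fallthrough edge only if it is cold, since
// folding puts the tail call behind a conditional branch.
bool CanFoldFallThrough =
    IsPGOInfoAvailable && IsEdgeCold &&
    (MBB == PredFBB || (PredFBB == nullptr && Pred->getFallThrough() == MBB));

// The taken block stays foldable as before when no profile data exists,
// and is folded only when the edge is cold once profile data is present.
bool CanFoldTakenBlock =
    MBB == PredTBB && (IsPGOInfoAvailable ? IsEdgeCold : true);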

llvmbot (Member) commented May 18, 2025

@llvm/pr-subscribers-backend-x86

Author: Nabeel Omer (omern1)

Changes

This patch makes BranchFolding take branch frequency information into
account when creating conditional tailcalls.

It also enables folding fallthrough blocks into conditional tailcalls
when that's profitable.

This should fix #126363.


Full diff: https://github.com/llvm/llvm-project/pull/140476.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/BranchFolding.cpp (+38-15)
  • (modified) llvm/test/CodeGen/X86/conditional-tailcall.ll (+187)
diff --git a/llvm/lib/CodeGen/BranchFolding.cpp b/llvm/lib/CodeGen/BranchFolding.cpp
index 6f5afbd2a996a..af2c40005081e 100644
--- a/llvm/lib/CodeGen/BranchFolding.cpp
+++ b/llvm/lib/CodeGen/BranchFolding.cpp
@@ -26,6 +26,7 @@
 #include "llvm/CodeGen/Analysis.h"
 #include "llvm/CodeGen/BranchFoldingPass.h"
 #include "llvm/CodeGen/MBFIWrapper.h"
+#include "llvm/CodeGen/MachineBasicBlock.h"
 #include "llvm/CodeGen/MachineBlockFrequencyInfo.h"
 #include "llvm/CodeGen/MachineBranchProbabilityInfo.h"
 #include "llvm/CodeGen/MachineFunction.h"
@@ -1547,32 +1548,54 @@ bool BranchFolder::OptimizeBlock(MachineBasicBlock *MBB) {
     MachineInstr &TailCall = *MBB->getFirstNonDebugInstr();
     if (TII->isUnconditionalTailCall(TailCall)) {
       SmallVector<MachineBasicBlock *> PredsChanged;
-      for (auto &Pred : MBB->predecessors()) {
+      for (auto *Pred : MBB->predecessors()) {
+        bool IsPGOInfoAvailable = false;
+        for (MachineBasicBlock *const PredSucc : Pred->successors()) {
+          IsPGOInfoAvailable |= MBPI.isEdgeHot(Pred, PredSucc);
+        }
+
         MachineBasicBlock *PredTBB = nullptr, *PredFBB = nullptr;
         SmallVector<MachineOperand, 4> PredCond;
         bool PredAnalyzable =
             !TII->analyzeBranch(*Pred, PredTBB, PredFBB, PredCond, true);
 
-        // Only eliminate if MBB == TBB (Taken Basic Block)
-        if (PredAnalyzable && !PredCond.empty() && PredTBB == MBB &&
-            PredTBB != PredFBB) {
-          // The predecessor has a conditional branch to this block which
-          // consists of only a tail call. Try to fold the tail call into the
-          // conditional branch.
+        bool IsEdgeCold = !MBPI.isEdgeHot(Pred, MBB);
+        bool CanFoldFallThrough =
+            IsPGOInfoAvailable && IsEdgeCold &&
+            (MBB == PredFBB ||
+             (PredFBB == nullptr && Pred->getFallThrough() == MBB));
+        bool CanFoldTakenBlock =
+            (MBB == PredTBB && (IsPGOInfoAvailable ? IsEdgeCold : true));
+
+        // When we have PGO (or equivalent) information, we want to fold the
+        // fallthrough if it's cold. Folding a fallthrough puts it behind a
+        // conditional branch which isn't desirable if it's hot. When there
+        // isn't any PGO information available we want to fold the taken block
+        // if it's possible and we never want to fold the fallthrough as we
+        // don't know if that is desirable.
+        if (PredAnalyzable && !PredCond.empty() && PredTBB != PredFBB &&
+            (CanFoldTakenBlock || CanFoldFallThrough)) {
+          SmallVector<MachineOperand, 4> ReversedCond(PredCond);
+          if (CanFoldFallThrough) {
+            DebugLoc Dl = MBB->findBranchDebugLoc();
+            TII->reverseBranchCondition(ReversedCond);
+            TII->removeBranch(*Pred);
+            TII->insertBranch(*Pred, MBB, PredTBB, ReversedCond, Dl);
+          }
+          
+          PredAnalyzable =
+              !TII->analyzeBranch(*Pred, PredTBB, PredFBB, PredCond, true);
+
           if (TII->canMakeTailCallConditional(PredCond, TailCall)) {
-            // TODO: It would be nice if analyzeBranch() could provide a pointer
-            // to the branch instruction so replaceBranchWithTailCall() doesn't
-            // have to search for it.
+            // TODO: It would be nice if analyzeBranch() could provide a
+            // pointer to the branch instruction so
+            // replaceBranchWithTailCall() doesn't have to search for it.
             TII->replaceBranchWithTailCall(*Pred, PredCond, TailCall);
             PredsChanged.push_back(Pred);
           }
         }
-        // If the predecessor is falling through to this block, we could reverse
-        // the branch condition and fold the tail call into that. However, after
-        // that we might have to re-arrange the CFG to fall through to the other
-        // block and there is a high risk of regressing code size rather than
-        // improving it.
       }
+
       if (!PredsChanged.empty()) {
         NumTailCalls += PredsChanged.size();
         for (auto &Pred : PredsChanged)
diff --git a/llvm/test/CodeGen/X86/conditional-tailcall.ll b/llvm/test/CodeGen/X86/conditional-tailcall.ll
index 4c990d81810be..b851840e167da 100644
--- a/llvm/test/CodeGen/X86/conditional-tailcall.ll
+++ b/llvm/test/CodeGen/X86/conditional-tailcall.ll
@@ -597,3 +597,190 @@ cleanup.thread:                                   ; preds = %cleanup.thread.loop
   %6 = phi i1 [ %cmp37, %5 ], [ %call34, %if.else28 ], [ false, %cleanup.thread.loopexit ]
   ret i1 %6
 }
+
+define void @true_likely(i1 noundef zeroext %0) {
+; CHECK32-LABEL: true_likely:
+; CHECK32:       # %bb.0:
+; CHECK32-NEXT:    cmpb $0, {{[0-9]+}}(%esp) # encoding: [0x80,0x7c,0x24,0x04,0x00]
+; CHECK32-NEXT:    je func_false # TAILCALL
+; CHECK32-NEXT:    # encoding: [0x74,A]
+; CHECK32-NEXT:    # fixup A - offset: 1, value: func_false-1, kind: FK_PCRel_1
+; CHECK32-NEXT:  # %bb.1:
+; CHECK32-NEXT:    jmp func_true # TAILCALL
+; CHECK32-NEXT:    # encoding: [0xeb,A]
+; CHECK32-NEXT:    # fixup A - offset: 1, value: func_true-1, kind: FK_PCRel_1
+;
+; CHECK64-LABEL: true_likely:
+; CHECK64:       # %bb.0:
+; CHECK64-NEXT:    testl %edi, %edi # encoding: [0x85,0xff]
+; CHECK64-NEXT:    je func_false # TAILCALL
+; CHECK64-NEXT:    # encoding: [0x74,A]
+; CHECK64-NEXT:    # fixup A - offset: 1, value: func_false-1, kind: FK_PCRel_1
+; CHECK64-NEXT:  # %bb.1:
+; CHECK64-NEXT:    jmp func_true # TAILCALL
+; CHECK64-NEXT:    # encoding: [0xeb,A]
+; CHECK64-NEXT:    # fixup A - offset: 1, value: func_true-1, kind: FK_PCRel_1
+;
+; WIN64-LABEL: true_likely:
+; WIN64:       # %bb.0:
+; WIN64-NEXT:    testb %cl, %cl # encoding: [0x84,0xc9]
+; WIN64-NEXT:    je func_false # TAILCALL
+; WIN64-NEXT:    # encoding: [0x74,A]
+; WIN64-NEXT:    # fixup A - offset: 1, value: func_false-1, kind: FK_PCRel_1
+; WIN64-NEXT:  # %bb.1:
+; WIN64-NEXT:    jmp func_true # TAILCALL
+; WIN64-NEXT:    # encoding: [0xeb,A]
+; WIN64-NEXT:    # fixup A - offset: 1, value: func_true-1, kind: FK_PCRel_1
+  br i1 %0, label %2, label %3, !prof !6
+
+2:
+  tail call void @func_true()
+  br label %4
+
+3:
+  tail call void @func_false()
+  br label %4
+
+4:
+  ret void
+}
+
+define void @false_likely(i1 noundef zeroext %0) {
+; CHECK32-LABEL: false_likely:
+; CHECK32:       # %bb.0:
+; CHECK32-NEXT:    cmpb $0, {{[0-9]+}}(%esp) # encoding: [0x80,0x7c,0x24,0x04,0x00]
+; CHECK32-NEXT:    jne func_true # TAILCALL
+; CHECK32-NEXT:    # encoding: [0x75,A]
+; CHECK32-NEXT:    # fixup A - offset: 1, value: func_true-1, kind: FK_PCRel_1
+; CHECK32-NEXT:  # %bb.1:
+; CHECK32-NEXT:    jmp func_false # TAILCALL
+; CHECK32-NEXT:    # encoding: [0xeb,A]
+; CHECK32-NEXT:    # fixup A - offset: 1, value: func_false-1, kind: FK_PCRel_1
+;
+; CHECK64-LABEL: false_likely:
+; CHECK64:       # %bb.0:
+; CHECK64-NEXT:    testl %edi, %edi # encoding: [0x85,0xff]
+; CHECK64-NEXT:    jne func_true # TAILCALL
+; CHECK64-NEXT:    # encoding: [0x75,A]
+; CHECK64-NEXT:    # fixup A - offset: 1, value: func_true-1, kind: FK_PCRel_1
+; CHECK64-NEXT:  # %bb.1:
+; CHECK64-NEXT:    jmp func_false # TAILCALL
+; CHECK64-NEXT:    # encoding: [0xeb,A]
+; CHECK64-NEXT:    # fixup A - offset: 1, value: func_false-1, kind: FK_PCRel_1
+;
+; WIN64-LABEL: false_likely:
+; WIN64:       # %bb.0:
+; WIN64-NEXT:    testb %cl, %cl # encoding: [0x84,0xc9]
+; WIN64-NEXT:    jne func_true # TAILCALL
+; WIN64-NEXT:    # encoding: [0x75,A]
+; WIN64-NEXT:    # fixup A - offset: 1, value: func_true-1, kind: FK_PCRel_1
+; WIN64-NEXT:  # %bb.1:
+; WIN64-NEXT:    jmp func_false # TAILCALL
+; WIN64-NEXT:    # encoding: [0xeb,A]
+; WIN64-NEXT:    # fixup A - offset: 1, value: func_false-1, kind: FK_PCRel_1
+  br i1 %0, label %2, label %3, !prof !7
+
+2:
+  tail call void @func_true()
+  br label %4
+
+3:
+  tail call void @func_false()
+  br label %4
+
+4:
+  ret void
+}
+
+
+define void @edge_is_hot_but_not_fallthrough(i1 noundef zeroext %0) {
+; CHECK32-LABEL: edge_is_hot_but_not_fallthrough:
+; CHECK32:       # %bb.0:
+; CHECK32-NEXT:    subl $12, %esp # encoding: [0x83,0xec,0x0c]
+; CHECK32-NEXT:    .cfi_def_cfa_offset 16
+; CHECK32-NEXT:    cmpb $0, {{[0-9]+}}(%esp) # encoding: [0x80,0x7c,0x24,0x10,0x00]
+; CHECK32-NEXT:    je .LBB6_1 # encoding: [0x74,A]
+; CHECK32-NEXT:    # fixup A - offset: 1, value: .LBB6_1-1, kind: FK_PCRel_1
+; CHECK32-NEXT:  # %bb.3:
+; CHECK32-NEXT:    addl $12, %esp # encoding: [0x83,0xc4,0x0c]
+; CHECK32-NEXT:    .cfi_def_cfa_offset 4
+; CHECK32-NEXT:    retl # encoding: [0xc3]
+; CHECK32-NEXT:  .LBB6_1:
+; CHECK32-NEXT:    .cfi_def_cfa_offset 16
+; CHECK32-NEXT:    calll func_true # encoding: [0xe8,A,A,A,A]
+; CHECK32-NEXT:    # fixup A - offset: 1, value: func_true-4, kind: FK_PCRel_4
+; CHECK32-NEXT:    .p2align 4
+; CHECK32-NEXT:  .LBB6_2: # =>This Inner Loop Header: Depth=1
+; CHECK32-NEXT:    calll func_false # encoding: [0xe8,A,A,A,A]
+; CHECK32-NEXT:    # fixup A - offset: 1, value: func_false-4, kind: FK_PCRel_4
+; CHECK32-NEXT:    jmp .LBB6_2 # encoding: [0xeb,A]
+; CHECK32-NEXT:    # fixup A - offset: 1, value: .LBB6_2-1, kind: FK_PCRel_1
+;
+; CHECK64-LABEL: edge_is_hot_but_not_fallthrough:
+; CHECK64:       # %bb.0:
+; CHECK64-NEXT:    pushq %rax # encoding: [0x50]
+; CHECK64-NEXT:    .cfi_def_cfa_offset 16
+; CHECK64-NEXT:    testl %edi, %edi # encoding: [0x85,0xff]
+; CHECK64-NEXT:    je .LBB6_1 # encoding: [0x74,A]
+; CHECK64-NEXT:    # fixup A - offset: 1, value: .LBB6_1-1, kind: FK_PCRel_1
+; CHECK64-NEXT:  # %bb.3:
+; CHECK64-NEXT:    popq %rax # encoding: [0x58]
+; CHECK64-NEXT:    .cfi_def_cfa_offset 8
+; CHECK64-NEXT:    retq # encoding: [0xc3]
+; CHECK64-NEXT:  .LBB6_1:
+; CHECK64-NEXT:    .cfi_def_cfa_offset 16
+; CHECK64-NEXT:    callq func_true # encoding: [0xe8,A,A,A,A]
+; CHECK64-NEXT:    # fixup A - offset: 1, value: func_true-4, kind: reloc_branch_4byte_pcrel
+; CHECK64-NEXT:    .p2align 4
+; CHECK64-NEXT:  .LBB6_2: # =>This Inner Loop Header: Depth=1
+; CHECK64-NEXT:    callq func_false # encoding: [0xe8,A,A,A,A]
+; CHECK64-NEXT:    # fixup A - offset: 1, value: func_false-4, kind: reloc_branch_4byte_pcrel
+; CHECK64-NEXT:    jmp .LBB6_2 # encoding: [0xeb,A]
+; CHECK64-NEXT:    # fixup A - offset: 1, value: .LBB6_2-1, kind: FK_PCRel_1
+;
+; WIN64-LABEL: edge_is_hot_but_not_fallthrough:
+; WIN64:       # %bb.0:
+; WIN64-NEXT:    subq $40, %rsp # encoding: [0x48,0x83,0xec,0x28]
+; WIN64-NEXT:    .seh_stackalloc 40
+; WIN64-NEXT:    .seh_endprologue
+; WIN64-NEXT:    testb %cl, %cl # encoding: [0x84,0xc9]
+; WIN64-NEXT:    je .LBB6_1 # encoding: [0x74,A]
+; WIN64-NEXT:    # fixup A - offset: 1, value: .LBB6_1-1, kind: FK_PCRel_1
+; WIN64-NEXT:  # %bb.3:
+; WIN64-NEXT:    .seh_startepilogue
+; WIN64-NEXT:    addq $40, %rsp # encoding: [0x48,0x83,0xc4,0x28]
+; WIN64-NEXT:    .seh_endepilogue
+; WIN64-NEXT:    retq # encoding: [0xc3]
+; WIN64-NEXT:  .LBB6_1:
+; WIN64-NEXT:    callq func_true # encoding: [0xe8,A,A,A,A]
+; WIN64-NEXT:    # fixup A - offset: 1, value: func_true-4, kind: reloc_branch_4byte_pcrel
+; WIN64-NEXT:    .p2align 4
+; WIN64-NEXT:  .LBB6_2: # =>This Inner Loop Header: Depth=1
+; WIN64-NEXT:    callq func_false # encoding: [0xe8,A,A,A,A]
+; WIN64-NEXT:    # fixup A - offset: 1, value: func_false-4, kind: reloc_branch_4byte_pcrel
+; WIN64-NEXT:    jmp .LBB6_2 # encoding: [0xeb,A]
+; WIN64-NEXT:    # fixup A - offset: 1, value: .LBB6_2-1, kind: FK_PCRel_1
+; WIN64-NEXT:    .seh_endproc
+  br i1 %0, label %2, label %3, !prof !6
+2:
+  %and6 = and i1 %0, 1
+  br label %5
+
+3:
+  tail call void @func_true()
+  br label %4
+
+4:
+  tail call void @func_false()
+  br label %4
+
+5:
+  ret void
+}
+
+!6 = !{!"branch_weights", !"expected", i32 2000, i32 1}
+!7 = !{!"branch_weights", !"expected", i32 1, i32 2000}
+
+
+declare dso_local void @func_true()
+declare dso_local void @func_false()

omern1 requested review from RKSimon and arsenm, May 18, 2025 20:32

github-actions bot commented May 18, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

omern1 requested a review from goldsteinn, May 28, 2025 12:28

omern1 (Member, Author) commented May 28, 2025

Ping!

omern1 added 5 commits, October 2, 2025 16:33

RKSimon (Collaborator) commented Oct 3, 2025

@omern1 reverse-ping - are you still looking at this?

omern1 (Member, Author) commented Oct 3, 2025

Hi @RKSimon, this slipped through the cracks.

Apologies for the force push, I accidentally rebased instead of merging.

Pinging @llvm/pr-subscribers-pgo for reviews.

Review thread on llvm/lib/CodeGen/BranchFolding.cpp:

          SmallVector<MachineOperand, 4> ReversedCond(PredCond);
          if (CanFoldFallThrough) {
            DebugLoc Dl = MBB->findBranchDebugLoc();
            TII->reverseBranchCondition(ReversedCond);

mtrofin (Member):

what happens to the profile (branch weights) in this case?

omern1 (Member, Author):

Thanks for looking at this @mtrofin.

My understanding is that the weights are still correct because we're eliminating one of the successors, so the weights for that edge simply get dropped.

I've updated my patch to ensure that if we get to the point where we reverse the branch we're sure that we're going to form a conditional tailcall.

mtrofin (Member):

But isn't the tail call still conditioned on ReversedCond? From what I read in e.g. the @true_likely test, you end up with:

testl %edi, %edi # encoding: [0x85,0xff]
je func_false # TAILCALL
<...fallthrough...>

So there's still a question of what's the probability you jump to func_false vs func_true (the fallthrough). What am I missing?

omern1 (Member, Author):

I think I'm not understanding your concern, could you please clarify further?

With my patch applied, we get the following probability out of branch-folder for the true_likely test:

bb.0 (%ir-block.1):
  successors: %bb.1(0x7fef9fcb); %bb.1(99.95%)
...

%bb.1 is the hot fallthrough successor.

The probabilities in the input MIR are:

...
successors: %bb.1(0x7fef9fcb), %bb.2(0x00106035); %bb.1(99.95%), %bb.2(0.05%)
...

This seems correct to me based on the following assumptions:

  • I don't think there's a way of expressing the probability of a tailcall edge.
  • The probability of reaching the successor appears to be correct.
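
As a footnote on the mechanism, here is a minimal sketch of the edge cleanup, assuming the existing loop over PredsChanged (its body is cut off at the end of the BranchFolding.cpp hunk above) removes the tail-called block from the predecessor's successor list:

// Hypothetical sketch, not part of the patch itself: once the branch has
// been replaced by a conditional tail call, MBB is no longer a CFG
// successor of Pred, so the weight of that edge has nowhere to live, while
// the remaining fallthrough successor keeps its probability (the 99.95%
// figure quoted above).
for (MachineBasicBlock *Pred : PredsChanged)
  Pred->removeSuccessor(MBB);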

Merging this pull request may close the issue "[[likely]] and [[unlikely]] ignored with jump to tail call".