
Conversation

omern1 (Member) commented May 18, 2025

This patch makes BranchFolding take branch frequency information into
account when creating conditional tailcalls.

It also enables folding fallthrough blocks into conditional tailcalls
when that's profitable.

This should fix #126363.
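
In outline, the profitability check consults MachineBranchProbabilityInfo: when profile (or equivalent) information is available, a predecessor's branch is folded into a conditional tail call only if the edge into the tail-call block is cold, with the fallthrough case handled by reversing the branch condition first; without profile information, the previous behaviour of only folding the taken block is kept. The following is a condensed paraphrase of the new checks, not the exact code from the diff below:

// Condensed paraphrase of the profitability check added to
// BranchFolder::OptimizeBlock (see the full diff below for the real code
// and its surrounding context).
bool IsPGOInfoAvailable = false;
for (MachineBasicBlock *PredSucc : Pred->successors())
  IsPGOInfoAvailable |= MBPI.isEdgeHot(Pred, PredSucc);

bool IsEdgeCold = !MBPI.isEdgeHot(Pred, MBB);

// With profile data, fold the fallthrough edge only if it is cold, since
// folding puts the tail call behind a conditional branch.
bool CanFoldFallThrough =
    IsPGOInfoAvailable && IsEdgeCold &&
    (MBB == PredFBB || (PredFBB == nullptr && Pred->getFallThrough() == MBB));

// The taken block stays foldable as before when no profile data exists,
// and is folded only when the edge is cold once profile data is present.
bool CanFoldTakenBlock =
    MBB == PredTBB && (IsPGOInfoAvailable ? IsEdgeCold : true);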

llvmbot (Member) commented May 18, 2025

@llvm/pr-subscribers-backend-x86

Author: Nabeel Omer (omern1)

Changes

This patch makes BranchFolding take branch frequency information into
account when creating conditional tailcalls.

It also enables folding fallthrough blocks into conditional tailcalls
when that's profitable.

This should fix #126363.


Full diff: https://github.com/llvm/llvm-project/pull/140476.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/BranchFolding.cpp (+38-15)
  • (modified) llvm/test/CodeGen/X86/conditional-tailcall.ll (+187)
diff --git a/llvm/lib/CodeGen/BranchFolding.cpp b/llvm/lib/CodeGen/BranchFolding.cpp
index 6f5afbd2a996a..af2c40005081e 100644
--- a/llvm/lib/CodeGen/BranchFolding.cpp
+++ b/llvm/lib/CodeGen/BranchFolding.cpp
@@ -26,6 +26,7 @@
 #include "llvm/CodeGen/Analysis.h"
 #include "llvm/CodeGen/BranchFoldingPass.h"
 #include "llvm/CodeGen/MBFIWrapper.h"
+#include "llvm/CodeGen/MachineBasicBlock.h"
 #include "llvm/CodeGen/MachineBlockFrequencyInfo.h"
 #include "llvm/CodeGen/MachineBranchProbabilityInfo.h"
 #include "llvm/CodeGen/MachineFunction.h"
@@ -1547,32 +1548,54 @@ bool BranchFolder::OptimizeBlock(MachineBasicBlock *MBB) {
     MachineInstr &TailCall = *MBB->getFirstNonDebugInstr();
     if (TII->isUnconditionalTailCall(TailCall)) {
       SmallVector<MachineBasicBlock *> PredsChanged;
-      for (auto &Pred : MBB->predecessors()) {
+      for (auto *Pred : MBB->predecessors()) {
+        bool IsPGOInfoAvailable = false;
+        for (MachineBasicBlock *const PredSucc : Pred->successors()) {
+          IsPGOInfoAvailable |= MBPI.isEdgeHot(Pred, PredSucc);
+        }
+
         MachineBasicBlock *PredTBB = nullptr, *PredFBB = nullptr;
         SmallVector<MachineOperand, 4> PredCond;
         bool PredAnalyzable =
             !TII->analyzeBranch(*Pred, PredTBB, PredFBB, PredCond, true);
 
-        // Only eliminate if MBB == TBB (Taken Basic Block)
-        if (PredAnalyzable && !PredCond.empty() && PredTBB == MBB &&
-            PredTBB != PredFBB) {
-          // The predecessor has a conditional branch to this block which
-          // consists of only a tail call. Try to fold the tail call into the
-          // conditional branch.
+        bool IsEdgeCold = !MBPI.isEdgeHot(Pred, MBB);
+        bool CanFoldFallThrough =
+            IsPGOInfoAvailable && IsEdgeCold &&
+            (MBB == PredFBB ||
+             (PredFBB == nullptr && Pred->getFallThrough() == MBB));
+        bool CanFoldTakenBlock =
+            (MBB == PredTBB && (IsPGOInfoAvailable ? IsEdgeCold : true));
+
+        // When we have PGO (or equivalent) information, we want to fold the
+        // fallthrough if it's cold. Folding a fallthrough puts it behind a
+        // conditional branch which isn't desirable if it's hot. When there
+        // isn't any PGO information available we want to fold the taken block
+        // if it's possible and we never want to fold the fallthrough as we
+        // don't know if that is desirable.
+        if (PredAnalyzable && !PredCond.empty() && PredTBB != PredFBB &&
+            (CanFoldTakenBlock || CanFoldFallThrough)) {
+          SmallVector<MachineOperand, 4> ReversedCond(PredCond);
+          if (CanFoldFallThrough) {
+            DebugLoc Dl = MBB->findBranchDebugLoc();
+            TII->reverseBranchCondition(ReversedCond);
+            TII->removeBranch(*Pred);
+            TII->insertBranch(*Pred, MBB, PredTBB, ReversedCond, Dl);
+          }
+          
+          PredAnalyzable =
+              !TII->analyzeBranch(*Pred, PredTBB, PredFBB, PredCond, true);
+
           if (TII->canMakeTailCallConditional(PredCond, TailCall)) {
-            // TODO: It would be nice if analyzeBranch() could provide a pointer
-            // to the branch instruction so replaceBranchWithTailCall() doesn't
-            // have to search for it.
+            // TODO: It would be nice if analyzeBranch() could provide a
+            // pointer to the branch instruction so
+            // replaceBranchWithTailCall() doesn't have to search for it.
             TII->replaceBranchWithTailCall(*Pred, PredCond, TailCall);
             PredsChanged.push_back(Pred);
           }
         }
-        // If the predecessor is falling through to this block, we could reverse
-        // the branch condition and fold the tail call into that. However, after
-        // that we might have to re-arrange the CFG to fall through to the other
-        // block and there is a high risk of regressing code size rather than
-        // improving it.
       }
+
       if (!PredsChanged.empty()) {
         NumTailCalls += PredsChanged.size();
         for (auto &Pred : PredsChanged)
diff --git a/llvm/test/CodeGen/X86/conditional-tailcall.ll b/llvm/test/CodeGen/X86/conditional-tailcall.ll
index 4c990d81810be..b851840e167da 100644
--- a/llvm/test/CodeGen/X86/conditional-tailcall.ll
+++ b/llvm/test/CodeGen/X86/conditional-tailcall.ll
@@ -597,3 +597,190 @@ cleanup.thread:                                   ; preds = %cleanup.thread.loop
   %6 = phi i1 [ %cmp37, %5 ], [ %call34, %if.else28 ], [ false, %cleanup.thread.loopexit ]
   ret i1 %6
 }
+
+define void @true_likely(i1 noundef zeroext %0) {
+; CHECK32-LABEL: true_likely:
+; CHECK32:       # %bb.0:
+; CHECK32-NEXT:    cmpb $0, {{[0-9]+}}(%esp) # encoding: [0x80,0x7c,0x24,0x04,0x00]
+; CHECK32-NEXT:    je func_false # TAILCALL
+; CHECK32-NEXT:    # encoding: [0x74,A]
+; CHECK32-NEXT:    # fixup A - offset: 1, value: func_false-1, kind: FK_PCRel_1
+; CHECK32-NEXT:  # %bb.1:
+; CHECK32-NEXT:    jmp func_true # TAILCALL
+; CHECK32-NEXT:    # encoding: [0xeb,A]
+; CHECK32-NEXT:    # fixup A - offset: 1, value: func_true-1, kind: FK_PCRel_1
+;
+; CHECK64-LABEL: true_likely:
+; CHECK64:       # %bb.0:
+; CHECK64-NEXT:    testl %edi, %edi # encoding: [0x85,0xff]
+; CHECK64-NEXT:    je func_false # TAILCALL
+; CHECK64-NEXT:    # encoding: [0x74,A]
+; CHECK64-NEXT:    # fixup A - offset: 1, value: func_false-1, kind: FK_PCRel_1
+; CHECK64-NEXT:  # %bb.1:
+; CHECK64-NEXT:    jmp func_true # TAILCALL
+; CHECK64-NEXT:    # encoding: [0xeb,A]
+; CHECK64-NEXT:    # fixup A - offset: 1, value: func_true-1, kind: FK_PCRel_1
+;
+; WIN64-LABEL: true_likely:
+; WIN64:       # %bb.0:
+; WIN64-NEXT:    testb %cl, %cl # encoding: [0x84,0xc9]
+; WIN64-NEXT:    je func_false # TAILCALL
+; WIN64-NEXT:    # encoding: [0x74,A]
+; WIN64-NEXT:    # fixup A - offset: 1, value: func_false-1, kind: FK_PCRel_1
+; WIN64-NEXT:  # %bb.1:
+; WIN64-NEXT:    jmp func_true # TAILCALL
+; WIN64-NEXT:    # encoding: [0xeb,A]
+; WIN64-NEXT:    # fixup A - offset: 1, value: func_true-1, kind: FK_PCRel_1
+  br i1 %0, label %2, label %3, !prof !6
+
+2:
+  tail call void @func_true()
+  br label %4
+
+3:
+  tail call void @func_false()
+  br label %4
+
+4:
+  ret void
+}
+
+define void @false_likely(i1 noundef zeroext %0) {
+; CHECK32-LABEL: false_likely:
+; CHECK32:       # %bb.0:
+; CHECK32-NEXT:    cmpb $0, {{[0-9]+}}(%esp) # encoding: [0x80,0x7c,0x24,0x04,0x00]
+; CHECK32-NEXT:    jne func_true # TAILCALL
+; CHECK32-NEXT:    # encoding: [0x75,A]
+; CHECK32-NEXT:    # fixup A - offset: 1, value: func_true-1, kind: FK_PCRel_1
+; CHECK32-NEXT:  # %bb.1:
+; CHECK32-NEXT:    jmp func_false # TAILCALL
+; CHECK32-NEXT:    # encoding: [0xeb,A]
+; CHECK32-NEXT:    # fixup A - offset: 1, value: func_false-1, kind: FK_PCRel_1
+;
+; CHECK64-LABEL: false_likely:
+; CHECK64:       # %bb.0:
+; CHECK64-NEXT:    testl %edi, %edi # encoding: [0x85,0xff]
+; CHECK64-NEXT:    jne func_true # TAILCALL
+; CHECK64-NEXT:    # encoding: [0x75,A]
+; CHECK64-NEXT:    # fixup A - offset: 1, value: func_true-1, kind: FK_PCRel_1
+; CHECK64-NEXT:  # %bb.1:
+; CHECK64-NEXT:    jmp func_false # TAILCALL
+; CHECK64-NEXT:    # encoding: [0xeb,A]
+; CHECK64-NEXT:    # fixup A - offset: 1, value: func_false-1, kind: FK_PCRel_1
+;
+; WIN64-LABEL: false_likely:
+; WIN64:       # %bb.0:
+; WIN64-NEXT:    testb %cl, %cl # encoding: [0x84,0xc9]
+; WIN64-NEXT:    jne func_true # TAILCALL
+; WIN64-NEXT:    # encoding: [0x75,A]
+; WIN64-NEXT:    # fixup A - offset: 1, value: func_true-1, kind: FK_PCRel_1
+; WIN64-NEXT:  # %bb.1:
+; WIN64-NEXT:    jmp func_false # TAILCALL
+; WIN64-NEXT:    # encoding: [0xeb,A]
+; WIN64-NEXT:    # fixup A - offset: 1, value: func_false-1, kind: FK_PCRel_1
+  br i1 %0, label %2, label %3, !prof !7
+
+2:
+  tail call void @func_true()
+  br label %4
+
+3:
+  tail call void @func_false()
+  br label %4
+
+4:
+  ret void
+}
+
+
+define void @edge_is_hot_but_not_fallthrough(i1 noundef zeroext %0) {
+; CHECK32-LABEL: edge_is_hot_but_not_fallthrough:
+; CHECK32:       # %bb.0:
+; CHECK32-NEXT:    subl $12, %esp # encoding: [0x83,0xec,0x0c]
+; CHECK32-NEXT:    .cfi_def_cfa_offset 16
+; CHECK32-NEXT:    cmpb $0, {{[0-9]+}}(%esp) # encoding: [0x80,0x7c,0x24,0x10,0x00]
+; CHECK32-NEXT:    je .LBB6_1 # encoding: [0x74,A]
+; CHECK32-NEXT:    # fixup A - offset: 1, value: .LBB6_1-1, kind: FK_PCRel_1
+; CHECK32-NEXT:  # %bb.3:
+; CHECK32-NEXT:    addl $12, %esp # encoding: [0x83,0xc4,0x0c]
+; CHECK32-NEXT:    .cfi_def_cfa_offset 4
+; CHECK32-NEXT:    retl # encoding: [0xc3]
+; CHECK32-NEXT:  .LBB6_1:
+; CHECK32-NEXT:    .cfi_def_cfa_offset 16
+; CHECK32-NEXT:    calll func_true # encoding: [0xe8,A,A,A,A]
+; CHECK32-NEXT:    # fixup A - offset: 1, value: func_true-4, kind: FK_PCRel_4
+; CHECK32-NEXT:    .p2align 4
+; CHECK32-NEXT:  .LBB6_2: # =>This Inner Loop Header: Depth=1
+; CHECK32-NEXT:    calll func_false # encoding: [0xe8,A,A,A,A]
+; CHECK32-NEXT:    # fixup A - offset: 1, value: func_false-4, kind: FK_PCRel_4
+; CHECK32-NEXT:    jmp .LBB6_2 # encoding: [0xeb,A]
+; CHECK32-NEXT:    # fixup A - offset: 1, value: .LBB6_2-1, kind: FK_PCRel_1
+;
+; CHECK64-LABEL: edge_is_hot_but_not_fallthrough:
+; CHECK64:       # %bb.0:
+; CHECK64-NEXT:    pushq %rax # encoding: [0x50]
+; CHECK64-NEXT:    .cfi_def_cfa_offset 16
+; CHECK64-NEXT:    testl %edi, %edi # encoding: [0x85,0xff]
+; CHECK64-NEXT:    je .LBB6_1 # encoding: [0x74,A]
+; CHECK64-NEXT:    # fixup A - offset: 1, value: .LBB6_1-1, kind: FK_PCRel_1
+; CHECK64-NEXT:  # %bb.3:
+; CHECK64-NEXT:    popq %rax # encoding: [0x58]
+; CHECK64-NEXT:    .cfi_def_cfa_offset 8
+; CHECK64-NEXT:    retq # encoding: [0xc3]
+; CHECK64-NEXT:  .LBB6_1:
+; CHECK64-NEXT:    .cfi_def_cfa_offset 16
+; CHECK64-NEXT:    callq func_true # encoding: [0xe8,A,A,A,A]
+; CHECK64-NEXT:    # fixup A - offset: 1, value: func_true-4, kind: reloc_branch_4byte_pcrel
+; CHECK64-NEXT:    .p2align 4
+; CHECK64-NEXT:  .LBB6_2: # =>This Inner Loop Header: Depth=1
+; CHECK64-NEXT:    callq func_false # encoding: [0xe8,A,A,A,A]
+; CHECK64-NEXT:    # fixup A - offset: 1, value: func_false-4, kind: reloc_branch_4byte_pcrel
+; CHECK64-NEXT:    jmp .LBB6_2 # encoding: [0xeb,A]
+; CHECK64-NEXT:    # fixup A - offset: 1, value: .LBB6_2-1, kind: FK_PCRel_1
+;
+; WIN64-LABEL: edge_is_hot_but_not_fallthrough:
+; WIN64:       # %bb.0:
+; WIN64-NEXT:    subq $40, %rsp # encoding: [0x48,0x83,0xec,0x28]
+; WIN64-NEXT:    .seh_stackalloc 40
+; WIN64-NEXT:    .seh_endprologue
+; WIN64-NEXT:    testb %cl, %cl # encoding: [0x84,0xc9]
+; WIN64-NEXT:    je .LBB6_1 # encoding: [0x74,A]
+; WIN64-NEXT:    # fixup A - offset: 1, value: .LBB6_1-1, kind: FK_PCRel_1
+; WIN64-NEXT:  # %bb.3:
+; WIN64-NEXT:    .seh_startepilogue
+; WIN64-NEXT:    addq $40, %rsp # encoding: [0x48,0x83,0xc4,0x28]
+; WIN64-NEXT:    .seh_endepilogue
+; WIN64-NEXT:    retq # encoding: [0xc3]
+; WIN64-NEXT:  .LBB6_1:
+; WIN64-NEXT:    callq func_true # encoding: [0xe8,A,A,A,A]
+; WIN64-NEXT:    # fixup A - offset: 1, value: func_true-4, kind: reloc_branch_4byte_pcrel
+; WIN64-NEXT:    .p2align 4
+; WIN64-NEXT:  .LBB6_2: # =>This Inner Loop Header: Depth=1
+; WIN64-NEXT:    callq func_false # encoding: [0xe8,A,A,A,A]
+; WIN64-NEXT:    # fixup A - offset: 1, value: func_false-4, kind: reloc_branch_4byte_pcrel
+; WIN64-NEXT:    jmp .LBB6_2 # encoding: [0xeb,A]
+; WIN64-NEXT:    # fixup A - offset: 1, value: .LBB6_2-1, kind: FK_PCRel_1
+; WIN64-NEXT:    .seh_endproc
+  br i1 %0, label %2, label %3, !prof !6
+2:
+  %and6 = and i1 %0, 1
+  br label %5
+
+3:
+  tail call void @func_true()
+  br label %4
+
+4:
+  tail call void @func_false()
+  br label %4
+
+5:
+  ret void
+}
+
+!6 = !{!"branch_weights", !"expected", i32 2000, i32 1}
+!7 = !{!"branch_weights", !"expected", i32 1, i32 2000}
+
+
+declare dso_local void @func_true()
+declare dso_local void @func_false()

omern1 requested review from RKSimon and arsenm, May 18, 2025 20:32

github-actions bot commented May 18, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

omern1 requested a review from goldsteinn, May 28, 2025 12:28

omern1 (Member, Author) commented May 28, 2025

Ping!

omern1 added 5 commits, October 2, 2025 16:33

RKSimon (Collaborator) commented Oct 3, 2025

@omern1 reverse-ping - are you still looking at this?

omern1 (Member, Author) commented Oct 3, 2025

Hi @RKSimon, this slipped through the cracks.

Apologies for the force push, I accidentally rebased instead of merging.

Pinging @llvm/pr-subscribers-pgo for reviews.

Review thread on llvm/lib/CodeGen/BranchFolding.cpp:

          SmallVector<MachineOperand, 4> ReversedCond(PredCond);
          if (CanFoldFallThrough) {
            DebugLoc Dl = MBB->findBranchDebugLoc();
            TII->reverseBranchCondition(ReversedCond);

mtrofin (Member):

what happens to the profile (branch weights) in this case?

omern1 (Member, Author):

Thanks for looking at this @mtrofin.

My understanding is that the weights are still correct because we're eliminating one of the successors, so the weights for that edge simply get dropped.

I've updated my patch to ensure that if we get to the point where we reverse the branch we're sure that we're going to form a conditional tailcall.

mtrofin (Member):

But isn't the tail call still conditioned on ReversedCond? From what I read in e.g. the @true_likely test, you end up with:

testl %edi, %edi # encoding: [0x85,0xff]
je func_false # TAILCALL
<...fallthrough...>

So there's still a question of what's the probability you jump to func_false vs func_true (the fallthrough). What am I missing?

omern1 (Member, Author):

I think I'm not understanding your concern, could you please clarify further?

With my patch applied, we get the following probability out of branch-folder for the true_likely test:

bb.0 (%ir-block.1):
  successors: %bb.1(0x7fef9fcb); %bb.1(99.95%)
...

%bb.1 is the hot fallthrough successor.

The probabilities in the input MIR are:

...
successors: %bb.1(0x7fef9fcb), %bb.2(0x00106035); %bb.1(99.95%), %bb.2(0.05%)
...

This seems correct to me based on the following assumptions:

  • I don't think there's a way of expressing the probability of a tailcall edge.
  • The probability of reaching the successor appears to be correct.
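
As a footnote on the mechanism, here is a minimal sketch of the edge cleanup, assuming the existing loop over PredsChanged (its body is cut off at the end of the BranchFolding.cpp hunk above) removes the tail-called block from the predecessor's successor list:

// Hypothetical sketch, not part of the patch itself: once the branch has
// been replaced by a conditional tail call, MBB is no longer a CFG
// successor of Pred, so the weight of that edge has nowhere to live, while
// the remaining fallthrough successor keeps its probability (the 99.95%
// figure quoted above).
for (MachineBasicBlock *Pred : PredsChanged)
  Pred->removeSuccessor(MBB);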

Merging this pull request may close the issue "[[likely]] and [[unlikely]] ignored with jump to tail call".