[SimplifyCFG] Simplify switch instruction that has duplicate arms #114262

michaelmaitland · 2024-10-30T16:48:49Z

I noticed that the following two C functions produce different IR:

int switch_duplicate_arms(int switch_val, int v, int w) {
  switch (switch_val) {
  default:
    break;
  case 0:
    w = v;
    break;
  case 1:
    w = v;
    break;
  }
  return w;
}

int if_duplicate_arms(int switch_val, int v, int w) {
  if (switch_val == 0)
    w = v;
  else if (switch_val == 1)
    w = v;
  return w;
}

For switch_duplicate_arms, we generate IR that looks like this:

define i32 @switch_duplicate_arms(i32 %0, i32 %1, i32 %2, i32 %3) {
  switch i32 %1, label %7 [
    i32 0, label %5
    i32 1, label %6
  ]

5:
  br label %7

6:
  br label %7

7:
  %8 = phi i32 [ %3, %4 ], [ %2, %6 ], [ %2, %5 ]
  ret i32 %8
}

For the equivalent if_duplicate_arms, we generate:

define i32 @if_duplicate_arms(i32 %0, i32 %1, i32 %2, i32 %3) {
  %5 = icmp ult i32 %1, 2
  %6 = select i1 %5, i32 %2, i32 %3
  ret i32 %6
}

In switch_duplicate_arms, cases 0 and 1 are equivalent: blocks %5 and %6 branch
to the same location, and %8 receives the same incoming value from both blocks.
We can therefore remove one of the duplicate switch targets and point both cases
at the single remaining target.
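
To make the claimed equivalence concrete, here is a minimal C++ sketch that is not part of the patch. `switch_form` mirrors switch_duplicate_arms, while `select_form` is a hypothetical hand-written counterpart of the icmp ult + select pair, assuming a 32-bit int so that the unsigned compare against 2 matches `icmp ult i32 %1, 2`.

```cpp
#include <cstdint>

// Mirrors switch_duplicate_arms: cases 0 and 1 perform the same assignment.
int switch_form(int switch_val, int v, int w) {
  switch (switch_val) {
  default:
    break;
  case 0:
    w = v;
    break;
  case 1:
    w = v;
    break;
  }
  return w;
}

// Hypothetical hand-written counterpart of `icmp ult` + `select`:
// an unsigned compare against 2 is true only for the values 0 and 1.
int select_form(int switch_val, int v, int w) {
  const bool in_range = static_cast<std::uint32_t>(switch_val) < 2u;
  return in_range ? v : w;
}
```

Both functions return v for inputs 0 and 1 and w for everything else, including negative values, which wrap to large unsigned numbers in the compare.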

On RISC-V, prior to this patch, we generate the following code:

switch_duplicate_arms:
        li      a4, 1
        beq     a1, a4, .LBB0_2
        mv      a0, a3
        bnez    a1, .LBB0_3
.LBB0_2:
        mv      a0, a2
.LBB0_3:
        ret

if_duplicate_arms:
        li      a4, 2
        mv      a0, a2
        bltu    a1, a4, .LBB1_2
        mv      a0, a3
.LBB1_2:
        ret

After this patch, the O3 code is optimized to the icmp + select pair, which
gives us the same codegen as if_duplicate_arms, as desired. This results
in one fewer branch instruction in the final assembly.

This may help with both code size and further switch simplification. I found
that this patch has no significant impact on spec2006/int/ref and
spec2017/intrate/ref.

Co-authored-by: Min Hsu [email protected]

llvmbot · 2024-10-30T16:49:14Z

@llvm/pr-subscribers-llvm-transforms

Author: Michael Maitland (michaelmaitland)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/114262.diff

5 Files Affected:

(modified) llvm/lib/Transforms/Utils/SimplifyCFG.cpp (+92)
(modified) llvm/test/Transforms/SimplifyCFG/ForwardSwitchConditionToPHI.ll (+2-4)
(modified) llvm/test/Transforms/SimplifyCFG/HoistCode.ll (+2-4)
(added) llvm/test/Transforms/SimplifyCFG/switch-dup-bbs.ll (+140)
(modified) llvm/test/Transforms/SimplifyCFG/switch-to-select-two-case.ll (+2-4)

diff --git a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
index 72228b445a8b6e..f44364ea507b7b 100644
--- a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
+++ b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
@@ -276,6 +276,7 @@ class SimplifyCFGOpt {
   bool simplifyCleanupReturn(CleanupReturnInst *RI);
   bool simplifyUnreachable(UnreachableInst *UI);
   bool simplifySwitch(SwitchInst *SI, IRBuilder<> &Builder);
+  bool simplifyDuplicateSwitchArms(SwitchInst *SI);
   bool simplifyIndirectBr(IndirectBrInst *IBI);
   bool simplifyBranch(BranchInst *Branch, IRBuilder<> &Builder);
   bool simplifyUncondBranch(BranchInst *BI, IRBuilder<> &Builder);
@@ -7436,6 +7437,94 @@ static bool simplifySwitchOfCmpIntrinsic(SwitchInst *SI, IRBuilderBase &Builder,
   return true;
 }
 
+bool SimplifyCFGOpt::simplifyDuplicateSwitchArms(SwitchInst *SI) {
+  // Simplify the case where multiple arms contain only a terminator, the
+  // terminators are the same, and their successor PHIs' incoming values are the
+  // same.
+
+  // Find BBs that are candidates for simplification.
+  SmallPtrSet<BasicBlock *, 8> BBs;
+  for (auto &Case : SI->cases()) {
+    BasicBlock *BB = Case.getCaseSuccessor();
+
+    // FIXME: This case needs some extra care because the terminators other than
+    // SI need to be updated.
+    if (!BB->hasNPredecessors(1))
+      continue;
+
+    // FIXME: Relax that the terminator is a BranchInst by checking for equality
+    // on other kinds of terminators.
+    Instruction *T = BB->getTerminator();
+    if (T && BB->size() == 1 && isa<BranchInst>(T))
+      BBs.insert(BB);
+  }
+
+  auto IsBranchEq = [](BranchInst *A, BranchInst *B) {
+    if (A->isConditional() != B->isConditional())
+      return false;
+
+    if (A->isConditional() && A->getCondition() != B->getCondition())
+      return false;
+
+    if (A->getNumSuccessors() != B->getNumSuccessors())
+      return false;
+
+    for (unsigned I = 0; I < A->getNumSuccessors(); ++I)
+      if (A->getSuccessor(I) != B->getSuccessor(I))
+        return false;
+
+    // Need to check that PHIs in successors have matching values.
+    for (auto *Succ : A->successors()) {
+      for (PHINode &Phi : Succ->phis())
+        if (Phi.getIncomingValueForBlock(A->getParent()) !=
+            Phi.getIncomingValueForBlock(B->getParent()))
+          return false;
+    }
+
+    return true;
+  };
+
+  // Construct a map from candidate basic block to an equivalent basic block
+  // to replace it with. All equivalent basic blocks should be replaced with
+  // the same basic block. To do this, if there is no equivalent BB in the map,
+  // then insert into the map BB -> BB. Otherwise, we should check only elements
+  // in the map for equivalence to ensure that all equivalent BB get replaced
+  // by the BB in the map. Replacing BB with BB has no impact, so we skip
+  // a call to setSuccessor when we do the actual replacement.
+  DenseMap<BasicBlock *, BasicBlock *> ReplaceWith;
+  for (BasicBlock *BB : BBs) {
+    bool Inserted = false;
+    for (auto KV : ReplaceWith) {
+      if (IsBranchEq(cast<BranchInst>(BB->getTerminator()),
+                     cast<BranchInst>(KV.first->getTerminator()))) {
+        ReplaceWith[BB] = KV.first;
+        Inserted = true;
+        break;
+      }
+    }
+    if (!Inserted)
+      ReplaceWith[BB] = BB;
+  }
+
+  // Do the replacement in SI.
+  bool MadeChange = false;
+  // There is no fast lookup of BasicBlock -> Cases, so we iterate over cases
+  // and check that the case was a candidate. BBs is already filtered, so
+  // hopefully calling contains on it is not too expensive.
+  for (auto &Case : SI->cases()) {
+    BasicBlock *OldSucc = Case.getCaseSuccessor();
+    if (!BBs.contains(OldSucc))
+      continue;
+    BasicBlock *NewSucc = ReplaceWith[OldSucc];
+    if (OldSucc != NewSucc) {
+      Case.setSuccessor(NewSucc);
+      MadeChange = true;
+    }
+  }
+
+  return MadeChange;
+}
+
 bool SimplifyCFGOpt::simplifySwitch(SwitchInst *SI, IRBuilder<> &Builder) {
   BasicBlock *BB = SI->getParent();
 
@@ -7496,6 +7585,9 @@ bool SimplifyCFGOpt::simplifySwitch(SwitchInst *SI, IRBuilder<> &Builder) {
       hoistCommonCodeFromSuccessors(SI, !Options.HoistCommonInsts))
     return requestResimplify();
 
+  if (simplifyDuplicateSwitchArms(SI))
+    return requestResimplify();
+
   return false;
 }
 
diff --git a/llvm/test/Transforms/SimplifyCFG/ForwardSwitchConditionToPHI.ll b/llvm/test/Transforms/SimplifyCFG/ForwardSwitchConditionToPHI.ll
index 8ad455eb9e7f22..4623eb2c5dd3c1 100644
--- a/llvm/test/Transforms/SimplifyCFG/ForwardSwitchConditionToPHI.ll
+++ b/llvm/test/Transforms/SimplifyCFG/ForwardSwitchConditionToPHI.ll
@@ -139,16 +139,14 @@ define i32 @PR34471(i32 %x) {
 ; NO_FWD-NEXT:    switch i32 [[X:%.*]], label [[ELSE3:%.*]] [
 ; NO_FWD-NEXT:      i32 17, label [[RETURN:%.*]]
 ; NO_FWD-NEXT:      i32 19, label [[IF19:%.*]]
-; NO_FWD-NEXT:      i32 42, label [[IF42:%.*]]
+; NO_FWD-NEXT:      i32 42, label [[IF19]]
 ; NO_FWD-NEXT:    ]
 ; NO_FWD:       if19:
 ; NO_FWD-NEXT:    br label [[RETURN]]
-; NO_FWD:       if42:
-; NO_FWD-NEXT:    br label [[RETURN]]
 ; NO_FWD:       else3:
 ; NO_FWD-NEXT:    br label [[RETURN]]
 ; NO_FWD:       return:
-; NO_FWD-NEXT:    [[R:%.*]] = phi i32 [ [[X]], [[IF19]] ], [ [[X]], [[IF42]] ], [ 0, [[ELSE3]] ], [ 17, [[ENTRY:%.*]] ]
+; NO_FWD-NEXT:    [[R:%.*]] = phi i32 [ [[X]], [[IF19]] ], [ 0, [[ELSE3]] ], [ 17, [[ENTRY:%.*]] ]
 ; NO_FWD-NEXT:    ret i32 [[R]]
 ;
 ; FWD-LABEL: @PR34471(
diff --git a/llvm/test/Transforms/SimplifyCFG/HoistCode.ll b/llvm/test/Transforms/SimplifyCFG/HoistCode.ll
index fe0b48028a3b62..fbe41d891c1ec5 100644
--- a/llvm/test/Transforms/SimplifyCFG/HoistCode.ll
+++ b/llvm/test/Transforms/SimplifyCFG/HoistCode.ll
@@ -65,14 +65,12 @@ define float @PR39535min_switch(i64 %i, float %x) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    switch i64 [[I:%.*]], label [[END:%.*]] [
 ; CHECK-NEXT:      i64 1, label [[BB1:%.*]]
-; CHECK-NEXT:      i64 2, label [[BB2:%.*]]
+; CHECK-NEXT:      i64 2, label [[BB1]]
 ; CHECK-NEXT:    ]
 ; CHECK:       bb1:
 ; CHECK-NEXT:    br label [[END]]
-; CHECK:       bb2:
-; CHECK-NEXT:    br label [[END]]
 ; CHECK:       end:
-; CHECK-NEXT:    [[COND:%.*]] = phi fast float [ [[X:%.*]], [[BB1]] ], [ [[X]], [[BB2]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[COND:%.*]] = phi fast float [ [[X:%.*]], [[BB1]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]
 ; CHECK-NEXT:    ret float [[COND]]
 ;
 entry:
diff --git a/llvm/test/Transforms/SimplifyCFG/switch-dup-bbs.ll b/llvm/test/Transforms/SimplifyCFG/switch-dup-bbs.ll
new file mode 100644
index 00000000000000..b12db656fdf681
--- /dev/null
+++ b/llvm/test/Transforms/SimplifyCFG/switch-dup-bbs.ll
@@ -0,0 +1,140 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt < %s -passes=simplifycfg -S | FileCheck %s -check-prefix=SIMPLIFY-CFG
+; RUN: opt < %s -O3 -S | FileCheck %s -check-prefix=O3
+
+define i32 @switch_all_duplicate_arms(i32 %0, i32 %1, i32 %2, i32 %3) {
+; SIMPLIFY-CFG-LABEL: define i32 @switch_all_duplicate_arms(
+; SIMPLIFY-CFG-SAME: i32 [[TMP0:%.*]], i32 [[TMP1:%.*]], i32 [[TMP2:%.*]], i32 [[TMP3:%.*]]) {
+; SIMPLIFY-CFG-NEXT:    switch i32 [[TMP1]], label %[[BB6:.*]] [
+; SIMPLIFY-CFG-NEXT:      i32 0, label %[[BB5:.*]]
+; SIMPLIFY-CFG-NEXT:      i32 1, label %[[BB5]]
+; SIMPLIFY-CFG-NEXT:    ]
+; SIMPLIFY-CFG:       [[BB5]]:
+; SIMPLIFY-CFG-NEXT:    br label %[[BB6]]
+; SIMPLIFY-CFG:       [[BB6]]:
+; SIMPLIFY-CFG-NEXT:    [[TMP8:%.*]] = phi i32 [ [[TMP3]], [[TMP4:%.*]] ], [ [[TMP2]], %[[BB5]] ]
+; SIMPLIFY-CFG-NEXT:    ret i32 [[TMP8]]
+;
+; O3-LABEL: define i32 @switch_all_duplicate_arms(
+; O3-SAME: i32 [[TMP0:%.*]], i32 [[TMP1:%.*]], i32 [[TMP2:%.*]], i32 [[TMP3:%.*]]) local_unnamed_addr #[[ATTR0:[0-9]+]] {
+; O3-NEXT:    [[SWITCH:%.*]] = icmp ult i32 [[TMP1]], 2
+; O3-NEXT:    [[TMP8:%.*]] = select i1 [[SWITCH]], i32 [[TMP2]], i32 [[TMP3]]
+; O3-NEXT:    ret i32 [[TMP8]]
+;
+  switch i32 %1, label %7 [
+  i32 0, label %5
+  i32 1, label %6
+  ]
+
+5:
+  br label %7
+
+6:
+  br label %7
+
+7:
+  %8 = phi i32 [ %3, %4 ], [ %2, %6 ], [ %2, %5 ]
+  ret i32 %8
+}
+
+define i32 @switch_some_duplicate_arms(i32 %0, i32 %1, i32 %2, i32 %3, i32 %4) {
+; SIMPLIFY-CFG-LABEL: define i32 @switch_some_duplicate_arms(
+; SIMPLIFY-CFG-SAME: i32 [[TMP0:%.*]], i32 [[TMP1:%.*]], i32 [[TMP2:%.*]], i32 [[TMP3:%.*]], i32 [[TMP4:%.*]]) {
+; SIMPLIFY-CFG-NEXT:    switch i32 [[TMP1]], label %[[BB8:.*]] [
+; SIMPLIFY-CFG-NEXT:      i32 0, label %[[BB6:.*]]
+; SIMPLIFY-CFG-NEXT:      i32 1, label %[[BB6]]
+; SIMPLIFY-CFG-NEXT:      i32 2, label %[[BB7:.*]]
+; SIMPLIFY-CFG-NEXT:    ]
+; SIMPLIFY-CFG:       [[BB6]]:
+; SIMPLIFY-CFG-NEXT:    br label %[[BB8]]
+; SIMPLIFY-CFG:       [[BB7]]:
+; SIMPLIFY-CFG-NEXT:    br label %[[BB8]]
+; SIMPLIFY-CFG:       [[BB8]]:
+; SIMPLIFY-CFG-NEXT:    [[TMP10:%.*]] = phi i32 [ [[TMP3]], [[TMP5:%.*]] ], [ [[TMP4]], %[[BB7]] ], [ [[TMP2]], %[[BB6]] ]
+; SIMPLIFY-CFG-NEXT:    ret i32 [[TMP10]]
+;
+; O3-LABEL: define i32 @switch_some_duplicate_arms(
+; O3-SAME: i32 [[TMP0:%.*]], i32 [[TMP1:%.*]], i32 [[TMP2:%.*]], i32 [[TMP3:%.*]], i32 [[TMP4:%.*]]) local_unnamed_addr #[[ATTR0]] {
+; O3-NEXT:    switch i32 [[TMP1]], label %[[BB8:.*]] [
+; O3-NEXT:      i32 0, label %[[BB6:.*]]
+; O3-NEXT:      i32 1, label %[[BB6]]
+; O3-NEXT:      i32 2, label %[[BB7:.*]]
+; O3-NEXT:    ]
+; O3:       [[BB6]]:
+; O3-NEXT:    br label %[[BB8]]
+; O3:       [[BB7]]:
+; O3-NEXT:    br label %[[BB8]]
+; O3:       [[BB8]]:
+; O3-NEXT:    [[TMP10:%.*]] = phi i32 [ [[TMP3]], [[TMP5:%.*]] ], [ [[TMP4]], %[[BB7]] ], [ [[TMP2]], %[[BB6]] ]
+; O3-NEXT:    ret i32 [[TMP10]]
+;
+  switch i32 %1, label %9 [
+  i32 0, label %6
+  i32 1, label %7
+  i32 2, label %8
+  ]
+
+6:
+  br label %9
+
+7:
+  br label %9
+
+8:
+  br label %9
+
+9:
+  %10 = phi i32 [ %3, %5 ], [ %4, %8 ], [ %2, %7 ], [ %2, %6 ]
+  ret i32 %10
+}
+
+define i32 @switch_duplicate_arms_multipred(i1 %0, i32 %1, i32 %2, i32 %3, i32 %4) {
+; SIMPLIFY-CFG-LABEL: define i32 @switch_duplicate_arms_multipred(
+; SIMPLIFY-CFG-SAME: i1 [[TMP0:%.*]], i32 [[TMP1:%.*]], i32 [[TMP2:%.*]], i32 [[TMP3:%.*]], i32 [[TMP4:%.*]]) {
+; SIMPLIFY-CFG-NEXT:    br i1 [[TMP0]], label %[[BB6:.*]], label %[[BB7:.*]]
+; SIMPLIFY-CFG:       [[BB6]]:
+; SIMPLIFY-CFG-NEXT:    switch i32 [[TMP2]], label %[[BB9:.*]] [
+; SIMPLIFY-CFG-NEXT:      i32 0, label %[[BB7]]
+; SIMPLIFY-CFG-NEXT:      i32 1, label %[[BB8:.*]]
+; SIMPLIFY-CFG-NEXT:    ]
+; SIMPLIFY-CFG:       [[BB7]]:
+; SIMPLIFY-CFG-NEXT:    br label %[[BB9]]
+; SIMPLIFY-CFG:       [[BB8]]:
+; SIMPLIFY-CFG-NEXT:    br label %[[BB9]]
+; SIMPLIFY-CFG:       [[BB9]]:
+; SIMPLIFY-CFG-NEXT:    [[TMP10:%.*]] = phi i32 [ [[TMP4]], %[[BB6]] ], [ [[TMP3]], %[[BB8]] ], [ [[TMP3]], %[[BB7]] ]
+; SIMPLIFY-CFG-NEXT:    ret i32 [[TMP10]]
+;
+; O3-LABEL: define i32 @switch_duplicate_arms_multipred(
+; O3-SAME: i1 [[TMP0:%.*]], i32 [[TMP1:%.*]], i32 [[TMP2:%.*]], i32 [[TMP3:%.*]], i32 [[TMP4:%.*]]) local_unnamed_addr #[[ATTR0]] {
+; O3-NEXT:    br i1 [[TMP0]], label %[[BB6:.*]], label %[[BB7:.*]]
+; O3:       [[BB6]]:
+; O3-NEXT:    switch i32 [[TMP2]], label %[[BB9:.*]] [
+; O3-NEXT:      i32 0, label %[[BB7]]
+; O3-NEXT:      i32 1, label %[[BB8:.*]]
+; O3-NEXT:    ]
+; O3:       [[BB7]]:
+; O3-NEXT:    br label %[[BB9]]
+; O3:       [[BB8]]:
+; O3-NEXT:    br label %[[BB9]]
+; O3:       [[BB9]]:
+; O3-NEXT:    [[TMP10:%.*]] = phi i32 [ [[TMP4]], %[[BB6]] ], [ [[TMP3]], %[[BB8]] ], [ [[TMP3]], %[[BB7]] ]
+; O3-NEXT:    ret i32 [[TMP10]]
+;
+  br i1 %0, label %6, label %7
+6:
+  switch i32 %2, label %9 [
+  i32 0, label %7
+  i32 1, label %8
+  ]
+
+7:
+  br label %9
+
+8:
+  br label %9
+
+9:
+  %10 = phi i32 [ %4, %6 ], [ %3, %8 ], [ %3, %7 ]
+  ret i32 %10
+}
diff --git a/llvm/test/Transforms/SimplifyCFG/switch-to-select-two-case.ll b/llvm/test/Transforms/SimplifyCFG/switch-to-select-two-case.ll
index 1e2f18b3f339d4..50998e447b71dc 100644
--- a/llvm/test/Transforms/SimplifyCFG/switch-to-select-two-case.ll
+++ b/llvm/test/Transforms/SimplifyCFG/switch-to-select-two-case.ll
@@ -272,16 +272,14 @@ define i8 @switch_to_select_two_case_results_no_default(i32 %i) {
 ; CHECK-NEXT:      i32 0, label [[END:%.*]]
 ; CHECK-NEXT:      i32 2, label [[END]]
 ; CHECK-NEXT:      i32 4, label [[CASE3:%.*]]
-; CHECK-NEXT:      i32 6, label [[CASE4:%.*]]
+; CHECK-NEXT:      i32 6, label [[CASE3]]
 ; CHECK-NEXT:    ]
 ; CHECK:       case3:
 ; CHECK-NEXT:    br label [[END]]
-; CHECK:       case4:
-; CHECK-NEXT:    br label [[END]]
 ; CHECK:       default:
 ; CHECK-NEXT:    unreachable
 ; CHECK:       end:
-; CHECK-NEXT:    [[T0:%.*]] = phi i8 [ 44, [[CASE3]] ], [ 44, [[CASE4]] ], [ 42, [[ENTRY:%.*]] ], [ 42, [[ENTRY]] ]
+; CHECK-NEXT:    [[T0:%.*]] = phi i8 [ 44, [[CASE3]] ], [ 42, [[ENTRY:%.*]] ], [ 42, [[ENTRY]] ]
 ; CHECK-NEXT:    ret i8 [[T0]]
 ;
 entry:

github-actions · 2024-10-30T16:52:26Z

✅ With the latest revision this PR passed the C/C++ code formatter.

goldsteinn · 2024-10-30T19:15:06Z

As general feedback, I think the current implementation is too constrained for my liking: it will be difficult to extend it to handle equal basic blocks that contain more than a single branch without rewriting it entirely.

I would be more in favor of implementing this by hashing the BBs and arbitrarily de-duplicating equal BBs.

michaelmaitland · 2024-10-30T19:35:53Z

> As a general feedback, I think the current implementation is a bit too constrained for my liking, in that it will make it difficult to extend to generally handle equal basic blocks with more than just one branch without entirely rewriting.
>
> I would be more in favor of implementing this by hashing the BBs and arbitrarily de-duplicating equal BBs.

Thanks for the feedback. I'm not sure I agree with the statement that we will need to entirely rewrite to support the general case. I have updated the patch to make that clear. Please let me know what you think.

I think an interesting fact here is that we always need to handle the terminator explicitly, because we need to check that the PHINodes in the successors (which are defined by the terminator) have the same incoming values from each block. For that reason, the code added in this patch will have to stay in the general case too. In the general case, we just have the additional work of verifying that the work done inside the BBs is equivalent. I'd like to leave that for a future patch.
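
The point about the terminator and the successor PHIs can be sketched outside of LLVM. The toy model below is only an illustration under stated assumptions: `ToyBlock` and `dedupArms` are hypothetical names, blocks contain nothing but an unconditional branch, and an ordered `std::map` keyed on the successor plus the PHI incoming values stands in for the hashing approach discussed in this thread.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Toy stand-in for a branch-only basic block; this is not LLVM's BasicBlock.
struct ToyBlock {
  int Succ;                     // id of the block the unconditional branch targets
  std::vector<int> PhiIncoming; // value id this block supplies to each PHI in Succ
};

// For every switch case target (an index into Blocks), return the index of the
// block to use instead. Blocks with the same successor and the same PHI
// incoming values collapse onto the first such block seen.
std::vector<std::size_t>
dedupArms(const std::vector<ToyBlock> &Blocks,
          const std::vector<std::size_t> &CaseTargets) {
  using Key = std::pair<int, std::vector<int>>;
  std::map<Key, std::size_t> Canonical;
  std::vector<std::size_t> Result;
  Result.reserve(CaseTargets.size());
  for (std::size_t Target : CaseTargets) {
    const ToyBlock &B = Blocks[Target];
    // try_emplace keeps the existing entry when an equal key is already present.
    auto It = Canonical.try_emplace(Key{B.Succ, B.PhiIncoming}, Target).first;
    Result.push_back(It->second);
  }
  return Result;
}
```

With three arms that all branch to the same block, where the first two supply the same PHI value and the third supplies a different one, the first two collapse and the third is left alone, which matches the shape of the switch_some_duplicate_arms test.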

goldsteinn · 2024-11-05T19:59:09Z

Just a note, you might be better off using llvm-testsuite as a benchmark while you iterate. Use the CTMark suite.

michaelmaitland · 2024-11-05T20:02:19Z

> Just a note, you might be better off using llvm-testsuite as a benchmark while you iterate. Use the CTMark suite.

Thanks for the info. I emailed @dtcxzyw to see if there was a way to run llvm-opt-benchmarks offline, but haven't heard back yet.

michaelmaitland · 2024-11-05T21:25:21Z

I've fixed the crashes, but I'm still seeing some compile-time regressions. I'm also just about out of ideas for improving the code in this patch. I would greatly appreciate any insights from the reviewers on where I can look to improve.

goldsteinn · 2024-11-05T22:08:28Z

> > Just a note, you might be better off using llvm-testsuite as a benchmark while you iterate. Use the CTMark suite.
>
> Thanks for the info. I emailed @dtcxzyw to see if there was a way to run llvm-opt-benchmarks offline, but hadn't heard back yet.

The normal workflow uses: https://github.com/dtcxzyw/llvm-opt-benchmark/blob/main/scripts/update_optimized.sh

Basically what you are looking for is: scripts/gen_optimized.py bench llvm/llvm-build/bin/opt comptime comptime.log

On the first run, create a baseline; then apply your patches, rebuild LLVM, and re-run. You can create diffs with: scripts/comptime_diff.py comptime.baseline comptime.log.

That said, I would still recommend using CTMark. It's much faster to run.

dtcxzyw · 2024-11-07T07:09:36Z

llvm/lib/Transforms/Utils/SimplifyCFG.cpp

+/// IncomingValue and add it in the Wrapper so isEqual can do O(1) checking
+/// of the incoming values.
+struct CaseHandleWrapper {
+  const SwitchInst::CaseHandle Case;


dtcxzyw: I prefer to store BasicBlock *Succ here. Then we can handle the default dest BB as well.

michaelmaitland: I'm not sure that will work for us, because there is no O(1) way to go from a BasicBlock * to its CaseHandles. We store the CaseHandle so that we can do the replacement through it quickly.

dtcxzyw: I see. We can use SwitchInst::getSuccessor/SwitchInst::setSuccessor here. The default dest BB is the first successor.
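
A rough illustration of the index-based suggestion, written against a toy type rather than LLVM: `ToySwitch`, `redirectDuplicates`, and the `ClassOf` input are hypothetical names invented for this sketch. The only fact taken from the thread is that successor 0 is the default destination, so walking successor indices covers the default and the case arms in one loop.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Toy stand-in for a switch terminator; this is not llvm::SwitchInst.
// Successor 0 is the default destination, successors 1..N are the case arms.
struct ToySwitch {
  std::vector<int> Succs; // block ids
  std::size_t getNumSuccessors() const { return Succs.size(); }
  int getSuccessor(std::size_t I) const { return Succs[I]; }
  void setSuccessor(std::size_t I, int BB) { Succs[I] = BB; }
};

// ClassOf maps a candidate block id to an equivalence-class id (a hypothetical
// input that an earlier analysis would have computed). Blocks missing from
// ClassOf are not candidates. Returns true if any successor was redirected.
bool redirectDuplicates(ToySwitch &SI,
                        const std::unordered_map<int, int> &ClassOf) {
  std::unordered_map<int, int> FirstInClass; // class id -> first block seen
  bool Changed = false;
  for (std::size_t I = 0; I < SI.getNumSuccessors(); ++I) {
    const int BB = SI.getSuccessor(I);
    auto ClassIt = ClassOf.find(BB);
    if (ClassIt == ClassOf.end())
      continue; // not a candidate, leave it alone
    const int Rep = FirstInClass.try_emplace(ClassIt->second, BB).first->second;
    if (Rep != BB) {
      SI.setSuccessor(I, Rep);
      Changed = true;
    }
  }
  return Changed;
}
```

Because the loop starts at index 0, a case arm that is equivalent to the default destination is redirected to it without any special handling.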

michaelmaitland · 2024-11-07T20:57:08Z

I am getting the following results on my local machine:

Compilation time result (by files):
Top 5 improvements:
  lightgbm/c_api.cpp.ll 16647754205 -> 16139261311 -3.05%
  faiss/utils.cpp.ll 1544838129 -> 1498931464 -2.97%
  faiss/IndexPQ.cpp.ll 4512055830 -> 4408179250 -2.30%
  gromacs/pme_redistribute.cpp.ll 1325856070 -> 1295687698 -2.28%
  gromacs/kerneldispatch.cpp.ll 614810494 -> 603676278 -1.81%
Top 5 regressions:
  gromacs/mdatoms.cpp.ll 1883643129 -> 1927946180 +2.35%
  lightgbm/metric.cpp.ll 8668505101 -> 8848219689 +2.07%
  lightgbm/linear_tree_learner.cpp.ll 11831124326 -> 12066920031 +1.99%
  meshlab/arap.cpp.ll 20066421231 -> 20418357293 +1.75%
  faiss/Index2Layer.cpp.ll 479186376 -> 487159874 +1.66%

Overall: 0.00054018%
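
For readers checking the numbers, the per-file percentages above are consistent with a plain relative change, (after - before) / before. The helper below is a small sketch under that assumption; `percentChange` is a hypothetical name, and the unit of the raw counters is not stated in this comment.

```cpp
#include <cstdint>

// Relative change from Before to After, in percent. A negative result means
// the counter went down, i.e. an improvement in the listing above.
double percentChange(std::uint64_t Before, std::uint64_t After) {
  const double B = static_cast<double>(Before);
  const double A = static_cast<double>(After);
  return (A - B) / B * 100.0;
}
```

For example, the lightgbm/c_api.cpp.ll row (16647754205 -> 16139261311) comes out at roughly -3.05, and the gromacs/mdatoms.cpp.ll row (1883643129 -> 1927946180) at roughly +2.35.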

dtcxzyw · 2024-11-08T07:53:04Z

llvm/lib/Transforms/Utils/SimplifyCFG.cpp

+    // replace it.
+    const auto [It, Inserted] = ReplaceWith.insert(&CHW);
+    if (!Inserted) {
+      CHW.Case.setSuccessor((*It)->Case.getCaseSuccessor());


The dominator tree should be updated as well.

dtcxzyw

LGTM. Please wait for additional approval from other reviewers.

nikic · 2024-11-14T16:15:48Z

llvm/lib/Transforms/Utils/SimplifyCFG.cpp

+    // PHIs. Initially, we tried to just use the successor BB as the hash, but
+    // this had poor performance. We find that the extra computation of getting
+    // the incoming PHI values here leads to better performance on overall Set
+    // performance. We also tried to build a map from BB -> Succs.IncomingValues


Suggested change

-    // PHIs. Initially, we tried to just use the successor BB as the hash, but
-    // this had poor performance. We find that the extra computation of getting
-    // the incoming PHI values here leads to better performance on overall Set
-    // performance. We also tried to build a map from BB -> Succs.IncomingValues
+    // PHIs. Initially, we tried to just use the successor BB as the hash, but
+    // including the incoming PHI values leads to better performance.
+    // We also tried to build a map from BB -> Succs.IncomingValues

There was a lot of performance in here.

antoniofrighetto · 2024-11-14T17:01:26Z

llvm/lib/Transforms/Utils/SimplifyCFG.cpp

+  // be important to equality though.
+  unsigned SuccNum;
+  BasicBlock *Dest;
+  DenseMap<PHINode *, DenseMap<BasicBlock *, Value *>> *PhiPredIVs;


antoniofrighetto: Maybe the inner mapping can be a SmallDenseMap? I only glanced over it, but could you confirm that the inner map is actually needed? It seems to be prepopulated with values that you can just retrieve from the PN.

michaelmaitland: I've updated the inner map to be a SmallDenseMap. I think the inner map is important. In isEqual, we need to check that, for two BasicBlocks A and B, the incoming values for each Phi are the same. That requires calling PHINode::getIncomingValueForBlock, which is O(|Preds|). If we do not precompute this inner map, we call PHINode::getIncomingValueForBlock redundantly, since a single BasicBlock may be passed as an argument to isEqual multiple times. I've noted this in the docstring of the data structure.
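
The trade-off described in this reply can be sketched with standard containers. This is only a toy model under stated assumptions: `ToyPhi`, `precomputeIncoming`, and `sameIncoming` are hypothetical names, and `std::unordered_map` stands in for the DenseMap/SmallDenseMap pair used in the patch. Each table is built once, so later equality checks are hash lookups instead of repeated linear scans over a PHI's predecessors.

```cpp
#include <unordered_map>
#include <utility>
#include <vector>

// Toy stand-in for a PHI node: a list of (predecessor block id, value id) pairs.
struct ToyPhi {
  std::vector<std::pair<int, int>> Incoming;
};

// One lookup table per PHI: predecessor block id -> incoming value id.
using PhiPredIVs = std::vector<std::unordered_map<int, int>>;

// Build every table once, in time linear in the total number of incoming edges.
PhiPredIVs precomputeIncoming(const std::vector<ToyPhi> &Phis) {
  PhiPredIVs Tables;
  Tables.reserve(Phis.size());
  for (const ToyPhi &Phi : Phis)
    Tables.emplace_back(Phi.Incoming.begin(), Phi.Incoming.end());
  return Tables;
}

// True if blocks A and B supply the same value to every PHI. Both blocks must
// be predecessors of every PHI; otherwise at() throws std::out_of_range.
bool sameIncoming(const PhiPredIVs &Tables, int A, int B) {
  for (const auto &Table : Tables)
    if (Table.at(A) != Table.at(B))
      return false;
  return true;
}
```

Using the PHI from switch_all_duplicate_arms as input (value %3 from block %4, value %2 from blocks %5 and %6), blocks 5 and 6 compare equal while blocks 4 and 5 do not.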

nikic

LGTM

nikic · 2024-11-15T15:51:10Z

I thought it might be beneficial to integrate a block -> index cache directly in PHINode, but at least a naive attempt did not yield good results: https://llvm-compile-time-tracker.com/compare.php?from=e52238b59f250aef5dc0925866d0308305a19dbf&to=bf327b327f172d807b6caa07378e85ede7e83956&stat=instructions:u

jayfoad · 2024-11-16T11:39:01Z

FYI this has made a huge improvement in the time it takes clang -O3 to compile lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp -- more than 2x improvement on trunk, and more than 10x improvement in a downstream branch where the generated AsmParser is larger. So thank you!

[SimplifyCFG] precommit tests for simplify switch with duplicate arms

fbfa46a

michaelmaitland added the llvm:transforms label Oct 30, 2024

michaelmaitland requested review from nikic and topperc October 30, 2024 16:48

michaelmaitland requested a review from mshockwave October 30, 2024 16:52

michaelmaitland force-pushed the simplify-jt-dup-bb branch from eb7d34d to 2db9f00 Compare October 30, 2024 16:53

goldsteinn requested a review from dtcxzyw October 30, 2024 18:41

goldsteinn reviewed Oct 30, 2024

goldsteinn reviewed Oct 30, 2024

fixup! respond to review

4e56067

michaelmaitland force-pushed the simplify-jt-dup-bb branch from b15eceb to 4e56067 Compare October 30, 2024 19:13

fixup! refactor for general approach

9e10655

goldsteinn reviewed Oct 30, 2024

goldsteinn reviewed Oct 30, 2024

fixup! move PHI checks

64ddee6

michaelmaitland force-pushed the simplify-jt-dup-bb branch from 4b091ef to 64ddee6 Compare October 30, 2024 21:54

This was referenced Oct 31, 2024

Task submission dtcxzyw/llvm-opt-benchmark#1312

Open

pre-commit: PR114262 dtcxzyw/llvm-opt-benchmark#1579

Closed

MrLop reviewed Oct 31, 2024

fixup! make it O(n)

c8eecb6

michaelmaitland requested review from MrLop and goldsteinn October 31, 2024 17:07

fixup! use successor BBs in hash

9e9afc2

michaelmaitland force-pushed the simplify-jt-dup-bb branch from 8461bae to 9e9afc2 Compare October 31, 2024 17:27

dtcxzyw mentioned this pull request Oct 31, 2024

pre-commit: PR114262 dtcxzyw/llvm-opt-benchmark#1585

Closed

fixup! fix crashes

4949890

dtcxzyw mentioned this pull request Nov 5, 2024

pre-commit: PR114262 dtcxzyw/llvm-opt-benchmark#1631

Closed

dtcxzyw mentioned this pull request Nov 5, 2024

pre-commit: PR114262 dtcxzyw/llvm-opt-benchmark#1632

Closed

fixup~ Revert changes that precompute for getHashValue.

e91253c

michaelmaitland force-pushed the simplify-jt-dup-bb branch from d5bb947 to e91253c Compare November 6, 2024 04:40

dtcxzyw reviewed Nov 7, 2024

fixup! use SmallVector instead of vector

d87ea1d

dtcxzyw reviewed Nov 8, 2024

michaelmaitland added 2 commits November 11, 2024 14:16

fixup! update dom tree

be55920

fixup! use get/set Sucessor

085b048

michaelmaitland force-pushed the simplify-jt-dup-bb branch from f7a885b to 085b048 Compare November 11, 2024 23:15

fixup! avoid extra insert

3f0b020

dtcxzyw approved these changes Nov 14, 2024

fixup! respond to review

3430a66

nikic reviewed Nov 14, 2024

antoniofrighetto reviewed Nov 14, 2024

fixup! respond to review

852f3f0

michaelmaitland requested a review from nikic November 14, 2024 17:55

nikic approved these changes Nov 15, 2024

nikic merged commit 6b99527 into llvm:main Nov 15, 2024
8 checks passed

michaelmaitland deleted the simplify-jt-dup-bb branch November 15, 2024 15:00

nikic mentioned this pull request Dec 3, 2024

[SimplifyCFG] Missed optimization: merge switch branches into default if they hold the same value in phi as the default branch #85196

Closed
