Conversation

@ChunyuLiao (Member)
Some data types that require extension produce redundant copy instructions. This pass removes those specific copies to help shrink-wrap optimization.

@ChunyuLiao ChunyuLiao requested a review from preames May 20, 2025 11:58
@ChunyuLiao ChunyuLiao requested a review from topperc May 20, 2025 11:58
@llvmbot (Member) commented May 20, 2025

@llvm/pr-subscribers-backend-risc-v

Author: Liao Chunyu (ChunyuLiao)

Changes

Some data types that require extension produce redundant copy instructions. This pass removes those specific copies to help shrink-wrap optimization.


Full diff: https://github.com/llvm/llvm-project/pull/140716.diff

8 Files Affected:

  • (modified) llvm/lib/Target/RISCV/CMakeLists.txt (+1)
  • (modified) llvm/lib/Target/RISCV/RISCV.h (+2)
  • (added) llvm/lib/Target/RISCV/RISCVCopyCombine.cpp (+150)
  • (modified) llvm/lib/Target/RISCV/RISCVTargetMachine.cpp (+4)
  • (modified) llvm/test/CodeGen/RISCV/O3-pipeline.ll (+1)
  • (modified) llvm/test/CodeGen/RISCV/overflow-intrinsics.ll (+13-13)
  • (modified) llvm/test/CodeGen/RISCV/rv64-double-convert.ll (+7-7)
  • (modified) llvm/test/CodeGen/RISCV/shrinkwrap.ll (+5-4)
diff --git a/llvm/lib/Target/RISCV/CMakeLists.txt b/llvm/lib/Target/RISCV/CMakeLists.txt
index e32d6eab3b977..dee700291571f 100644
--- a/llvm/lib/Target/RISCV/CMakeLists.txt
+++ b/llvm/lib/Target/RISCV/CMakeLists.txt
@@ -35,6 +35,7 @@ add_llvm_target(RISCVCodeGen
   RISCVCodeGenPrepare.cpp
   RISCVConstantPoolValue.cpp
   RISCVDeadRegisterDefinitions.cpp
+  RISCVCopyCombine.cpp
   RISCVExpandAtomicPseudoInsts.cpp
   RISCVExpandPseudoInsts.cpp
   RISCVFoldMemOffset.cpp
diff --git a/llvm/lib/Target/RISCV/RISCV.h b/llvm/lib/Target/RISCV/RISCV.h
index ae9410193efe1..d5e55bc60224a 100644
--- a/llvm/lib/Target/RISCV/RISCV.h
+++ b/llvm/lib/Target/RISCV/RISCV.h
@@ -30,6 +30,8 @@ void initializeRISCVCodeGenPreparePass(PassRegistry &);
 
 FunctionPass *createRISCVDeadRegisterDefinitionsPass();
 void initializeRISCVDeadRegisterDefinitionsPass(PassRegistry &);
+FunctionPass *createRISCVCopyCombinePass();
+void initializeRISCVCopyCombinePass(PassRegistry &);
 
 FunctionPass *createRISCVIndirectBranchTrackingPass();
 void initializeRISCVIndirectBranchTrackingPass(PassRegistry &);
diff --git a/llvm/lib/Target/RISCV/RISCVCopyCombine.cpp b/llvm/lib/Target/RISCV/RISCVCopyCombine.cpp
new file mode 100644
index 0000000000000..0312f5921d505
--- /dev/null
+++ b/llvm/lib/Target/RISCV/RISCVCopyCombine.cpp
@@ -0,0 +1,150 @@
+//===- RISCVCopyCombine.cpp - Remove special copy for RISC-V --===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===---------------------------------------------------------------------===//
+//
+// This pass attempts a shrink-wrap enabling optimization for special cases
+// that arise when data types require extension.
+//
+// After finalize-isel:
+//   bb0:
+//   liveins: $x10, $x11
+//     %1:gpr = COPY $x11   ---- will be deleted by this pass
+//     %0:gpr = COPY $x10
+//     %2:gpr = COPY %1:gpr ---- without this pass, this is sunk into bb1 by
+//                               machine-sink and then deleted at regalloc
+//     BEQ %0:gpr, killed %3:gpr, %bb.3
+//     PseudoBR %bb1
+//
+//   bb1:
+//   bb2:
+//     BNE %2:gpr, killed %5:gpr, %bb.2
+// ...
+// After regalloc
+//  bb0:
+//    liveins: $x10, $x11
+//    renamable $x8 = COPY $x11
+//    renamable $x11 = ADDI $x0, 57 --- x11 is redefined, so the COPY cannot be sunk
+//    BEQ killed renamable $x10, killed renamable $x11, %bb.4
+//    PseudoBR %bb.1
+//
+//  bb1:
+//  bb2:
+//    BEQ killed renamable $x8, killed renamable $x10, %bb.4
+//
+// ----->
+//
+// After this pass:
+//   bb0:
+//   liveins: $x10, $x11
+//     %0:gpr = COPY $x10
+//     %2:gpr = COPY $x11
+//     BEQ %0:gpr, killed %3:gpr, %bb.3
+//     PseudoBR %bb1
+//
+//   bb1:
+//   bb2:
+//     BNE %2:gpr, killed %5:gpr, %bb.2
+// ...
+// After regalloc
+//  bb0:
+//    liveins: $x10, $x11
+//    renamable $x12 = ADDI $x0, 57
+//    renamable $x8 = COPY $x11
+//    BEQ killed renamable $x10, killed renamable $x12, %bb.4
+//    PseudoBR %bb.1
+//
+//  bb1:
+//  bb2:
+//    BEQ killed renamable $x8, killed renamable $x10, %bb.4
+//===---------------------------------------------------------------------===//
+
+#include "RISCV.h"
+#include "RISCVSubtarget.h"
+#include "llvm/ADT/Statistic.h"
+#include "llvm/CodeGen/MachineFunctionPass.h"
+
+using namespace llvm;
+#define DEBUG_TYPE "riscv-copy-combine"
+#define RISCV_COPY_COMBINE "RISC-V Copy Combine"
+
+STATISTIC(NumCopyDeleted, "Number of copies deleted");
+
+namespace {
+class RISCVCopyCombine : public MachineFunctionPass {
+public:
+  static char ID;
+  const TargetInstrInfo *TII;
+  MachineRegisterInfo *MRI;
+  const TargetRegisterInfo *TRI;
+  const RISCVSubtarget *ST;
+
+  RISCVCopyCombine() : MachineFunctionPass(ID) {}
+  bool runOnMachineFunction(MachineFunction &MF) override;
+  MachineFunctionProperties getRequiredProperties() const override {
+    return MachineFunctionProperties().set(
+        MachineFunctionProperties::Property::IsSSA);
+  }
+
+  StringRef getPassName() const override { return RISCV_COPY_COMBINE; }
+
+private:
+  bool optimizeBlock(MachineBasicBlock &MBB);
+};
+} // end anonymous namespace
+
+char RISCVCopyCombine::ID = 0;
+INITIALIZE_PASS(RISCVCopyCombine, DEBUG_TYPE, RISCV_COPY_COMBINE, false, false)
+
+bool RISCVCopyCombine::optimizeBlock(MachineBasicBlock &MBB) {
+  MachineBasicBlock *TBB = nullptr, *FBB = nullptr;
+  SmallVector<MachineOperand, 3> Cond;
+  if (TII->analyzeBranch(MBB, TBB, FBB, Cond, /*AllowModify*/ false) ||
+      Cond.empty())
+    return false;
+
+  if (!TBB || Cond.size() != 3)
+    return false;
+
+  if (!Cond[1].isReg() || !Cond[1].getReg().isVirtual())
+    return false;
+
+  MachineInstr *MI = MRI->getVRegDef(Cond[1].getReg());
+  if (MI && MI->getOpcode() == RISCV::COPY) {
+    if (MRI->hasOneUse(MI->getOperand(1).getReg()) &&
+        MI->getOperand(1).getReg().isVirtual() &&
+        MI->getOperand(0).getReg().isVirtual()) {
+      MachineInstr *Src = MRI->getVRegDef(MI->getOperand(1).getReg());
+      if (Src && Src->getOpcode() == RISCV::COPY &&
+          Src->getParent() == MI->getParent()) {
+        MRI->replaceRegWith(MI->getOperand(1).getReg(),
+                            Src->getOperand(1).getReg());
+        LLVM_DEBUG(dbgs() << "Deleting this copy instruction ";
+                   Src->print(dbgs()));
+        ++NumCopyDeleted;
+        Src->eraseFromParent();
+        return true;
+      }
+    }
+  }
+  return false;
+}
+
+bool RISCVCopyCombine::runOnMachineFunction(MachineFunction &MF) {
+  if (skipFunction(MF.getFunction()))
+    return false;
+
+  TII = MF.getSubtarget().getInstrInfo();
+  MRI = &MF.getRegInfo();
+  TRI = MRI->getTargetRegisterInfo();
+
+  bool Changed = false;
+  for (MachineBasicBlock &MBB : MF)
+    Changed |= optimizeBlock(MBB);
+
+  return Changed;
+}
+
+FunctionPass *llvm::createRISCVCopyCombinePass() {
+  return new RISCVCopyCombine();
+}
diff --git a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
index 15dd4d57727dd..2aaa9b1771c7b 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
@@ -128,6 +128,7 @@ extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeRISCVTarget() {
   initializeRISCVPostLegalizerCombinerPass(*PR);
   initializeKCFIPass(*PR);
   initializeRISCVDeadRegisterDefinitionsPass(*PR);
+  initializeRISCVCopyCombinePass(*PR);
   initializeRISCVLateBranchOptPass(*PR);
   initializeRISCVMakeCompressibleOptPass(*PR);
   initializeRISCVGatherScatterLoweringPass(*PR);
@@ -455,6 +456,7 @@ bool RISCVPassConfig::addRegAssignAndRewriteFast() {
   if (TM->getOptLevel() != CodeGenOptLevel::None &&
       EnableRISCVDeadRegisterElimination)
     addPass(createRISCVDeadRegisterDefinitionsPass());
+
   return TargetPassConfig::addRegAssignAndRewriteFast();
 }
 
@@ -465,6 +467,7 @@ bool RISCVPassConfig::addRegAssignAndRewriteOptimized() {
   if (TM->getOptLevel() != CodeGenOptLevel::None &&
       EnableRISCVDeadRegisterElimination)
     addPass(createRISCVDeadRegisterDefinitionsPass());
+
   return TargetPassConfig::addRegAssignAndRewriteOptimized();
 }
 
@@ -598,6 +601,7 @@ void RISCVPassConfig::addPreEmitPass2() {
 }
 
 void RISCVPassConfig::addMachineSSAOptimization() {
+  addPass(createRISCVCopyCombinePass());
   addPass(createRISCVVectorPeepholePass());
   addPass(createRISCVFoldMemOffsetPass());
 
diff --git a/llvm/test/CodeGen/RISCV/O3-pipeline.ll b/llvm/test/CodeGen/RISCV/O3-pipeline.ll
index 19de864422bc5..7ebe888b7cc52 100644
--- a/llvm/test/CodeGen/RISCV/O3-pipeline.ll
+++ b/llvm/test/CodeGen/RISCV/O3-pipeline.ll
@@ -96,6 +96,7 @@
 ; CHECK-NEXT:       Lazy Block Frequency Analysis
 ; CHECK-NEXT:       RISC-V DAG->DAG Pattern Instruction Selection
 ; CHECK-NEXT:       Finalize ISel and expand pseudo-instructions
+; CHECK-NEXT:       RISC-V Copy Combine
 ; CHECK-NEXT:       RISC-V Vector Peephole Optimization
 ; CHECK-NEXT:       RISC-V Fold Memory Offset
 ; CHECK-NEXT:       Lazy Machine Block Frequency Analysis
diff --git a/llvm/test/CodeGen/RISCV/overflow-intrinsics.ll b/llvm/test/CodeGen/RISCV/overflow-intrinsics.ll
index a5426e560bd65..714e8d1cb74c0 100644
--- a/llvm/test/CodeGen/RISCV/overflow-intrinsics.ll
+++ b/llvm/test/CodeGen/RISCV/overflow-intrinsics.ll
@@ -1080,33 +1080,33 @@ define i1 @usubo_ult_cmp_dominates_i64(i64 %x, i64 %y, ptr %p, i1 %cond) {
 ; RV32-NEXT:    .cfi_offset s5, -28
 ; RV32-NEXT:    .cfi_offset s6, -32
 ; RV32-NEXT:    mv s5, a5
-; RV32-NEXT:    mv s3, a1
-; RV32-NEXT:    andi a1, a5, 1
-; RV32-NEXT:    beqz a1, .LBB32_8
+; RV32-NEXT:    mv s2, a0
+; RV32-NEXT:    andi a0, a5, 1
+; RV32-NEXT:    beqz a0, .LBB32_8
 ; RV32-NEXT:  # %bb.1: # %t
 ; RV32-NEXT:    mv s0, a4
-; RV32-NEXT:    mv s2, a3
+; RV32-NEXT:    mv s3, a3
 ; RV32-NEXT:    mv s1, a2
-; RV32-NEXT:    mv s4, a0
-; RV32-NEXT:    beq s3, a3, .LBB32_3
+; RV32-NEXT:    mv s4, a1
+; RV32-NEXT:    beq a1, a3, .LBB32_3
 ; RV32-NEXT:  # %bb.2: # %t
-; RV32-NEXT:    sltu s6, s3, s2
+; RV32-NEXT:    sltu s6, s4, s3
 ; RV32-NEXT:    j .LBB32_4
 ; RV32-NEXT:  .LBB32_3:
-; RV32-NEXT:    sltu s6, s4, s1
+; RV32-NEXT:    sltu s6, s2, s1
 ; RV32-NEXT:  .LBB32_4: # %t
 ; RV32-NEXT:    mv a0, s6
 ; RV32-NEXT:    call call
 ; RV32-NEXT:    beqz s6, .LBB32_8
 ; RV32-NEXT:  # %bb.5: # %end
-; RV32-NEXT:    sltu a1, s4, s1
+; RV32-NEXT:    sltu a1, s2, s1
 ; RV32-NEXT:    mv a0, a1
-; RV32-NEXT:    beq s3, s2, .LBB32_7
+; RV32-NEXT:    beq s4, s3, .LBB32_7
 ; RV32-NEXT:  # %bb.6: # %end
-; RV32-NEXT:    sltu a0, s3, s2
+; RV32-NEXT:    sltu a0, s4, s3
 ; RV32-NEXT:  .LBB32_7: # %end
-; RV32-NEXT:    sub a2, s3, s2
-; RV32-NEXT:    sub a3, s4, s1
+; RV32-NEXT:    sub a2, s4, s3
+; RV32-NEXT:    sub a3, s2, s1
 ; RV32-NEXT:    sub a2, a2, a1
 ; RV32-NEXT:    sw a3, 0(s0)
 ; RV32-NEXT:    sw a2, 4(s0)
diff --git a/llvm/test/CodeGen/RISCV/rv64-double-convert.ll b/llvm/test/CodeGen/RISCV/rv64-double-convert.ll
index dd49d9e3e2dce..f9fd528584169 100644
--- a/llvm/test/CodeGen/RISCV/rv64-double-convert.ll
+++ b/llvm/test/CodeGen/RISCV/rv64-double-convert.ll
@@ -69,14 +69,14 @@ define i128 @fptosi_sat_f64_to_i128(double %a) nounwind {
 ; RV64I-NEXT:    sd s3, 24(sp) # 8-byte Folded Spill
 ; RV64I-NEXT:    sd s4, 16(sp) # 8-byte Folded Spill
 ; RV64I-NEXT:    sd s5, 8(sp) # 8-byte Folded Spill
-; RV64I-NEXT:    mv s0, a0
+; RV64I-NEXT:    mv s1, a0
 ; RV64I-NEXT:    li a1, -449
 ; RV64I-NEXT:    slli a1, a1, 53
 ; RV64I-NEXT:    call __gedf2
 ; RV64I-NEXT:    mv s2, a0
-; RV64I-NEXT:    mv a0, s0
+; RV64I-NEXT:    mv a0, s1
 ; RV64I-NEXT:    call __fixdfti
-; RV64I-NEXT:    mv s1, a0
+; RV64I-NEXT:    mv s0, a0
 ; RV64I-NEXT:    mv s3, a1
 ; RV64I-NEXT:    li s5, -1
 ; RV64I-NEXT:    bgez s2, .LBB4_2
@@ -86,15 +86,15 @@ define i128 @fptosi_sat_f64_to_i128(double %a) nounwind {
 ; RV64I-NEXT:    li a0, 575
 ; RV64I-NEXT:    slli a0, a0, 53
 ; RV64I-NEXT:    addi a1, a0, -1
-; RV64I-NEXT:    mv a0, s0
+; RV64I-NEXT:    mv a0, s1
 ; RV64I-NEXT:    call __gtdf2
 ; RV64I-NEXT:    mv s4, a0
 ; RV64I-NEXT:    blez a0, .LBB4_4
 ; RV64I-NEXT:  # %bb.3:
 ; RV64I-NEXT:    srli s3, s5, 1
 ; RV64I-NEXT:  .LBB4_4:
-; RV64I-NEXT:    mv a0, s0
-; RV64I-NEXT:    mv a1, s0
+; RV64I-NEXT:    mv a0, s1
+; RV64I-NEXT:    mv a1, s1
 ; RV64I-NEXT:    call __unorddf2
 ; RV64I-NEXT:    snez a0, a0
 ; RV64I-NEXT:    slti a1, s2, 0
@@ -102,7 +102,7 @@ define i128 @fptosi_sat_f64_to_i128(double %a) nounwind {
 ; RV64I-NEXT:    addi a0, a0, -1
 ; RV64I-NEXT:    addi a3, a1, -1
 ; RV64I-NEXT:    and a1, a0, s3
-; RV64I-NEXT:    and a3, a3, s1
+; RV64I-NEXT:    and a3, a3, s0
 ; RV64I-NEXT:    neg a2, a2
 ; RV64I-NEXT:    or a2, a2, a3
 ; RV64I-NEXT:    and a0, a0, a2
diff --git a/llvm/test/CodeGen/RISCV/shrinkwrap.ll b/llvm/test/CodeGen/RISCV/shrinkwrap.ll
index 90f9509c72373..235c714f7f33b 100644
--- a/llvm/test/CodeGen/RISCV/shrinkwrap.ll
+++ b/llvm/test/CodeGen/RISCV/shrinkwrap.ll
@@ -361,6 +361,9 @@ define void @li_straightline_b(i32 zeroext %a, i32 zeroext %b) {
 ;
 ; RV64I-SW-LABEL: li_straightline_b:
 ; RV64I-SW:       # %bb.0:
+; RV64I-SW-NEXT:    li a2, 57
+; RV64I-SW-NEXT:    beq a0, a2, .LBB3_4
+; RV64I-SW-NEXT:  # %bb.1: # %do_call
 ; RV64I-SW-NEXT:    addi sp, sp, -16
 ; RV64I-SW-NEXT:    .cfi_def_cfa_offset 16
 ; RV64I-SW-NEXT:    sd ra, 8(sp) # 8-byte Folded Spill
@@ -368,21 +371,19 @@ define void @li_straightline_b(i32 zeroext %a, i32 zeroext %b) {
 ; RV64I-SW-NEXT:    .cfi_offset ra, -8
 ; RV64I-SW-NEXT:    .cfi_offset s0, -16
 ; RV64I-SW-NEXT:    mv s0, a1
-; RV64I-SW-NEXT:    li a1, 57
-; RV64I-SW-NEXT:    beq a0, a1, .LBB3_3
-; RV64I-SW-NEXT:  # %bb.1: # %do_call
 ; RV64I-SW-NEXT:    call foo
 ; RV64I-SW-NEXT:    li a0, 57
 ; RV64I-SW-NEXT:    beq s0, a0, .LBB3_3
 ; RV64I-SW-NEXT:  # %bb.2: # %do_call2
 ; RV64I-SW-NEXT:    call foo
-; RV64I-SW-NEXT:  .LBB3_3: # %exit
+; RV64I-SW-NEXT:  .LBB3_3:
 ; RV64I-SW-NEXT:    ld ra, 8(sp) # 8-byte Folded Reload
 ; RV64I-SW-NEXT:    ld s0, 0(sp) # 8-byte Folded Reload
 ; RV64I-SW-NEXT:    .cfi_restore ra
 ; RV64I-SW-NEXT:    .cfi_restore s0
 ; RV64I-SW-NEXT:    addi sp, sp, 16
 ; RV64I-SW-NEXT:    .cfi_def_cfa_offset 0
+; RV64I-SW-NEXT:  .LBB3_4: # %exit
 ; RV64I-SW-NEXT:    ret
   %cmp0 = icmp eq i32 %a, 57
   br i1 %cmp0, label %exit, label %do_call

Inline review comment (Collaborator):

What about Cond[2]?
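
For reference, the posted patch only inspects Cond[1]. A symmetric treatment of both compare operands might look like the hedged sketch below; `tryCombineCopy` is a hypothetical helper standing in for the COPY-of-COPY elimination already done in optimizeBlock:

  // Hypothetical sketch: consider both operands of the conditional branch,
  // not just Cond[1]. tryCombineCopy is not part of the posted patch.
  bool Changed = false;
  for (unsigned OpIdx : {1u, 2u}) {
    if (!Cond[OpIdx].isReg() || !Cond[OpIdx].getReg().isVirtual())
      continue;
    if (MachineInstr *Def = MRI->getVRegDef(Cond[OpIdx].getReg()))
      Changed |= tryCombineCopy(*Def); // hypothetical helper
  }
  return Changed;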

@lenary (Member) left a comment:

Please can you add a test where the branch that uses x10 (physical register) wants to move the copy after some inline assembly that clobbers x10.

I think the input MIR would look something like:

...
  %0:gpr = COPY $x10
  <inline asm that clobbers $x10>
  %3:gpr = COPY %0:gpr
  BEQ %2:gpr, %3:gpr, %bb3
  PseudoBR $bb2
...

I am not sure you can propagate the copy in this case, but maybe you can?

Inline review comment (Collaborator):

Can the FIXME on this function be removed now?

Inline review comment (Collaborator):

What makes branch uses special?

@preames (Collaborator) commented May 21, 2025

Can you say a bit more about the motivation for this patch? Was this noticed from workload analysis? If so, which workload? Have you done any broader performance comparison to justify this?

The heuristic in this patch seems somewhat unprincipled (i.e. why only copies feeding branches?). That doesn't mean we can't accept it, but it does mean we need to know why this helps in practice.

@topperc (Collaborator) commented May 21, 2025

I did some more digging of where these copies come from. At least for the test case from shrinkwrap.ll, the copy is created by a CopyToReg that is inserted after the AssertZExt from argument lowering. This copy is not created for this test case on RV32 because the AssertZExt doesn't get created and there's a peephole that avoids creating a CopyToReg if the source is a CopyFromReg: https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp#L11882

In order for SelectionDAG's cross-basic-block known-bits propagation to work, we do need this CopyToReg to propagate the AssertZExt information. After the last DAGCombine, we call computeKnownBits on all CopyToReg node inputs and save the KnownBits in a map entry for the destination register. This map is queried by SelectionDAGBuilder for the CopyFromReg in the receiving block.

I wonder if we can avoid creating the CopyToReg by peeking through the AssertZExt around https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp#L11882, while still updating the KnownBits cache.
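
A minimal sketch of that idea, not the actual SelectionDAGBuilder code; `VRegKnownBits` and `DestReg` are hypothetical stand-ins for the cross-block known-bits map and the destination virtual register:

  SDValue Src = Val;
  if (Src.getOpcode() == ISD::AssertZext)
    Src = Src.getOperand(0); // peek through the zero-extension assertion

  if (Src.getOpcode() == ISD::CopyFromReg) {
    // Reuse the register CopyFromReg already reads instead of emitting a
    // fresh CopyToReg, but still record the tighter known bits implied by
    // the AssertZext so the receiving block's CopyFromReg can see them.
    Register SrcReg = cast<RegisterSDNode>(Src.getOperand(1))->getReg();
    VRegKnownBits[DestReg] = DAG.computeKnownBits(Val); // hypothetical map
    // ... use SrcReg in place of DestReg instead of creating the copy.
  }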

@topperc (Collaborator) commented May 21, 2025

> I did some more digging of where these copies come from. […]

I think I've completely misunderstood. This pass is exploiting the existence of the extra copy and removing it in a different way than the register coalescer does. The shrink-wrap problem still exists for this case on RV32, and it would exist for an i64 argument on RV64.

@ChunyuLiao (Member, Author) commented May 22, 2025

This work was originally motivated by SPEC CPU 473, which revealed that LLVM's shrink-wrapping implementation performs poorly in certain cases.

t4: i64,ch = CopyFromReg t0, Register:i64 %1
t8: i64 = AssertZext t4, ValueType:ch:i32
t11: ch = CopyToReg t0, Register:i64 %2, t8

This pass only works because there is a type promotion involved; without the type promotion, it wouldn't be effective.

The reason for adding this pass is that it operates on the full register, which makes it safer.

@topperc (Collaborator) commented May 22, 2025

> This work was originally motivated by SPEC CPU 473 […] The reason for adding this pass is that it operates on the full register, which makes it safer.

How much performance does this give for 473? As Philip said, this pass seems somewhat unprincipled and relies on SelectionDAG creating extra copies due to a quirk of its implementation. If that quirk changes (or if we switch to GISel), this pass will stop working.

… shrink-wrap optimization

Some data types that require extension have redundant copy instructions.
This pass removes specific copies to help shrink-wrap optimization.
@ChunyuLiao (Member, Author)

The patch was updated to incorporate feedback from earlier reviews. However, based on the discussion above and the fact that the optimization showed little benefit for SPEC CPU 473, I am closing the patch. Many thanks to everyone for the support and feedback.

@ChunyuLiao ChunyuLiao closed this May 22, 2025