
Conversation

@tomershafir
Contributor

@tomershafir tomershafir commented Jul 8, 2025

Adds a subtarget feature called FeatureZCRegMoveFPR128 that enables querying whether the target supports zero-cycle register moves for FPR128 NEON registers, and adds it to the appropriate processors. It prepares for a register coalescer optimization to prevent rematerialization of moves where the target supports ZCM.

Adds two subtarget hooks, canLowerToZeroCycleRegMove and canLowerToZeroCycleRegZeroing, that enable querying whether an instruction can be lowered to a zero-cycle instruction. The logic depends on the microarchitecture. This patch also provides an implementation for AArch64 based on AArch64InstrInfo::copyPhysReg that supports both physical and virtual registers.

Adds a target hook shouldReMaterializeTrivialRegDef that enables the target to specify whether rematerialization of the copy is beneficial. This patch also provides an implementation for AArch64 based on the new subtarget hooks canLowerToZeroCycleReg[Move|Zeroing].

This change makes the register coalescer prevent rematerialization of a trivial def for a move instruction when the target guides against it, based on the new target hook shouldReMaterializeTrivialRegDef. The filter is appended to the existing logic. The patch includes isolated MIR tests for all supported register classes.

It relies on the register allocator to rematerialize reloads and remove spills based on register pressure (since keeping the copy can extend the live interval of the register being copied from).

@github-actions

github-actions bot commented Jul 8, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Adds a subtarget feature called `FeatureZCRegMoveFPR128` that enables querying whether the target supports zero-cycle register moves for FPR128 NEON registers, and adds it to the appropriate processors. It prepares for a register coalescer optimization to prevent rematerialization of moves where the target supports ZCM.
Adds two subtarget hooks, `canLowerToZeroCycleRegMove` and `canLowerToZeroCycleRegZeroing`, that enable querying whether an instruction can be lowered to a zero-cycle instruction. The logic depends on the microarchitecture. This patch also provides an implementation for AArch64 based on `AArch64InstrInfo::copyPhysReg` which supports both physical and virtual registers.

It prepares for a register coalescer optimization to prevent rematerialization of moves where the target supports ZCM.
@tomershafir tomershafir force-pushed the codegen-reg-coalescer-check-zcm-on-remat branch 2 times, most recently from 13b1205 to 94dce97 Compare July 8, 2025 18:19
Adds a target hook `shouldReMaterializeTrivialRegDef` that enables the target to specify whether rematerialization of the copy is beneficial. This patch also provides an implementation for AArch64 based on the new subtarget hooks `canLowerToZeroCycleReg[Move|Zeroing]`.

It prepares for a register coalescer optimization to prevent rematerialization of moves where the target supports ZCM.
@tomershafir tomershafir force-pushed the codegen-reg-coalescer-check-zcm-on-remat branch 2 times, most recently from 3a6f100 to c1eaf8e Compare July 9, 2025 13:08
This change makes the register coalescer prevent rematerialization of a trivial def for a move instruction when the target guides against it, based on the new target hook `shouldReMaterializeTrivialRegDef`. The filter is appended to the existing logic. The patch includes isolated MIR tests for all supported register classes, and fixes existing tests.
@tomershafir tomershafir force-pushed the codegen-reg-coalescer-check-zcm-on-remat branch from c1eaf8e to a8ad1f9 Compare July 9, 2025 14:00
@tomershafir tomershafir marked this pull request as ready for review July 9, 2025 21:35
@tomershafir tomershafir requested a review from jroelofs July 9, 2025 21:35
@llvmbot
Member

llvmbot commented Jul 9, 2025

@llvm/pr-subscribers-llvm-regalloc

@llvm/pr-subscribers-backend-aarch64

Author: Tomer Shafir (tomershafir)

Changes

Patch is 33.98 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/147571.diff

14 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/TargetInstrInfo.h (+16)
  • (modified) llvm/include/llvm/CodeGen/TargetSubtargetInfo.h (+42)
  • (modified) llvm/lib/CodeGen/RegisterCoalescer.cpp (+9)
  • (modified) llvm/lib/Target/AArch64/AArch64Features.td (+3)
  • (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.cpp (+10)
  • (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.h (+4)
  • (modified) llvm/lib/Target/AArch64/AArch64Processors.td (+10)
  • (modified) llvm/lib/Target/AArch64/AArch64Subtarget.cpp (+81)
  • (modified) llvm/lib/Target/AArch64/AArch64Subtarget.h (+13)
  • (modified) llvm/test/CodeGen/AArch64/arm64-abi-varargs.ll (+12-19)
  • (added) llvm/test/CodeGen/AArch64/arm64-reg-coalesce-remat-zero-cycle-regmov-fpr.mir (+174)
  • (added) llvm/test/CodeGen/AArch64/arm64-reg-coalesce-remat-zero-cycle-regmov-gpr.mir (+90)
  • (modified) llvm/test/CodeGen/AArch64/arm64-vshuffle.ll (+9-3)
  • (modified) llvm/test/CodeGen/AArch64/clear-dead-implicit-def-impdef.ll (+26-27)
diff --git a/llvm/include/llvm/CodeGen/TargetInstrInfo.h b/llvm/include/llvm/CodeGen/TargetInstrInfo.h
index b5b83c7ff1164..4a5b2fb0dd613 100644
--- a/llvm/include/llvm/CodeGen/TargetInstrInfo.h
+++ b/llvm/include/llvm/CodeGen/TargetInstrInfo.h
@@ -418,6 +418,22 @@ class LLVM_ABI TargetInstrInfo : public MCInstrInfo {
     return true;
   }
 
+  /// Returns true if CopyMI should be considered for register
+  /// definition rematerialization. Otherwise, returns false.
+  ///
+  /// Rematerialization can replace a source register with its value
+  /// from its definition. It is applied in the register coalescer,
+  /// after instruction selection and before register allocation.
+  ///
+  /// Subtargets can override this method to classify rematerialization
+  /// candidates. Note that this cannot be defined in tablegen because it
+  /// operates at a higher level.
+  virtual bool shouldReMaterializeTrivialRegDef(const MachineInstr *CopyMI,
+                                                const Register &DestReg,
+                                                const Register &SrcReg) const {
+    return true;
+  }
+
   /// Re-issue the specified 'original' instruction at the
   /// specific location targeting a new destination register.
   /// The register in Orig->getOperand(0).getReg() will be substituted by
diff --git a/llvm/include/llvm/CodeGen/TargetSubtargetInfo.h b/llvm/include/llvm/CodeGen/TargetSubtargetInfo.h
index 45e67d80629cb..c5a7ed19d54dd 100644
--- a/llvm/include/llvm/CodeGen/TargetSubtargetInfo.h
+++ b/llvm/include/llvm/CodeGen/TargetSubtargetInfo.h
@@ -185,6 +185,48 @@ class LLVM_ABI TargetSubtargetInfo : public MCSubtargetInfo {
     return false;
   }
 
+  /// Returns true if CopyMI can be lowered to a zero cycle register move.
+  /// Otherwise, returns false.
+  ///
+  /// Lowering to zero cycle register moves depends on the microarchitecture
+  /// for the specific architectural registers and instructions supported.
+  /// Thus, currently it is applied after register allocation,
+  /// when `ExpandPostRAPseudos` pass calls `TargetInstrInfo::lowerCopy`
+  /// which in turn calls `TargetInstrInfo::copyPhysReg`.
+  ///
+  /// Subtargets can override this method to classify lowering candidates.
+  /// Note that this cannot be defined in tablegen because it operates at
+  /// a higher level.
+  ///
+  /// NOTE: Subtargets must maintain consistency between the logic here and
+  /// on lowering.
+  virtual bool canLowerToZeroCycleRegMove(const MachineInstr *CopyMI,
+                                          const Register &DestReg,
+                                          const Register &SrcReg) const {
+    return false;
+  }
+
+  /// Returns true if CopyMI can be lowered to a zero cycle register zeroing.
+  /// Otherwise, returns false.
+  ///
+  /// Lowering to zero cycle register zeroing depends on the microarchitecture
+  /// for the specific architectural registers and instructions supported.
+  /// Thus, currently it takes place after register allocation,
+  /// when `ExpandPostRAPseudos` pass calls `TargetInstrInfo::lowerCopy`
+  /// which in turn calls `TargetInstrInfo::copyPhysReg`.
+  ///
+  /// Subtargets can override this method to classify lowering candidates.
+  /// Note that this cannot be defined in tablegen because it operates at
+  /// a higher level.
+  ///
+  /// NOTE: Subtargets must maintain consistency between the logic here and
+  /// on lowering.
+  virtual bool canLowerToZeroCycleRegZeroing(const MachineInstr *CopyMI,
+                                             const Register &DestReg,
+                                             const Register &SrcReg) const {
+    return false;
+  }
+
   /// True if the subtarget should run MachineScheduler after aggressive
   /// coalescing.
   ///
diff --git a/llvm/lib/CodeGen/RegisterCoalescer.cpp b/llvm/lib/CodeGen/RegisterCoalescer.cpp
index 2d25f097348af..088518914a2b1 100644
--- a/llvm/lib/CodeGen/RegisterCoalescer.cpp
+++ b/llvm/lib/CodeGen/RegisterCoalescer.cpp
@@ -69,6 +69,9 @@ STATISTIC(numCrossRCs, "Number of cross class joins performed");
 STATISTIC(numCommutes, "Number of instruction commuting performed");
 STATISTIC(numExtends, "Number of copies extended");
 STATISTIC(NumReMats, "Number of instructions re-materialized");
+STATISTIC(NumReMatsPrevented,
          "Number of instruction rematerializations prevented by "
+          "`shouldReMaterializeTrivialRegDef` hook");
 STATISTIC(NumInflated, "Number of register classes inflated");
 STATISTIC(NumLaneConflicts, "Number of dead lane conflicts tested");
 STATISTIC(NumLaneResolves, "Number of dead lane conflicts resolved");
@@ -1400,6 +1403,12 @@ bool RegisterCoalescer::reMaterializeTrivialDef(const CoalescerPair &CP,
   if (!Edit.canRematerializeAt(RM, ValNo, CopyIdx))
     return false;
 
+  if (!TII->shouldReMaterializeTrivialRegDef(CopyMI, DstReg, SrcReg)) {
+    LLVM_DEBUG(dbgs() << "Remat prevented: " << CopyIdx << "\t" << *CopyMI);
+    ++NumReMatsPrevented;
+    return false;
+  }
+
   DebugLoc DL = CopyMI->getDebugLoc();
   MachineBasicBlock *MBB = CopyMI->getParent();
   MachineBasicBlock::iterator MII =
diff --git a/llvm/lib/Target/AArch64/AArch64Features.td b/llvm/lib/Target/AArch64/AArch64Features.td
index 05562516e5198..2fd449ef3f5f8 100644
--- a/llvm/lib/Target/AArch64/AArch64Features.td
+++ b/llvm/lib/Target/AArch64/AArch64Features.td
@@ -624,6 +624,9 @@ def FeatureZCRegMoveFPR64 : SubtargetFeature<"zcm-fpr64", "HasZeroCycleRegMoveFP
 def FeatureZCRegMoveFPR32 : SubtargetFeature<"zcm-fpr32", "HasZeroCycleRegMoveFPR32", "true",
                                         "Has zero-cycle register moves for FPR32 registers">;
 
+def FeatureZCRegMoveFPR128 : SubtargetFeature<"zcm-fpr128", "HasZeroCycleRegMoveFPR128", "true",
+                                        "Has zero-cycle register moves for FPR128 registers">;
+                                        
 def FeatureZCZeroingGP : SubtargetFeature<"zcz-gp", "HasZeroCycleZeroingGP", "true",
                                         "Has zero-cycle zeroing instructions for generic registers">;
 
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
index 8847c62690714..ee178a1d64234 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
@@ -1029,6 +1029,13 @@ bool AArch64InstrInfo::isAsCheapAsAMove(const MachineInstr &MI) const {
   }
 }
 
+bool AArch64InstrInfo::shouldReMaterializeTrivialRegDef(
+    const MachineInstr *CopyMI, const Register &DestReg,
+    const Register &SrcReg) const {
+  return !Subtarget.canLowerToZeroCycleRegMove(CopyMI, DestReg, SrcReg) &&
+         !Subtarget.canLowerToZeroCycleRegZeroing(CopyMI, DestReg, SrcReg);
+}
+
 bool AArch64InstrInfo::isFalkorShiftExtFast(const MachineInstr &MI) {
   switch (MI.getOpcode()) {
   default:
@@ -5025,6 +5032,9 @@ void AArch64InstrInfo::copyGPRRegTuple(MachineBasicBlock &MBB,
   }
 }
 
+/// NOTE: must maintain consistency with
+/// `AArch64Subtarget::canLowerToZeroCycleRegMove` and
+/// `AArch64Subtarget::canLowerToZeroCycleRegZeroing`.
 void AArch64InstrInfo::copyPhysReg(MachineBasicBlock &MBB,
                                    MachineBasicBlock::iterator I,
                                    const DebugLoc &DL, Register DestReg,
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.h b/llvm/lib/Target/AArch64/AArch64InstrInfo.h
index 7c255da333e4b..e858e93cab2a4 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.h
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.h
@@ -189,6 +189,10 @@ class AArch64InstrInfo final : public AArch64GenInstrInfo {
 
   bool isAsCheapAsAMove(const MachineInstr &MI) const override;
 
+  bool shouldReMaterializeTrivialRegDef(const MachineInstr *CopyMI,
+                                        const Register &DestReg,
+                                        const Register &SrcReg) const override;
+
   bool isCoalescableExtInstr(const MachineInstr &MI, Register &SrcReg,
                              Register &DstReg, unsigned &SubIdx) const override;
 
diff --git a/llvm/lib/Target/AArch64/AArch64Processors.td b/llvm/lib/Target/AArch64/AArch64Processors.td
index dcccde4a4d666..e6cc78d213e8e 100644
--- a/llvm/lib/Target/AArch64/AArch64Processors.td
+++ b/llvm/lib/Target/AArch64/AArch64Processors.td
@@ -313,6 +313,7 @@ def TuneAppleA7  : SubtargetFeature<"apple-a7", "ARMProcFamily", "AppleA7",
                                     FeatureStorePairSuppress,
                                     FeatureZCRegMoveGPR64,
                                     FeatureZCRegMoveFPR64,
+                                    FeatureZCRegMoveFPR128,
                                     FeatureZCZeroing,
                                     FeatureZCZeroingFPWorkaround]>;
 
@@ -327,6 +328,7 @@ def TuneAppleA10 : SubtargetFeature<"apple-a10", "ARMProcFamily", "AppleA10",
                                     FeatureStorePairSuppress,
                                     FeatureZCRegMoveGPR64,
                                     FeatureZCRegMoveFPR64,
+                                    FeatureZCRegMoveFPR128,
                                     FeatureZCZeroing]>;
 
 def TuneAppleA11 : SubtargetFeature<"apple-a11", "ARMProcFamily", "AppleA11",
@@ -340,6 +342,7 @@ def TuneAppleA11 : SubtargetFeature<"apple-a11", "ARMProcFamily", "AppleA11",
                                     FeatureStorePairSuppress,
                                     FeatureZCRegMoveGPR64,
                                     FeatureZCRegMoveFPR64,
+                                    FeatureZCRegMoveFPR128,
                                     FeatureZCZeroing]>;
 
 def TuneAppleA12 : SubtargetFeature<"apple-a12", "ARMProcFamily", "AppleA12",
@@ -353,6 +356,7 @@ def TuneAppleA12 : SubtargetFeature<"apple-a12", "ARMProcFamily", "AppleA12",
                                     FeatureStorePairSuppress,
                                     FeatureZCRegMoveGPR64,
                                     FeatureZCRegMoveFPR64,
+                                    FeatureZCRegMoveFPR128,
                                     FeatureZCZeroing]>;
 
 def TuneAppleA13 : SubtargetFeature<"apple-a13", "ARMProcFamily", "AppleA13",
@@ -366,6 +370,7 @@ def TuneAppleA13 : SubtargetFeature<"apple-a13", "ARMProcFamily", "AppleA13",
                                     FeatureStorePairSuppress,
                                     FeatureZCRegMoveGPR64,
                                     FeatureZCRegMoveFPR64,
+                                    FeatureZCRegMoveFPR128,
                                     FeatureZCZeroing]>;
 
 def TuneAppleA14 : SubtargetFeature<"apple-a14", "ARMProcFamily", "AppleA14",
@@ -384,6 +389,7 @@ def TuneAppleA14 : SubtargetFeature<"apple-a14", "ARMProcFamily", "AppleA14",
                                     FeatureStorePairSuppress,
                                     FeatureZCRegMoveGPR64,
                                     FeatureZCRegMoveFPR64,
+                                    FeatureZCRegMoveFPR128,
                                     FeatureZCZeroing]>;
 
 def TuneAppleA15 : SubtargetFeature<"apple-a15", "ARMProcFamily", "AppleA15",
@@ -402,6 +408,7 @@ def TuneAppleA15 : SubtargetFeature<"apple-a15", "ARMProcFamily", "AppleA15",
                                     FeatureStorePairSuppress,
                                     FeatureZCRegMoveGPR64,
                                     FeatureZCRegMoveFPR64,
+                                    FeatureZCRegMoveFPR128,
                                     FeatureZCZeroing]>;
 
 def TuneAppleA16 : SubtargetFeature<"apple-a16", "ARMProcFamily", "AppleA16",
@@ -420,6 +427,7 @@ def TuneAppleA16 : SubtargetFeature<"apple-a16", "ARMProcFamily", "AppleA16",
                                     FeatureStorePairSuppress,
                                     FeatureZCRegMoveGPR64,
                                     FeatureZCRegMoveFPR64,
+                                    FeatureZCRegMoveFPR128,
                                     FeatureZCZeroing]>;
 
 def TuneAppleA17 : SubtargetFeature<"apple-a17", "ARMProcFamily", "AppleA17",
@@ -438,6 +446,7 @@ def TuneAppleA17 : SubtargetFeature<"apple-a17", "ARMProcFamily", "AppleA17",
                                     FeatureStorePairSuppress,
                                     FeatureZCRegMoveGPR64,
                                     FeatureZCRegMoveFPR64,
+                                    FeatureZCRegMoveFPR128,
                                     FeatureZCZeroing]>;
 
 def TuneAppleM4 : SubtargetFeature<"apple-m4", "ARMProcFamily", "AppleM4",
@@ -455,6 +464,7 @@ def TuneAppleM4 : SubtargetFeature<"apple-m4", "ARMProcFamily", "AppleM4",
                                      FeatureFuseLiterals,
                                      FeatureZCRegMoveGPR64,
                                      FeatureZCRegMoveFPR64,
+                                     FeatureZCRegMoveFPR128,
                                      FeatureZCZeroing
                                      ]>;
 
diff --git a/llvm/lib/Target/AArch64/AArch64Subtarget.cpp b/llvm/lib/Target/AArch64/AArch64Subtarget.cpp
index 68ed10570a52f..015c80371e52a 100644
--- a/llvm/lib/Target/AArch64/AArch64Subtarget.cpp
+++ b/llvm/lib/Target/AArch64/AArch64Subtarget.cpp
@@ -667,3 +667,84 @@ AArch64Subtarget::getPtrAuthBlockAddressDiscriminatorIfEnabled(
 bool AArch64Subtarget::enableMachinePipeliner() const {
   return getSchedModel().hasInstrSchedModel();
 }
+
+bool AArch64Subtarget::isRegInClass(const MachineInstr *MI, const Register &Reg,
+                                    const TargetRegisterClass *TRC) const {
+  if (Reg.isPhysical()) {
+    return TRC->contains(Reg);
+  } else {
+    const MachineRegisterInfo &MRI = MI->getMF()->getRegInfo();
+    return TRC->hasSubClassEq(MRI.getRegClass(Reg));
+  }
+}
+
+/// NOTE: must maintain consistency with `AArch64InstrInfo::copyPhysReg`.
+bool AArch64Subtarget::canLowerToZeroCycleRegMove(
+    const MachineInstr *CopyMI, const Register &DestReg,
+    const Register &SrcReg) const {
+  if (isRegInClass(CopyMI, DestReg, &AArch64::GPR32allRegClass) &&
+      isRegInClass(CopyMI, SrcReg, &AArch64::GPR32allRegClass) &&
+      DestReg != AArch64::WZR) {
+    if (DestReg == AArch64::WSP || SrcReg == AArch64::WSP ||
+        SrcReg != AArch64::WZR || !hasZeroCycleZeroingGP()) {
+      return hasZeroCycleRegMoveGPR64() || hasZeroCycleRegMoveGPR32();
+    }
+    return false;
+  }
+
+  if (isRegInClass(CopyMI, DestReg, &AArch64::GPR64allRegClass) &&
+      isRegInClass(CopyMI, SrcReg, &AArch64::GPR64allRegClass) &&
+      DestReg != AArch64::XZR) {
+    if (DestReg == AArch64::SP || SrcReg == AArch64::SP ||
+        SrcReg != AArch64::XZR || !hasZeroCycleZeroingGP()) {
+      return hasZeroCycleRegMoveGPR64();
+    }
+    return false;
+  }
+
+  if (isRegInClass(CopyMI, DestReg, &AArch64::FPR128RegClass) &&
+      isRegInClass(CopyMI, SrcReg, &AArch64::FPR128RegClass)) {
+    return isNeonAvailable() && hasZeroCycleRegMoveFPR128();
+  }
+
+  if (isRegInClass(CopyMI, DestReg, &AArch64::FPR64RegClass) &&
+      isRegInClass(CopyMI, SrcReg, &AArch64::FPR64RegClass)) {
+    return hasZeroCycleRegMoveFPR64();
+  }
+
+  if (isRegInClass(CopyMI, DestReg, &AArch64::FPR32RegClass) &&
+      isRegInClass(CopyMI, SrcReg, &AArch64::FPR32RegClass)) {
+    return hasZeroCycleRegMoveFPR32() || hasZeroCycleRegMoveFPR64();
+  }
+
+  if (isRegInClass(CopyMI, DestReg, &AArch64::FPR16RegClass) &&
+      isRegInClass(CopyMI, SrcReg, &AArch64::FPR16RegClass)) {
+    return hasZeroCycleRegMoveFPR32() || hasZeroCycleRegMoveFPR64();
+  }
+
+  if (isRegInClass(CopyMI, DestReg, &AArch64::FPR8RegClass) &&
+      isRegInClass(CopyMI, SrcReg, &AArch64::FPR8RegClass)) {
+    return hasZeroCycleRegMoveFPR32() || hasZeroCycleRegMoveFPR64();
+  }
+
+  return false;
+}
+
+/// NOTE: must maintain consistency with `AArch64InstrInfo::copyPhysReg`.
+bool AArch64Subtarget::canLowerToZeroCycleRegZeroing(
+    const MachineInstr *CopyMI, const Register &DestReg,
+    const Register &SrcReg) const {
+  if (isRegInClass(CopyMI, DestReg, &AArch64::GPR32allRegClass) &&
+      isRegInClass(CopyMI, SrcReg, &AArch64::GPR32allRegClass) &&
+      DestReg != AArch64::WZR) {
+    return AArch64::WZR == SrcReg && hasZeroCycleZeroingGP();
+  }
+
+  if (isRegInClass(CopyMI, DestReg, &AArch64::GPR64allRegClass) &&
+      isRegInClass(CopyMI, SrcReg, &AArch64::GPR64allRegClass) &&
+      DestReg != AArch64::XZR) {
+    return AArch64::XZR == SrcReg && hasZeroCycleZeroingGP();
+  }
+
+  return false;
+}
diff --git a/llvm/lib/Target/AArch64/AArch64Subtarget.h b/llvm/lib/Target/AArch64/AArch64Subtarget.h
index f95b0fafc607f..e08bd3e496479 100644
--- a/llvm/lib/Target/AArch64/AArch64Subtarget.h
+++ b/llvm/lib/Target/AArch64/AArch64Subtarget.h
@@ -120,6 +120,12 @@ class AArch64Subtarget final : public AArch64GenSubtargetInfo {
   /// Initialize properties based on the selected processor family.
   void initializeProperties(bool HasMinSize);
 
+  /// Returns true if Reg is virtual and is assigned to,
+  /// or is physical and is a member of, the TRC register class.
+  /// Otherwise, returns false.
+  bool isRegInClass(const MachineInstr *MI, const Register &Reg,
+                    const TargetRegisterClass *TRC) const;
+
 public:
   /// This constructor initializes the data members to match that
   /// of the specified triple.
@@ -163,6 +169,13 @@ class AArch64Subtarget final : public AArch64GenSubtargetInfo {
   bool enableMachinePipeliner() const override;
   bool useDFAforSMS() const override { return false; }
 
+  bool canLowerToZeroCycleRegMove(const MachineInstr *CopyMI,
+                                  const Register &DestReg,
+                                  const Register &SrcReg) const override;
+  bool canLowerToZeroCycleRegZeroing(const MachineInstr *CopyMI,
+                                     const Register &DestReg,
+                                     const Register &SrcReg) const override;
+
   /// Returns ARM processor family.
   /// Avoid this function! CPU specifics should be kept local to this class
   /// and preferably modeled with SubtargetFeatures or properties in
diff --git a/llvm/test/CodeGen/AArch64/arm64-abi-varargs.ll b/llvm/test/CodeGen/AArch64/arm64-abi-varargs.ll
index 1b22514a59d60..890367a761281 100644
--- a/llvm/test/CodeGen/AArch64/arm64-abi-varargs.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-abi-varargs.ll
@@ -64,18 +64,18 @@ define i32 @main() nounwind ssp {
 ; CHECK:       ; %bb.0:
 ; CHECK-NEXT:    sub sp, sp, #96
 ; CHECK-NEXT:    stp x29, x30, [sp, #80] ; 16-byte Folded Spill
-; CHECK-NEXT:    mov w9, #1 ; =0x1
-; CHECK-NEXT:    mov w8, #2 ; =0x2
-; CHECK-NEXT:    stp w8, w9, [sp, #72]
-; CHECK-NEXT:    mov w9, #3 ; =0x3
-; CHECK-NEXT:    mov w8, #4 ; =0x4
-; CHECK-NEXT:    stp w8, w9, [sp, #64]
-; CHECK-NEXT:    mov w9, #5 ; =0x5
-; CHECK-NEXT:    mov w8, #6 ; =0x6
-; CHECK-NEXT:    stp w8, w9, [sp, #56]
-; CHECK-NEXT:    mov w9, #7 ; =0x7
-; CHECK-NEXT:    mov w8, #8 ; =0x8
-; CHECK-NEXT:    stp w8, w9, [sp, #48]
+; CHECK-NEXT:    mov w8, #1 ; =0x1
+; CHECK-NEXT:    mov w1, #2 ; =0x2
+; CHECK-NEXT:    stp w1, w8, [sp, #72]
+; CHECK-NEXT:    mov w2, #3 ; =0x3
+; CHECK-NEXT:    mov w3, #4 ; =0x4
+; CHECK-NEXT:    stp w3, w2, [sp, #64]
+; CHECK-NEXT:    mov w4, #5 ; =0x5
+; CHECK-NEXT:    mov w5, #6 ; =0x6
+; CHECK-NEXT:    stp w5, w4, [sp, #56]
+; CHECK-NEXT:    mov w6, #7 ; =0x7
+; CHECK-NEXT:    mov w7, #8 ; =0x8
+; CHECK-NEXT:    stp w7, w6, [sp, #48]
 ; CHECK-NEXT:    mov w8, #9 ; =0x9
 ; CHECK-NEXT:    mov w9, #10 ; =0xa
 ; CHECK-NEXT:    stp w9, w8, [sp, #40]
@@ -86,13 +86,6 @@ define i32 @main() nounwind ssp {
 ; CHECK-NEXT:    str x9, [sp, #8]
 ; CHECK-NEXT:    str w8, [sp]
 ; CHECK-NEXT:    add x0, sp, #76
-; CHECK-NEXT:    mov w1, #2 ; =0x2
-; CHECK-NEXT:    mov w2, #3 ; =0x3
-; CHECK-NEXT:    mov w3, #4 ; =0x4
-; CHECK-NEXT:    mov w4, #5 ; =0x5
-; CHECK-NEXT:    mov w5, #6 ; =0x6
-; CHECK-NEXT:    mov w6, #7 ; =0x7
-; CHECK-NEXT:    mov w7, #8 ; =0x8
 ; CHECK-NEXT:    bl _fn9
 ; CHECK-NEXT:    mov w0, #0 ; =0x0
 ; CHECK-NEXT:    ldp x29, x30, [sp, #80] ; 16-byte Folded Reload
diff --git a/llvm/test/CodeGen/AArch64/arm64-reg-coalesce-remat-zero-cycle-regmov-fpr.mir b/llvm/test/CodeGen/AArch64/arm64-reg-coalesce-remat-zero-cycle-regmov-fpr.mir
new file mode 100644
index 0000000000000..303e25edb2b18
--- /dev/null
+++ b/llvm/test/CodeGen/AA...
[truncated]

@tomershafir tomershafir requested a review from davemgreen July 9, 2025 21:35
@tomershafir tomershafir deleted the codegen-reg-coalescer-check-zcm-on-remat branch July 10, 2025 16:23
@tomershafir
Contributor Author

Reorganizing 1 commit per PR

@tomershafir tomershafir restored the codegen-reg-coalescer-check-zcm-on-remat branch July 10, 2025 16:41
@tomershafir tomershafir deleted the codegen-reg-coalescer-check-zcm-on-remat branch July 10, 2025 16:41
@tomershafir tomershafir restored the codegen-reg-coalescer-check-zcm-on-remat branch July 10, 2025 16:42
@tomershafir tomershafir deleted the codegen-reg-coalescer-check-zcm-on-remat branch July 10, 2025 16:43