Skip to content

Conversation

@tomershafir
Copy link
Contributor

This patch pivots GPR32 and GPR64 zeroing into distinct branches to simplify the code an improve the lowering.

Zeroing GPR moves are now handled differently than non-zeroing ones. Zero source registers WZR and XZR do not require register annotations of undef, implicit and kill. The non-zeroing source now cannot process WZR removing the ternary expression. This patch also moves GPR64 logic right after GPR32 for better organization.

@github-actions
Copy link

github-actions bot commented Oct 12, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@tomershafir tomershafir force-pushed the aarch64-simplify-wzr-copy-expansion branch 3 times, most recently from 591fd5b to 9220738 Compare October 12, 2025 11:49
@tomershafir tomershafir marked this pull request as ready for review October 12, 2025 12:43
@llvmbot
Copy link
Member

llvmbot commented Oct 12, 2025

@llvm/pr-subscribers-backend-aarch64

Author: Tomer Shafir (tomershafir)

Changes

This patch pivots GPR32 and GPR64 zeroing into distinct branches to simplify the code an improve the lowering.

Zeroing GPR moves are now handled differently than non-zeroing ones. Zero source registers WZR and XZR do not require register annotations of undef, implicit and kill. The non-zeroing source now cannot process WZR removing the ternary expression. This patch also moves GPR64 logic right after GPR32 for better organization.


Full diff: https://github.com/llvm/llvm-project/pull/163059.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.cpp (+48-31)
  • (modified) llvm/test/CodeGen/AArch64/arm64-copy-phys-zero-reg.mir (+2-2)
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
index 30dfcf2b2038a..ba48ee866d68e 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
@@ -5063,7 +5063,7 @@ void AArch64InstrInfo::copyPhysReg(MachineBasicBlock &MBB,
                                    bool RenamableDest,
                                    bool RenamableSrc) const {
   if (AArch64::GPR32spRegClass.contains(DestReg) &&
-      (AArch64::GPR32spRegClass.contains(SrcReg) || SrcReg == AArch64::WZR)) {
+      AArch64::GPR32spRegClass.contains(SrcReg)) {
     if (DestReg == AArch64::WSP || SrcReg == AArch64::WSP) {
       // If either operand is WSP, expand to ADD #0.
       if (Subtarget.hasZeroCycleRegMoveGPR64() &&
@@ -5088,21 +5088,14 @@ void AArch64InstrInfo::copyPhysReg(MachineBasicBlock &MBB,
             .addImm(0)
             .addImm(AArch64_AM::getShifterImm(AArch64_AM::LSL, 0));
       }
-    } else if (SrcReg == AArch64::WZR && Subtarget.hasZeroCycleZeroingGPR32()) {
-      BuildMI(MBB, I, DL, get(AArch64::MOVZWi), DestReg)
-          .addImm(0)
-          .addImm(AArch64_AM::getShifterImm(AArch64_AM::LSL, 0));
     } else if (Subtarget.hasZeroCycleRegMoveGPR64() &&
                !Subtarget.hasZeroCycleRegMoveGPR32()) {
       // Cyclone recognizes "ORR Xd, XZR, Xm" as a zero-cycle register move.
       MCRegister DestRegX = RI.getMatchingSuperReg(DestReg, AArch64::sub_32,
                                                    &AArch64::GPR64spRegClass);
       assert(DestRegX.isValid() && "Destination super-reg not valid");
-      MCRegister SrcRegX =
-          SrcReg == AArch64::WZR
-              ? AArch64::XZR
-              : RI.getMatchingSuperReg(SrcReg, AArch64::sub_32,
-                                       &AArch64::GPR64spRegClass);
+      MCRegister SrcRegX = RI.getMatchingSuperReg(SrcReg, AArch64::sub_32,
+                                                  &AArch64::GPR64spRegClass);
       assert(SrcRegX.isValid() && "Source super-reg not valid");
       // This instruction is reading and writing X registers.  This may upset
       // the register scavenger and machine verifier, so we need to indicate
@@ -5121,6 +5114,51 @@ void AArch64InstrInfo::copyPhysReg(MachineBasicBlock &MBB,
     return;
   }
 
+  // GPR32 zeroing
+  if (AArch64::GPR32spRegClass.contains(DestReg) && SrcReg == AArch64::WZR) {
+    if (Subtarget.hasZeroCycleZeroingGPR32()) {
+      BuildMI(MBB, I, DL, get(AArch64::MOVZWi), DestReg)
+          .addImm(0)
+          .addImm(AArch64_AM::getShifterImm(AArch64_AM::LSL, 0));
+    } else {
+      BuildMI(MBB, I, DL, get(AArch64::ORRWrr), DestReg)
+          .addReg(AArch64::WZR)
+          .addReg(AArch64::WZR);
+    }
+    return;
+  }
+
+  if (AArch64::GPR64spRegClass.contains(DestReg) &&
+      AArch64::GPR64spRegClass.contains(SrcReg)) {
+    if (DestReg == AArch64::SP || SrcReg == AArch64::SP) {
+      // If either operand is SP, expand to ADD #0.
+      BuildMI(MBB, I, DL, get(AArch64::ADDXri), DestReg)
+          .addReg(SrcReg, getKillRegState(KillSrc))
+          .addImm(0)
+          .addImm(AArch64_AM::getShifterImm(AArch64_AM::LSL, 0));
+    } else {
+      // Otherwise, expand to ORR XZR.
+      BuildMI(MBB, I, DL, get(AArch64::ORRXrr), DestReg)
+          .addReg(AArch64::XZR)
+          .addReg(SrcReg, getKillRegState(KillSrc));
+    }
+    return;
+  }
+
+  // GPR64 zeroing
+  if (AArch64::GPR64spRegClass.contains(DestReg) && SrcReg == AArch64::XZR) {
+    if (Subtarget.hasZeroCycleZeroingGPR64()) {
+      BuildMI(MBB, I, DL, get(AArch64::MOVZXi), DestReg)
+          .addImm(0)
+          .addImm(AArch64_AM::getShifterImm(AArch64_AM::LSL, 0));
+    } else {
+      BuildMI(MBB, I, DL, get(AArch64::ORRXrr), DestReg)
+          .addReg(AArch64::XZR)
+          .addReg(AArch64::XZR);
+    }
+    return;
+  }
+
   // Copy a Predicate register by ORRing with itself.
   if (AArch64::PPRRegClass.contains(DestReg) &&
       AArch64::PPRRegClass.contains(SrcReg)) {
@@ -5205,27 +5243,6 @@ void AArch64InstrInfo::copyPhysReg(MachineBasicBlock &MBB,
     return;
   }
 
-  if (AArch64::GPR64spRegClass.contains(DestReg) &&
-      (AArch64::GPR64spRegClass.contains(SrcReg) || SrcReg == AArch64::XZR)) {
-    if (DestReg == AArch64::SP || SrcReg == AArch64::SP) {
-      // If either operand is SP, expand to ADD #0.
-      BuildMI(MBB, I, DL, get(AArch64::ADDXri), DestReg)
-          .addReg(SrcReg, getKillRegState(KillSrc))
-          .addImm(0)
-          .addImm(AArch64_AM::getShifterImm(AArch64_AM::LSL, 0));
-    } else if (SrcReg == AArch64::XZR && Subtarget.hasZeroCycleZeroingGPR64()) {
-      BuildMI(MBB, I, DL, get(AArch64::MOVZXi), DestReg)
-          .addImm(0)
-          .addImm(AArch64_AM::getShifterImm(AArch64_AM::LSL, 0));
-    } else {
-      // Otherwise, expand to ORR XZR.
-      BuildMI(MBB, I, DL, get(AArch64::ORRXrr), DestReg)
-          .addReg(AArch64::XZR)
-          .addReg(SrcReg, getKillRegState(KillSrc));
-    }
-    return;
-  }
-
   // Copy a DDDD register quad by copying the individual sub-registers.
   if (AArch64::DDDDRegClass.contains(DestReg) &&
       AArch64::DDDDRegClass.contains(SrcReg)) {
diff --git a/llvm/test/CodeGen/AArch64/arm64-copy-phys-zero-reg.mir b/llvm/test/CodeGen/AArch64/arm64-copy-phys-zero-reg.mir
index 284d624a4e68f..dc9eaad2c9834 100644
--- a/llvm/test/CodeGen/AArch64/arm64-copy-phys-zero-reg.mir
+++ b/llvm/test/CodeGen/AArch64/arm64-copy-phys-zero-reg.mir
@@ -39,7 +39,7 @@ body:             |
     ; CHECK-NO-ZCM-GPR32-ZCM-GPR64-NO-ZCZ-LABEL: name: f0
     ; CHECK-NO-ZCM-GPR32-ZCM-GPR64-NO-ZCZ: liveins: $x0, $lr
     ; CHECK-NO-ZCM-GPR32-ZCM-GPR64-NO-ZCZ-NEXT: {{  $}}
-    ; CHECK-NO-ZCM-GPR32-ZCM-GPR64-NO-ZCZ-NEXT: $x0 = ORRXrr $xzr, undef $xzr, implicit $wzr
+    ; CHECK-NO-ZCM-GPR32-ZCM-GPR64-NO-ZCZ-NEXT: $w0 = ORRWrr $wzr, $wzr
     ; CHECK-NO-ZCM-GPR32-ZCM-GPR64-NO-ZCZ-NEXT: BL @f2, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $w0, implicit-def $sp, implicit-def $w0
     ;
     ; CHECK-ZCM-GPR32-ZCM-GPR64-NO-ZCZ-LABEL: name: f0
@@ -103,7 +103,7 @@ body:             |
     ; CHECK-ZCM-ZCZ: liveins: $x0, $lr
     ; CHECK-ZCM-ZCZ-NEXT: {{  $}}
     ; CHECK-ZCM-ZCZ-NEXT: $x0 = MOVZXi 0, 0
-    ; CHECK-ZCM-ZCZ-NEXT:BL @f2, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $w0, implicit-def $sp, implicit-def $w0
+    ; CHECK-ZCM-ZCZ-NEXT: BL @f2, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $w0, implicit-def $sp, implicit-def $w0
     $x0 = COPY $xzr
     BL @f2, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $w0, implicit-def $sp, implicit-def $w0
 ...

@dtellenbach
Copy link
Member

This should be a NFC, right? Please mark as such by adding [NFC] to the title of your commit message.

@tomershafir
Copy link
Contributor Author

Its not a NFC. You can see the functional change in the test change in the patch

@MacDue
Copy link
Member

MacDue commented Oct 13, 2025

It's practically an NFC though, right? This won't change the final code.

@tomershafir
Copy link
Contributor Author

In theory, it can change final code from orr x0, xzr, xzr to orr w0, wzr, wzr.

@tomershafir
Copy link
Contributor Author

@MacDue @dtellenbach Ping

; CHECK-NO-ZCM-GPR32-ZCM-GPR64-NO-ZCZ: liveins: $x0, $lr
; CHECK-NO-ZCM-GPR32-ZCM-GPR64-NO-ZCZ-NEXT: {{ $}}
; CHECK-NO-ZCM-GPR32-ZCM-GPR64-NO-ZCZ-NEXT: $x0 = ORRXrr $xzr, undef $xzr, implicit $wzr
; CHECK-NO-ZCM-GPR32-ZCM-GPR64-NO-ZCZ-NEXT: $w0 = ORRWrr $wzr, $wzr
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we expand the testing a bit and check for different combinations of 32 and 64bit zcz? It's currently all or nothing because the test was added before splitting the feature.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add. These new test cases are not affected by the specific functional change, but it probably fits under this patch title.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually now I can remove ZCM combinations because it only depends on ZCZ features.

Copy link
Member

@dtellenbach dtellenbach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

This patch pivots GPR32 and GPR64 zeroing into distinct branches to simplify the code an improve the lowering.

Zeroing GPR moves are now handled differently than non-zeroing ones. Zero source registers WZR and XZR do not require register annotations of undef, implicit and kill. The non-zeroing source now cannot process WZR removing the ternary expression. This patch also moves GPR64 logic right after GPR32 for better organization.
@tomershafir tomershafir force-pushed the aarch64-simplify-wzr-copy-expansion branch from 728d59e to d80b7dd Compare October 20, 2025 07:48
@tomershafir tomershafir force-pushed the aarch64-simplify-wzr-copy-expansion branch from d80b7dd to b65c92c Compare October 20, 2025 08:12
@tomershafir tomershafir merged commit 5ac616f into llvm:main Oct 20, 2025
16 of 18 checks passed
@tomershafir tomershafir deleted the aarch64-simplify-wzr-copy-expansion branch October 20, 2025 12:00
Lukacma pushed a commit to Lukacma/llvm-project that referenced this pull request Oct 29, 2025
This patch pivots GPR32 and GPR64 zeroing into distinct branches to
simplify the code an improve the lowering.

Zeroing GPR moves are now handled differently than non-zeroing ones.
Zero source registers WZR and XZR do not require register annotations of
undef, implicit and kill. The non-zeroing source now cannot process WZR
removing the ternary expression. This patch also moves GPR64 logic right
after GPR32 for better organization.
aokblast pushed a commit to aokblast/llvm-project that referenced this pull request Oct 30, 2025
This patch pivots GPR32 and GPR64 zeroing into distinct branches to
simplify the code an improve the lowering.

Zeroing GPR moves are now handled differently than non-zeroing ones.
Zero source registers WZR and XZR do not require register annotations of
undef, implicit and kill. The non-zeroing source now cannot process WZR
removing the ternary expression. This patch also moves GPR64 logic right
after GPR32 for better organization.
tomershafir added a commit to swiftlang/llvm-project that referenced this pull request Nov 11, 2025
This patch pivots GPR32 and GPR64 zeroing into distinct branches to
simplify the code an improve the lowering.

Zeroing GPR moves are now handled differently than non-zeroing ones.
Zero source registers WZR and XZR do not require register annotations of
undef, implicit and kill. The non-zeroing source now cannot process WZR
removing the ternary expression. This patch also moves GPR64 logic right
after GPR32 for better organization.

(cherry-pick 5ac616f)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants