[AMDGPU] Fold dst = v_add 0, src -> src #163298
Conversation
@llvm/pr-subscribers-backend-amdgpu

Author: Jeffrey Byrnes (jrbyrnes)

Changes

When using the unsized extern representation of LDS, we may end up producing seemingly unnecessary v_add 0 instructions. For example:

%ptr = getelementptr i8, ptr addrspace(3) @global_smem, i32 %variable

The @global_smem global gets lowered to GET_GROUPSTATICSIZE, which in turn gets lowered to S_MOV_B32 MFI->getLDSSize(). Here getLDSSize() is 0 (the kernel has no other LDS), so we end up with a V_ADD of 0. Unfortunately, folding these adds out doesn't fit nicely into the ISel pipeline, so this patch handles them as a peephole in SIFoldOperands.

Full diff: https://github.com/llvm/llvm-project/pull/163298.diff (2 files affected). A minimal sketch of the fold is shown below, followed by the actual diff.
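As a rough illustration (not the actual patch), the core of the fold could be written as a standalone helper like the sketch below. It relies only on the generic MachineInstr/MachineRegisterInfo API, takes the two V_ADD_U32 opcodes as parameters instead of pulling in the AMDGPU opcode enum, and omits the getImmOrMaterializedImm lookup the real change uses to see through immediates that were first materialized into a register.

// Rough sketch only: fold  dst = V_ADD_U32 0, src  by replacing all uses of
// dst with src and deleting the add.
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"

using namespace llvm;

static bool foldAddOfZero(MachineInstr &MI, MachineRegisterInfo &MRI,
                          unsigned AddU32e32Opc, unsigned AddU32e64Opc) {
  unsigned Opc = MI.getOpcode();
  if (Opc != AddU32e32Opc && Opc != AddU32e64Opc)
    return false;

  MachineOperand &Src0 = MI.getOperand(1); // src0: expected to be the imm 0
  MachineOperand &Src1 = MI.getOperand(2); // src1: the register to keep
  // Only the literal-zero form is handled in this sketch.
  if (!Src0.isImm() || Src0.getImm() != 0 || !Src1.isReg())
    return false;

  Register SrcReg = Src1.getReg();
  // Rewrite every use of the add's destination to use src1 directly.
  MRI.replaceRegWith(MI.getOperand(0).getReg(), SrcReg);
  if (!Src1.isKill())
    MRI.clearKillFlags(SrcReg);
  MI.eraseFromParent();
  return true;
}

In the patch itself this logic lives in the new tryFoldArithmetic, which also carries over the existing V_AND 0xffff fold from tryFoldZeroHighBits.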
diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
index 51c56ecea2c96..382360150d42f 100644
--- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -245,7 +245,7 @@ class SIFoldOperandsImpl {
std::optional<int64_t> getImmOrMaterializedImm(MachineOperand &Op) const;
bool tryConstantFoldOp(MachineInstr *MI) const;
bool tryFoldCndMask(MachineInstr &MI) const;
- bool tryFoldZeroHighBits(MachineInstr &MI) const;
+ bool tryFoldArithmetic(MachineInstr &MI) const;
bool foldInstOperand(MachineInstr &MI, const FoldableDef &OpToFold) const;
bool foldCopyToAGPRRegSequence(MachineInstr *CopyMI) const;
@@ -1730,26 +1730,49 @@ bool SIFoldOperandsImpl::tryFoldCndMask(MachineInstr &MI) const {
return true;
}
-bool SIFoldOperandsImpl::tryFoldZeroHighBits(MachineInstr &MI) const {
- if (MI.getOpcode() != AMDGPU::V_AND_B32_e64 &&
- MI.getOpcode() != AMDGPU::V_AND_B32_e32)
- return false;
+bool SIFoldOperandsImpl::tryFoldArithmetic(MachineInstr &MI) const {
+ unsigned Opc = MI.getOpcode();
- std::optional<int64_t> Src0Imm = getImmOrMaterializedImm(MI.getOperand(1));
- if (!Src0Imm || *Src0Imm != 0xffff || !MI.getOperand(2).isReg())
- return false;
+ auto replaceAndFold = [this](MachineOperand &NewOp, MachineOperand &OldOp,
+ MachineInstr &MI) -> bool {
+ if (!(NewOp.isReg() && OldOp.isReg()))
+ return false;
+ Register OldReg = OldOp.getReg();
+ MRI->replaceRegWith(NewOp.getReg(), OldReg);
+ if (!OldOp.isKill())
+ MRI->clearKillFlags(OldReg);
+ MI.eraseFromParent();
+ return true;
+ };
- Register Src1 = MI.getOperand(2).getReg();
- MachineInstr *SrcDef = MRI->getVRegDef(Src1);
- if (!ST->zeroesHigh16BitsOfDest(SrcDef->getOpcode()))
- return false;
+ switch (Opc) {
+ default:
+ return false;
+ case AMDGPU::V_AND_B32_e64:
+ case AMDGPU::V_AND_B32_e32: {
+ std::optional<int64_t> Src0Imm = getImmOrMaterializedImm(MI.getOperand(1));
+ if (!Src0Imm || *Src0Imm != 0xffff || !MI.getOperand(2).isReg())
+ return false;
- Register Dst = MI.getOperand(0).getReg();
- MRI->replaceRegWith(Dst, Src1);
- if (!MI.getOperand(2).isKill())
- MRI->clearKillFlags(Src1);
- MI.eraseFromParent();
- return true;
+ MachineOperand &Src1Op = MI.getOperand(2);
+ MachineInstr *SrcDef = MRI->getVRegDef(Src1Op.getReg());
+ if (!ST->zeroesHigh16BitsOfDest(SrcDef->getOpcode()))
+ return false;
+
+ return replaceAndFold(MI.getOperand(0), Src1Op, MI);
+ }
+ case AMDGPU::V_ADD_U32_e64:
+ case AMDGPU::V_ADD_U32_e32: {
+ std::optional<int64_t> Src0Imm =
+ getImmOrMaterializedImm(MI.getOperand(1));
+ if (!Src0Imm || *Src0Imm != 0 || !MI.getOperand(2).isReg())
+ return false;
+
+ return replaceAndFold(MI.getOperand(0), MI.getOperand(2), MI);
+ }
+ }
+
+ return false;
}
bool SIFoldOperandsImpl::foldInstOperand(MachineInstr &MI,
@@ -2790,7 +2813,7 @@ bool SIFoldOperandsImpl::run(MachineFunction &MF) {
for (auto &MI : make_early_inc_range(*MBB)) {
Changed |= tryFoldCndMask(MI);
- if (tryFoldZeroHighBits(MI)) {
+ if (tryFoldArithmetic(MI)) {
Changed = true;
continue;
}
diff --git a/llvm/test/CodeGen/AMDGPU/groupstaticsize-zero.ll b/llvm/test/CodeGen/AMDGPU/groupstaticsize-zero.ll
new file mode 100644
index 0000000000000..e52eb8aca9f84
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/groupstaticsize-zero.ll
@@ -0,0 +1,20 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx950 < %s | FileCheck -check-prefixes=GCN %s
+
+@global_smem = external addrspace(3) global [0 x i8]
+
+define amdgpu_kernel void @addzero() {
+; GCN-LABEL: addzero:
+; GCN: ; %bb.0: ; %.lr.ph
+; GCN-NEXT: v_mov_b32_e32 v2, 0
+; GCN-NEXT: v_and_b32_e32 v0, 1, v0
+; GCN-NEXT: v_mov_b32_e32 v3, v2
+; GCN-NEXT: ds_write_b64 v0, v[2:3]
+; GCN-NEXT: s_endpgm
+.lr.ph:
+ %0 = tail call i32 @llvm.amdgcn.workitem.id.x()
+ %1 = and i32 %0, 1
+ %2 = getelementptr i8, ptr addrspace(3) @global_smem, i32 %1
+ store <4 x bfloat> zeroinitializer, ptr addrspace(3) %2, align 8
+ ret void
+}
✅ With the latest revision this PR passed the C/C++ code formatter.
jayfoad left a comment:
This might fit better in tryConstantFoldOp, which already does similar things for XOR with 0 and AND with -1 for example.
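For reference, a rough sketch of what that alternative might look like, assuming tryConstantFoldOp keeps its current shape for identity folds; the locals Src1Val / Src1Idx and the mutateCopyOp helper are named after how the existing OR/AND cases in that function read, so treat the exact names as assumptions rather than as the final code.

// Hypothetical extension of SIFoldOperandsImpl::tryConstantFoldOp, modeled on
// the existing "or x, 0" / "and x, -1" identity folds in that function.
if (Opc == AMDGPU::V_ADD_U32_e64 || Opc == AMDGPU::V_ADD_U32_e32) {
  if (Src1Val == 0) {
    // y = add x, 0 => y = copy x
    MI->removeOperand(Src1Idx);
    mutateCopyOp(*MI, TII->get(AMDGPU::COPY));
    return true;
  }
  return false;
}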