Skip to content

AMDGPU: Handle multiple AGPR MFMA rewrites #147975

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 11, 2025

Conversation

arsenm
Copy link
Contributor

@arsenm arsenm commented Jul 10, 2025

I have this firing on one of the real examples, need to
produce the tests and check a few edge cases

Copy link
Contributor Author

arsenm commented Jul 10, 2025

@arsenm arsenm requested review from jrbyrnes and srpande July 10, 2025 14:39
@llvmbot
Copy link
Member

llvmbot commented Jul 10, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)

Changes

I have this firing on one of the real examples, need to
produce the tests and check a few edge cases


Full diff: https://github.com/llvm/llvm-project/pull/147975.diff

1 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp (+31-11)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp b/llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp
index a8e1967116d19..7a7ea283590da 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp
@@ -57,27 +57,33 @@ class AMDGPURewriteAGPRCopyMFMAImpl {
         TRI(*ST.getRegisterInfo()), MRI(MF.getRegInfo()), VRM(VRM), LRM(LRM),
         LIS(LIS) {}
 
+  bool isRewriteCandidate(const MachineInstr &MI) const {
+    if (!TII.isMAI(MI))
+      return false;
+    return AMDGPU::getMFMASrcCVDstAGPROp(MI.getOpcode()) != -1;
+  }
+
   /// Compute the register class constraints based on the uses of \p Reg,
   /// excluding uses from \p ExceptMI. This should be nearly identical to
   /// MachineRegisterInfo::recomputeRegClass.
   const TargetRegisterClass *
-  recomputeRegClassExcept(Register Reg, const TargetRegisterClass *OldRC,
-                          const TargetRegisterClass *NewRC,
-                          const MachineInstr *ExceptMI) const;
+  recomputeRegClassExceptRewritable(Register Reg,
+                                    const TargetRegisterClass *OldRC,
+                                    const TargetRegisterClass *NewRC) const;
 
   bool run(MachineFunction &MF) const;
 };
 
 const TargetRegisterClass *
-AMDGPURewriteAGPRCopyMFMAImpl::recomputeRegClassExcept(
+AMDGPURewriteAGPRCopyMFMAImpl::recomputeRegClassExceptRewritable(
     Register Reg, const TargetRegisterClass *OldRC,
-    const TargetRegisterClass *NewRC, const MachineInstr *ExceptMI) const {
+    const TargetRegisterClass *NewRC) const {
 
   // Accumulate constraints from all uses.
   for (MachineOperand &MO : MRI.reg_nodbg_operands(Reg)) {
     // Apply the effect of the given operand to NewRC.
     MachineInstr *MI = MO.getParent();
-    if (MI == ExceptMI)
+    if (isRewriteCandidate(*MI))
       continue;
 
     unsigned OpNo = &MO - &MI->getOperand(0);
@@ -182,10 +188,13 @@ bool AMDGPURewriteAGPRCopyMFMAImpl::run(MachineFunction &MF) const {
       // first place, as well as need to assign another register, and need to
       // figure out where to put them. The live range splitting is smarter than
       // anything we're doing here, so trust it did something reasonable.
-      const TargetRegisterClass *Src2ExceptRC = recomputeRegClassExcept(
-          Src2->getReg(), Src2VirtRegRC, VirtRegRC, CopySrcMI);
-      if (!Src2ExceptRC)
+      const TargetRegisterClass *Src2ExceptRC =
+          recomputeRegClassExceptRewritable(Src2->getReg(), Src2VirtRegRC,
+                                            VirtRegRC);
+      if (!Src2ExceptRC) {
+        LLVM_DEBUG(dbgs() << "Could not recompute the regclass\n");
         continue;
+      }
 
       const TargetRegisterClass *NewSrc2ConstraintRC =
           TII.getRegClass(TII.get(AGPROp), Src2->getOperandNo(), &TRI, MF);
@@ -207,8 +216,19 @@ bool AMDGPURewriteAGPRCopyMFMAImpl::run(MachineFunction &MF) const {
 
       CopySrcMI->setDesc(TII.get(AGPROp));
 
-      // TODO: Is replacing too aggressive, fixup these instructions only?
-      MRI.replaceRegWith(CopySrcReg, VReg);
+      // Perform replacement of the register, rewriting the rewritable uses.
+      for (MachineInstr &UseMI :
+           make_early_inc_range(MRI.reg_instructions(CopySrcReg))) {
+        if (TII.isMAI(UseMI)) {
+          // Note the register we need to rewrite may still appear in src0/src1,
+          // but that's fine since those can use A or V anyway.
+          int ReplacementOp = AMDGPU::getMFMASrcCVDstAGPROp(UseMI.getOpcode());
+          if (ReplacementOp != -1)
+            UseMI.setDesc(TII.get(ReplacementOp));
+        }
+
+        UseMI.substituteRegister(CopySrcReg, VReg, AMDGPU::NoSubRegister, TRI);
+      }
 
       LLVM_DEBUG(dbgs() << "Replaced VGPR MFMA with AGPR: " << *CopySrcMI);
 

@arsenm arsenm force-pushed the users/arsenm/amdgpu/handle-multiple-agpr-mfma-rewrites branch from 69def3b to 7eac9e4 Compare July 11, 2025 06:21
Copy link
Contributor

@srpande srpande left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if src2 is 0?

DefMI %631:av_512_align2 = COPY %632:vreg_512_align2

CopySrcMI %632:vreg_512_align2 = contract V_MFMA_F32_32X32X2F32_vgprcd_e64 %628.sub0:av_128_align2, %234.sub0:vreg_128_align2, **0**, 0, 0, 0, implicit $mode, implicit $exec

This will crash.

@arsenm arsenm force-pushed the users/arsenm/amdgpu/handle-multiple-agpr-mfma-rewrites branch from 7eac9e4 to 3ec2a31 Compare July 16, 2025 07:02
@arsenm
Copy link
Contributor Author

arsenm commented Jul 24, 2025

What if src2 is 0?

DefMI %631:av_512_align2 = COPY %632:vreg_512_align2

CopySrcMI %632:vreg_512_align2 = contract V_MFMA_F32_32X32X2F32_vgprcd_e64 %628.sub0:av_128_align2, %234.sub0:vreg_128_align2, **0**, 0, 0, 0, implicit $mode, implicit $exec

This will crash.

It's broken, but it was already broken before this patch

@arsenm
Copy link
Contributor Author

arsenm commented Jul 24, 2025

What if src2 is 0?

DefMI %631:av_512_align2 = COPY %632:vreg_512_align2

CopySrcMI %632:vreg_512_align2 = contract V_MFMA_F32_32X32X2F32_vgprcd_e64 %628.sub0:av_128_align2, %234.sub0:vreg_128_align2, **0**, 0, 0, 0, implicit $mode, implicit $exec

This will crash.

It's broken, but it was already broken before this patch

#150552

@arsenm arsenm force-pushed the users/arsenm/amdgpu/handle-multiple-agpr-mfma-rewrites branch from 3ec2a31 to 8e03847 Compare July 25, 2025 01:14
@arsenm arsenm marked this pull request as ready for review July 25, 2025 01:33
@arsenm arsenm changed the title WIP: AMDGPU: Handle multiple AGPR MFMA rewrites AMDGPU: Handle multiple AGPR MFMA rewrites Jul 25, 2025
@arsenm arsenm force-pushed the users/arsenm/amdgpu/handle-multiple-agpr-mfma-rewrites branch from 8e03847 to 33aad42 Compare July 26, 2025 23:58
// TODO: Is replacing too aggressive, fixup these instructions only?
MRI.replaceRegWith(CopySrcReg, VReg);
// Perform replacement of the register, rewriting the rewritable uses.
for (MachineInstr &UseMI :
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is UseMI here misleading as reg_instructions will also include defs?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a bit confusing too. The comment above also only mentions uses as being rewritten whereas defs will also have their register substituted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's use as in "is referenced" not "is use". I'm not going to bother fixing this since this code just gets deleted up the stack anyway

arsenm added a commit that referenced this pull request Aug 7, 2025
Test other splitting situations that appear in greedy.
This includes ensuring we have a case that hits a local split
and instruction split (most of the tests hit the region split path).

Also test a few cases where the final result isn't fully used, resulting
in partial copy bundles instead of a simple full copy. Test physreg
and virtreg agpr interference with a reassignment candidate.

I'm accumulating too many failure cases, and MIR tests are very prone
to painful merge conflicts, so I've added a few more tests and extracted
new tests from #147975.

Closes #149026
arsenm added a commit that referenced this pull request Aug 7, 2025
Test other splitting situations that appear in greedy.
This includes ensuring we have a case that hits a local split
and instruction split (most of the tests hit the region split path).

Also test a few cases where the final result isn't fully used, resulting
in partial copy bundles instead of a simple full copy. Test physreg
and virtreg agpr interference with a reassignment candidate.

I'm accumulating too many failure cases, and MIR tests are very prone
to painful merge conflicts, so I've added a few more tests and extracted
new tests from #147975.

Closes #149026
arsenm added a commit that referenced this pull request Aug 7, 2025
Test other splitting situations that appear in greedy.
This includes ensuring we have a case that hits a local split
and instruction split (most of the tests hit the region split path).

Also test a few cases where the final result isn't fully used, resulting
in partial copy bundles instead of a simple full copy. Test physreg
and virtreg agpr interference with a reassignment candidate.

I'm accumulating too many failure cases, and MIR tests are very prone
to painful merge conflicts, so I've added a few more tests and extracted
new tests from #147975.

Closes #149026
arsenm added a commit that referenced this pull request Aug 7, 2025
Test other splitting situations that appear in greedy.
This includes ensuring we have a case that hits a local split
and instruction split (most of the tests hit the region split path).

Also test a few cases where the final result isn't fully used, resulting
in partial copy bundles instead of a simple full copy. Test physreg
and virtreg agpr interference with a reassignment candidate.

I'm accumulating too many failure cases, and MIR tests are very prone
to painful merge conflicts, so I've added a few more tests and extracted
new tests from #147975.

Closes #149026
@arsenm arsenm force-pushed the users/arsenm/amdgpu/handle-multiple-agpr-mfma-rewrites branch from 33aad42 to f2ef76b Compare August 8, 2025 15:44
Instead of ignoring the same user we started looking at, ignore
uses of rewritable MFMA candidates.
@arsenm arsenm force-pushed the users/arsenm/amdgpu/handle-multiple-agpr-mfma-rewrites branch from f2ef76b to 4775415 Compare August 11, 2025 00:44
Copy link
Contributor

@lucas-rami lucas-rami left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo the nit on UseMI.

// TODO: Is replacing too aggressive, fixup these instructions only?
MRI.replaceRegWith(CopySrcReg, VReg);
// Perform replacement of the register, rewriting the rewritable uses.
for (MachineInstr &UseMI :
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a bit confusing too. The comment above also only mentions uses as being rewritten whereas defs will also have their register substituted.

@arsenm arsenm merged commit 9a29353 into main Aug 11, 2025
9 checks passed
@arsenm arsenm deleted the users/arsenm/amdgpu/handle-multiple-agpr-mfma-rewrites branch August 11, 2025 14:10
@llvm-ci
Copy link
Collaborator

llvm-ci commented Aug 11, 2025

LLVM Buildbot has detected a new failure on builder lldb-aarch64-ubuntu running on linaro-lldb-aarch64-ubuntu while building llvm at step 6 "test".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/59/builds/22424

Here is the relevant piece of the build log for the reference
Step 6 (test) failure: build (failure)
...
PASS: lldb-api :: tools/lldb-dap/breakpoint/TestDAP_breakpointLocations.py (1181 of 2304)
PASS: lldb-api :: tools/lldb-dap/breakpoint-assembly/TestDAP_breakpointAssembly.py (1182 of 2304)
PASS: lldb-api :: tools/lldb-dap/breakpoint/TestDAP_logpoints.py (1183 of 2304)
PASS: lldb-api :: python_api/process/cancel_attach/TestCancelAttach.py (1184 of 2304)
PASS: lldb-api :: tools/lldb-dap/breakpoint/TestDAP_setExceptionBreakpoints.py (1185 of 2304)
PASS: lldb-api :: tools/lldb-dap/breakpoint/TestDAP_setFunctionBreakpoints.py (1186 of 2304)
PASS: lldb-api :: tools/lldb-dap/breakpoint/TestDAP_setBreakpoints.py (1187 of 2304)
PASS: lldb-api :: tools/lldb-dap/commands/TestDAP_commands.py (1188 of 2304)
PASS: lldb-api :: python_api/watchpoint/watchlocation/TestTargetWatchAddress.py (1189 of 2304)
UNRESOLVED: lldb-api :: functionalities/statusline/TestStatusline.py (1190 of 2304)
******************** TEST 'lldb-api :: functionalities/statusline/TestStatusline.py' FAILED ********************
Script:
--
/usr/bin/python3.10 /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/test/API/dotest.py -u CXXFLAGS -u CFLAGS --env LLVM_LIBS_DIR=/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./lib --env LLVM_INCLUDE_DIR=/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/include --env LLVM_TOOLS_DIR=/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin --arch aarch64 --build-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex --lldb-module-cache-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api --clang-module-cache-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex/module-cache-clang/lldb-api --executable /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin/lldb --compiler /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin/clang --dsymutil /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin/dsymutil --make /usr/bin/gmake --llvm-tools-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin --lldb-obj-root /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/tools/lldb --lldb-libs-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./lib --cmake-build-type Release /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/test/API/functionalities/statusline -p TestStatusline.py
--
Exit Code: 1

Command Output (stdout):
--
lldb version 22.0.0git (https://github.com/llvm/llvm-project.git revision 9a293530d989aa061c5ec304e0d660d020a19bc3)
  clang revision 9a293530d989aa061c5ec304e0d660d020a19bc3
  llvm revision 9a293530d989aa061c5ec304e0d660d020a19bc3
Skipping the following test categories: ['libc++', 'msvcstl', 'dsym', 'gmodules', 'debugserver', 'objc']

--
Command Output (stderr):
--
PASS: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test (TestStatusline.TestStatusline)
PASS: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_deadlock (TestStatusline.TestStatusline)
lldb-server exiting...
FAIL: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_modulelist_deadlock (TestStatusline.TestStatusline)
PASS: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_no_color (TestStatusline.TestStatusline)
PASS: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_no_target (TestStatusline.TestStatusline)
PASS: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_resize (TestStatusline.TestStatusline)
======================================================================
ERROR: test_modulelist_deadlock (TestStatusline.TestStatusline)
   Regression test for a deadlock that occurs when the status line is enabled before connecting
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/test/API/functionalities/statusline/TestStatusline.py", line 199, in test_modulelist_deadlock
    self.expect(
  File "/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/packages/Python/lldbsuite/test/lldbpexpect.py", line 95, in expect
    self.expect_prompt()
  File "/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/packages/Python/lldbsuite/test/lldbpexpect.py", line 19, in expect_prompt
    self.child.expect_exact(self.PROMPT)
  File "/usr/local/lib/python3.10/dist-packages/pexpect/spawnbase.py", line 432, in expect_exact
    return exp.expect_loop(timeout)
  File "/usr/local/lib/python3.10/dist-packages/pexpect/expect.py", line 181, in expect_loop
    return self.timeout(e)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants