AMDGPU: Handle multiple AGPR MFMA rewrites #147975
This stack of pull requests is managed by Graphite.
@llvm/pr-subscribers-backend-amdgpu
Author: Matt Arsenault (arsenm)
Full diff: https://github.com/llvm/llvm-project/pull/147975.diff
1 file affected:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp b/llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp
index a8e1967116d19..7a7ea283590da 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp
@@ -57,27 +57,33 @@ class AMDGPURewriteAGPRCopyMFMAImpl {
TRI(*ST.getRegisterInfo()), MRI(MF.getRegInfo()), VRM(VRM), LRM(LRM),
LIS(LIS) {}
+ bool isRewriteCandidate(const MachineInstr &MI) const {
+ if (!TII.isMAI(MI))
+ return false;
+ return AMDGPU::getMFMASrcCVDstAGPROp(MI.getOpcode()) != -1;
+ }
+
/// Compute the register class constraints based on the uses of \p Reg,
/// excluding uses from \p ExceptMI. This should be nearly identical to
/// MachineRegisterInfo::recomputeRegClass.
const TargetRegisterClass *
- recomputeRegClassExcept(Register Reg, const TargetRegisterClass *OldRC,
- const TargetRegisterClass *NewRC,
- const MachineInstr *ExceptMI) const;
+ recomputeRegClassExceptRewritable(Register Reg,
+ const TargetRegisterClass *OldRC,
+ const TargetRegisterClass *NewRC) const;
bool run(MachineFunction &MF) const;
};
const TargetRegisterClass *
-AMDGPURewriteAGPRCopyMFMAImpl::recomputeRegClassExcept(
+AMDGPURewriteAGPRCopyMFMAImpl::recomputeRegClassExceptRewritable(
Register Reg, const TargetRegisterClass *OldRC,
- const TargetRegisterClass *NewRC, const MachineInstr *ExceptMI) const {
+ const TargetRegisterClass *NewRC) const {
// Accumulate constraints from all uses.
for (MachineOperand &MO : MRI.reg_nodbg_operands(Reg)) {
// Apply the effect of the given operand to NewRC.
MachineInstr *MI = MO.getParent();
- if (MI == ExceptMI)
+ if (isRewriteCandidate(*MI))
continue;
unsigned OpNo = &MO - &MI->getOperand(0);
@@ -182,10 +188,13 @@ bool AMDGPURewriteAGPRCopyMFMAImpl::run(MachineFunction &MF) const {
// first place, as well as need to assign another register, and need to
// figure out where to put them. The live range splitting is smarter than
// anything we're doing here, so trust it did something reasonable.
- const TargetRegisterClass *Src2ExceptRC = recomputeRegClassExcept(
- Src2->getReg(), Src2VirtRegRC, VirtRegRC, CopySrcMI);
- if (!Src2ExceptRC)
+ const TargetRegisterClass *Src2ExceptRC =
+ recomputeRegClassExceptRewritable(Src2->getReg(), Src2VirtRegRC,
+ VirtRegRC);
+ if (!Src2ExceptRC) {
+ LLVM_DEBUG(dbgs() << "Could not recompute the regclass\n");
continue;
+ }
const TargetRegisterClass *NewSrc2ConstraintRC =
TII.getRegClass(TII.get(AGPROp), Src2->getOperandNo(), &TRI, MF);
@@ -207,8 +216,19 @@ bool AMDGPURewriteAGPRCopyMFMAImpl::run(MachineFunction &MF) const {
CopySrcMI->setDesc(TII.get(AGPROp));
- // TODO: Is replacing too aggressive, fixup these instructions only?
- MRI.replaceRegWith(CopySrcReg, VReg);
+ // Perform replacement of the register, rewriting the rewritable uses.
+ for (MachineInstr &UseMI :
+ make_early_inc_range(MRI.reg_instructions(CopySrcReg))) {
+ if (TII.isMAI(UseMI)) {
+ // Note the register we need to rewrite may still appear in src0/src1,
+ // but that's fine since those can use A or V anyway.
+ int ReplacementOp = AMDGPU::getMFMASrcCVDstAGPROp(UseMI.getOpcode());
+ if (ReplacementOp != -1)
+ UseMI.setDesc(TII.get(ReplacementOp));
+ }
+
+ UseMI.substituteRegister(CopySrcReg, VReg, AMDGPU::NoSubRegister, TRI);
+ }
LLVM_DEBUG(dbgs() << "Replaced VGPR MFMA with AGPR: " << *CopySrcMI);
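A simplified model of the new replacement loop may help when reading the last hunk. This is a sketch under stated assumptions, not LLVM code. Instr, getAGPRVariant, rewriteReferences and the opcode table are hypothetical stand-ins for MachineInstr, AMDGPU::getMFMASrcCVDstAGPROp and the loop itself. The TII.isMAI check is folded into the table lookup, so non-MFMA opcodes simply have no entry. The sketch shows the two steps the patch applies to each instruction that references CopySrcReg: it switches an MFMA that has an AGPR-dst variant to that opcode, then it substitutes the register in every matching operand, defs included.

```cpp
#include <algorithm>
#include <cassert>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified instruction: an opcode plus the registers named by
// its operands (defs and uses alike, in operand order).
struct Instr {
  int Opcode;
  std::vector<unsigned> Regs;
};

// Hypothetical stand-in for AMDGPU::getMFMASrcCVDstAGPROp: returns the
// AGPR-dst variant of a VGPR-form MFMA opcode, or -1 if there is none.
int getAGPRVariant(const std::unordered_map<int, int> &Table, int Opcode) {
  auto It = Table.find(Opcode);
  return It == Table.end() ? -1 : It->second;
}

// Visit every instruction that references OldReg. MFMAs with an AGPR variant
// switch opcode; then every operand naming OldReg is replaced by NewReg.
void rewriteReferences(std::vector<Instr> &Instrs, unsigned OldReg,
                       unsigned NewReg,
                       const std::unordered_map<int, int> &Table) {
  assert(OldReg != NewReg && "nothing to rewrite");
  for (Instr &I : Instrs) {
    if (std::find(I.Regs.begin(), I.Regs.end(), OldReg) == I.Regs.end())
      continue; // Does not reference OldReg.
    int NewOpc = getAGPRVariant(Table, I.Opcode);
    if (NewOpc != -1)
      I.Opcode = NewOpc;
    std::replace(I.Regs.begin(), I.Regs.end(), OldReg, NewReg);
  }
}
```

As in the patch, only the opcode switch is conditional. The register substitution applies to every referencing instruction, so non-MFMA users are updated as well.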
The branch was force-pushed from 69def3b to 7eac9e4.
What if src2 is 0? In this example, src2 (the third source operand of the MFMA) is the immediate 0:
DefMI %631:av_512_align2 = COPY %632:vreg_512_align2
CopySrcMI %632:vreg_512_align2 = contract V_MFMA_F32_32X32X2F32_vgprcd_e64 %628.sub0:av_128_align2, %234.sub0:vreg_128_align2, 0, 0, 0, 0, implicit $mode, implicit $exec
This will crash.
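One way to avoid reading a register out of an immediate src2 is to check the operand kind first and skip the candidate. The sketch below is hypothetical: SrcOperand and getRewritableSrc2Reg are illustrative names, not the pass's API, and this is not the fix adopted in this PR. It only models the failure mode. In an assertions-enabled build, LLVM's MachineOperand::getReg() asserts when it is called on a non-register operand.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical MFMA source operand: either a virtual register or an
// immediate, such as the literal 0 used as src2 in the example above.
struct SrcOperand {
  bool IsReg;
  unsigned Reg;     // Meaningful only when IsReg is true.
  std::int64_t Imm; // Meaningful only when IsReg is false.

  unsigned getReg() const {
    // Models the assertion that fires when a register is read from an
    // immediate operand.
    assert(IsReg && "not a register operand");
    return Reg;
  }
};

// Returns the src2 register whose class should be recomputed, or
// std::nullopt when src2 is an immediate, in which case the caller would
// skip this rewrite candidate instead of crashing.
std::optional<unsigned> getRewritableSrc2Reg(const SrcOperand &Src2) {
  if (!Src2.IsReg)
    return std::nullopt;
  return Src2.getReg();
}
```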
The branch was force-pushed from 7eac9e4 to 3ec2a31.
It's broken, but it was already broken before this patch.
The branch was force-pushed from 3ec2a31 to 8e03847.
The branch was force-pushed from 8e03847 to 33aad42.
- // TODO: Is replacing too aggressive, fixup these instructions only?
- MRI.replaceRegWith(CopySrcReg, VReg);
+ // Perform replacement of the register, rewriting the rewritable uses.
+ for (MachineInstr &UseMI :
Is UseMI here misleading, as reg_instructions will also include defs?
I think it's a bit confusing too. The comment above also only mentions uses as being rewritten whereas defs will also have their register substituted.
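To make the distinction in this thread concrete: a walk over every instruction that references a register visits defs as well as uses, so the substitution also rewrites the def. The sketch below is a hypothetical model. Operand, MInstr and collectRefs are illustrative names, and it does not reproduce LLVM's iterator semantics. It contrasts a references-style walk, in the spirit of MachineRegisterInfo::reg_instructions, with a uses-only walk like use_instructions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical operand and instruction model with an explicit def flag.
struct Operand {
  unsigned Reg;
  bool IsDef;
};
struct MInstr {
  std::vector<Operand> Ops;
};

// Returns the indices of instructions that mention Reg. With UsesOnly set,
// only use (non-def) operands count; otherwise defs count too, which is the
// "is referenced" sense discussed above. Each instruction is listed at most
// once.
std::vector<std::size_t> collectRefs(const std::vector<MInstr> &MIs,
                                     unsigned Reg, bool UsesOnly) {
  assert(Reg != 0 && "register 0 means no register in this model");
  std::vector<std::size_t> Result;
  for (std::size_t Idx = 0; Idx < MIs.size(); ++Idx) {
    for (const Operand &MO : MIs[Idx].Ops) {
      if (MO.Reg == Reg && !(UsesOnly && MO.IsDef)) {
        Result.push_back(Idx);
        break; // One entry per instruction.
      }
    }
  }
  return Result;
}
```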
It's "use" as in "is referenced", not "is a use". I'm not going to bother fixing this, since this code just gets deleted further up the stack anyway.
Test other splitting situations that appear in greedy. This includes ensuring we have a case that hits a local split and an instruction split (most of the tests hit the region split path). Also test a few cases where the final result isn't fully used, resulting in partial copy bundles instead of a simple full copy. Test physreg and virtreg AGPR interference with a reassignment candidate. I'm accumulating too many failure cases, and MIR tests are very prone to painful merge conflicts, so I've added a few more tests and extracted new tests from #147975. Closes #149026.
The branch was force-pushed from 33aad42 to f2ef76b.
Instead of ignoring the same user we started looking at, ignore uses of rewritable MFMA candidates.
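The new filtering in recomputeRegClassExceptRewritable can be modeled as a constraint intersection that skips rewrite candidates. This is a sketch under simplifying assumptions. Register classes are modeled as bitmasks of allowed physical registers, and narrowing is a bitwise AND, whereas the real code works on TargetRegisterClass and per-operand constraint queries. RegClassMask, RefInstr and recomputeExceptRewritable are hypothetical names. Returning std::nullopt corresponds to the pass's nullptr result, after which the candidate is skipped.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical model: a register class is a bitmask of the physical
// registers it allows, so narrowing a class is a bitwise AND.
using RegClassMask = std::uint64_t;

// Hypothetical summary of one instruction that references the register.
struct RefInstr {
  bool IsRewriteCandidate;        // e.g. an MFMA with an AGPR-dst variant
  RegClassMask OperandConstraint; // Class this operand position requires.
};

// Narrow NewRC by the constraint of every referencing instruction, skipping
// rewrite candidates, since their operand constraints will change when they
// are rewritten. Returns std::nullopt if no register satisfies all
// remaining constraints.
std::optional<RegClassMask>
recomputeExceptRewritable(RegClassMask NewRC,
                          const std::vector<RefInstr> &Refs) {
  assert(NewRC != 0 && "starting class must allow some register");
  for (const RefInstr &R : Refs) {
    if (R.IsRewriteCandidate)
      continue;
    NewRC &= R.OperandConstraint;
    if (NewRC == 0)
      return std::nullopt;
  }
  return NewRC;
}
```

Compared with the previous ExceptMI parameter, which excluded only the one instruction the rewrite started from, this excludes every rewrite candidate, which appears to be what allows several MFMAs to be rewritten to AGPR form together.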
The branch was force-pushed from f2ef76b to 4775415.
LGTM modulo the nit on UseMI.
LLVM Buildbot has detected a new failure on a builder. Full details are available at:
https://lab.llvm.org/buildbot/#/builders/59/builds/22424
Here is the relevant piece of the build log for reference:
I have this firing on one of the real examples, but I still need to produce the tests and check a few edge cases.