-
Notifications
You must be signed in to change notification settings - Fork 15.4k
[CodeGen] Register-coalescer remat fix subreg liveness #165662
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This is a bugfix in rematerialization where the liveness of subreg mask was incorrectly updated causing crash in scheduler.
|
@llvm/pr-subscribers-backend-systemz @llvm/pr-subscribers-backend-amdgpu Author: Vigneshwar Jayakumar (VigneshwarJ) ChangesThis is a bugfix in rematerialization where the liveness of subreg mask was incorrectly updated causing crash in scheduler. Full diff: https://github.com/llvm/llvm-project/pull/165662.diff 2 Files Affected:
diff --git a/llvm/lib/CodeGen/RegisterCoalescer.cpp b/llvm/lib/CodeGen/RegisterCoalescer.cpp
index e17a214b9a27d..acd189e4b1a8d 100644
--- a/llvm/lib/CodeGen/RegisterCoalescer.cpp
+++ b/llvm/lib/CodeGen/RegisterCoalescer.cpp
@@ -1625,6 +1625,7 @@ bool RegisterCoalescer::reMaterializeDef(const CoalescerPair &CP,
// dead def so that the interferences are properly modeled.
if (!SR.liveAt(DefIndex))
SR.createDeadDef(DefIndex, Alloc);
+ SR.LaneMask = DstMask & SR.LaneMask;
}
}
if (UpdatedSubRanges)
diff --git a/llvm/test/CodeGen/AMDGPU/reg-coalescer-subreg-liveness.mir b/llvm/test/CodeGen/AMDGPU/reg-coalescer-subreg-liveness.mir
new file mode 100644
index 0000000000000..3afc1d343a728
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/reg-coalescer-subreg-liveness.mir
@@ -0,0 +1,55 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 6
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx950 -run-pass=register-coalescer -verify-coalescing -o - %s | FileCheck %s
+
+# This test is to check fix for failure with "Bad machine code: Defining instruction does not modify register" due to corrupt lane mask.
+
+---
+name: reg_coalescer_subreg_liveness
+tracksRegLiveness: true
+liveins:
+body: |
+ ; CHECK-LABEL: name: reg_coalescer_subreg_liveness
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: successors: %bb.1(0x80000000)
+ ; CHECK-NEXT: liveins: $sgpr4_sgpr5
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:sgpr_64(p4) = COPY $sgpr4_sgpr5
+ ; CHECK-NEXT: undef [[S_LOAD_DWORD_IMM:%[0-9]+]].sub2:sgpr_128 = S_LOAD_DWORD_IMM [[COPY]](p4), 0, 0 :: (dereferenceable invariant load (s32), align 16, addrspace 4)
+ ; CHECK-NEXT: [[S_LOAD_DWORD_IMM:%[0-9]+]].sub1:sgpr_128 = S_LOAD_DWORD_IMM [[COPY]](p4), 24, 0 :: (dereferenceable invariant load (s32), align 8, addrspace 4)
+ ; CHECK-NEXT: [[S_LOAD_DWORD_IMM:%[0-9]+]].sub0:sgpr_128 = S_MOV_B32 1
+ ; CHECK-NEXT: undef [[S_MOV_B32_:%[0-9]+]].sub0:sgpr_256 = S_MOV_B32 0
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1:
+ ; CHECK-NEXT: successors: %bb.2(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: S_NOP 0, implicit [[S_LOAD_DWORD_IMM]], implicit [[S_MOV_B32_]]
+ ; CHECK-NEXT: S_BRANCH %bb.2
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.2:
+ ; CHECK-NEXT: S_ENDPGM 0
+ bb.0:
+ liveins: $sgpr4_sgpr5
+
+ %5:sgpr_64(p4) = COPY killed $sgpr4_sgpr5
+ %8:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM %5(p4), 0, 0 :: (dereferenceable invariant load (s32) , align 16, addrspace 4)
+ %10:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM killed %5(p4), 24, 0 :: (dereferenceable invariant load (s32) , align 8, addrspace 4)
+ %7:sreg_32 = S_MOV_B32 1
+ undef %20.sub0:sgpr_128 = COPY %7
+ %0:sgpr_128 = COPY %20
+ %0.sub1:sgpr_128 = COPY killed %10
+ %27:sgpr_128 = COPY %0
+ %27.sub2:sgpr_128 = COPY killed %8
+ %29:sreg_32 = S_MOV_B32 0
+ undef %30.sub0:sgpr_256 = COPY %29
+ %37:sreg_32 = COPY %7
+ bb.1:
+
+ %1:sreg_32 = COPY killed %37
+ undef %33.sub0:sgpr_128 = COPY %7
+ %33.sub1:sgpr_128 = COPY killed %1
+ S_NOP 0, implicit %0, implicit %30
+ S_BRANCH %bb.2
+
+ bb.2:
+ S_ENDPGM 0
+...
|
uweigand
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SystemZ test change is OK.
arsenm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm with test cleanups
| ; CHECK-NEXT: bb.1: | ||
| ; CHECK-NEXT: successors: %bb.2(0x80000000) | ||
| ; CHECK-NEXT: {{ $}} | ||
| ; CHECK-NEXT: S_NOP 0, implicit [[S_LOAD_DWORD_IMM]], implicit [[S_MOV_B32_]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks a little suspicious:
Previously, this operand was %0 which had the following ultimate definitions:
.sub0 = S_MOV_B32 1
.sub1 = S_LOAD_DWORD_IMM %5, 24
.sub2 = undef
.sub3 = undef
Now, the [[S_LOAD_DWORD_IMM]] has the following ultimate defs:
.sub0 = S_MOV_B32 1
.sub1 = S_LOAD_DWORD_IMM %5, 24
.sub2 = S_LOAD_DWORD_IMM %5, 0
.sub3 = undef
The .sub2 def has switched from undef to S_LOAD_DWORD_IMM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey Jeff, this behavior looks right. The modification to [[S_LOAD_DWORD_IMM]] (the COPY into .sub2) only affected a sub-register that corresponded to an undefined sub-register in the original %0. The compiler still correctly preserves the necessary live values, as sub2 was an undef.
The original issue had all sublanes defined. The main issue, this PR is trying to fix is when the copy got rematerialized with the move, the lane masks were not getting updated correctly.
I can modify the mir with all sublanes with some define if this mir seems confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey --
I agree that impact that it has had on the behavior of this test is fine -- as you say, it is just changing the bits in a (previously) undef subreg. My concern was that it was suggestive of a deeper issue -- maybe some earlier logic was using the stale lanemasks. I looked a bit at this and it doesn't seem to be related to this remat handling.
That said, it still seems a bit fishy to me -- we are unnecessaily doing that S_LOAD and are changing the liveness state of the subregs at the use, presumably, though, the subregs won't change behavior as the were originally undef. Maybe something to look into separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be correct, but we still do not want to introduce new definitions for values that were previously undef. It's de-optimizing to introduce a new dependency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree we don't want to introduce new defs in previously undefs. Yes, this is a deeper issue that this patch is not addressing. I will modify the mir to show the issue it is addressing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I updated the test mir with the actual case where I was seeing the bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the difference? Can you include both versions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first commit had S_NOP (with implicit to just verify liveness), which was a reduced case. I thought that was not a clear test case. In the actual bug, there was a tensor_load instruction that took a sgpr128 with all four sublanes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I include that as another test function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added the other test as well, Both are similar in terms of the bug,the second is just a synthetic test case I wrote. The first one is a reduced test case from the bug. This commit in both cases doesn't change the outcome of the IR but just the liveness info which was corrupted prior to this commit.
arsenm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second look modifying the mask of the subrange doesn't look like a well formed transform
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment mentions the same, but it is not being done right when there's an overlap in (SR.LaneMask & DstMask)
|
the mir I wrote was ambiguous and as mentioned above has other problems. Updated the mir test. 48B %2:sreg_32 = S_MOV_B32 1
64B undef %3.sub0:sgpr_128 = COPY %2:sreg_32
..
128B %6:sgpr_128 = COPY killed %3:sgpr_128 (lanemask 0xF3 [128r, 464B:0))
144B %6.sub1:sgpr_128 = COPY killed %1:sreg_32_xm0_xexec (lanemask 0x0C [128r,128d:1))
..
464B bb.2:Here the coalescer remats to undef %3.sub0 = S_MOV_B32 1 Because this is a new def with undef for subreg, the lanemask should be (lanemask 0x03 [128r, 464B:0)), removing other subranges that are not yet defined in the new instruction; this is what this ticket fixes. |
9a73e85 to
8e64e71
Compare
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/129/builds/32477 Here is the relevant piece of the build log for the reference |
This pulls in fixes: * llvm/llvm-project#165662
This pulls in fixes: * llvm/llvm-project#165662
This pulls in fixes: * llvm/llvm-project#165662
This is a bugfix in rematerialization where the liveness of subreg mask was incorrectly updated causing crash in scheduler.