Skip to content
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions llvm/lib/CodeGen/RegisterCoalescer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1625,6 +1625,7 @@ bool RegisterCoalescer::reMaterializeDef(const CoalescerPair &CP,
// dead def so that the interferences are properly modeled.
if (!SR.liveAt(DefIndex))
SR.createDeadDef(DefIndex, Alloc);
SR.LaneMask = DstMask & SR.LaneMask;
}
}
if (UpdatedSubRanges)
Expand Down
55 changes: 55 additions & 0 deletions llvm/test/CodeGen/AMDGPU/reg-coalescer-subreg-liveness.mir
Original file line number Diff line number Diff line change
@@ -0,0 +1,55 @@
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 6
# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx950 -run-pass=register-coalescer -verify-coalescing -o - %s | FileCheck %s

# This test is to check fix for failure with "Bad machine code: Defining instruction does not modify register" due to corrupt lane mask.

---
name: reg_coalescer_subreg_liveness
tracksRegLiveness: true
liveins:
body: |
; CHECK-LABEL: name: reg_coalescer_subreg_liveness
; CHECK: bb.0:
; CHECK-NEXT: successors: %bb.1(0x80000000)
; CHECK-NEXT: liveins: $sgpr4_sgpr5
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY:%[0-9]+]]:sgpr_64(p4) = COPY $sgpr4_sgpr5
; CHECK-NEXT: undef [[S_LOAD_DWORD_IMM:%[0-9]+]].sub2:sgpr_128 = S_LOAD_DWORD_IMM [[COPY]](p4), 0, 0 :: (dereferenceable invariant load (s32), align 16, addrspace 4)
; CHECK-NEXT: [[S_LOAD_DWORD_IMM:%[0-9]+]].sub1:sgpr_128 = S_LOAD_DWORD_IMM [[COPY]](p4), 24, 0 :: (dereferenceable invariant load (s32), align 8, addrspace 4)
; CHECK-NEXT: [[S_LOAD_DWORD_IMM:%[0-9]+]].sub0:sgpr_128 = S_MOV_B32 1
; CHECK-NEXT: undef [[S_MOV_B32_:%[0-9]+]].sub0:sgpr_256 = S_MOV_B32 0
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1:
; CHECK-NEXT: successors: %bb.2(0x80000000)
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: S_NOP 0, implicit [[S_LOAD_DWORD_IMM]], implicit [[S_MOV_B32_]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks a little suspicious:

Previously, this operand was %0 which had the following ultimate definitions:
.sub0 = S_MOV_B32 1
.sub1 = S_LOAD_DWORD_IMM %5, 24
.sub2 = undef
.sub3 = undef

Now, the [[S_LOAD_DWORD_IMM]] has the following ultimate defs:
.sub0 = S_MOV_B32 1
.sub1 = S_LOAD_DWORD_IMM %5, 24
.sub2 = S_LOAD_DWORD_IMM %5, 0
.sub3 = undef

The .sub2 def has switched from undef to S_LOAD_DWORD_IMM

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Jeff, this behavior looks right. The modification to [[S_LOAD_DWORD_IMM]] (the COPY into .sub2) only affected a sub-register that corresponded to an undefined sub-register in the original %0. The compiler still correctly preserves the necessary live values, as sub2 was an undef.

The original issue had all sublanes defined. The main issue, this PR is trying to fix is when the copy got rematerialized with the move, the lane masks were not getting updated correctly.

I can modify the mir with all sublanes with some define if this mir seems confusing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey --

I agree that impact that it has had on the behavior of this test is fine -- as you say, it is just changing the bits in a (previously) undef subreg. My concern was that it was suggestive of a deeper issue -- maybe some earlier logic was using the stale lanemasks. I looked a bit at this and it doesn't seem to be related to this remat handling.

That said, it still seems a bit fishy to me -- we are unnecessaily doing that S_LOAD and are changing the liveness state of the subregs at the use, presumably, though, the subregs won't change behavior as the were originally undef. Maybe something to look into separately.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be correct, but we still do not want to introduce new definitions for values that were previously undef. It's de-optimizing to introduce a new dependency

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree we don't want to introduce new defs in previously undefs. Yes, this is a deeper issue that this patch is not addressing. I will modify the mir to show the issue it is addressing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I updated the test mir with the actual case where I was seeing the bug.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference? Can you include both versions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first commit had S_NOP (with implicit to just verify liveness), which was a reduced case. I thought that was not a clear test case. In the actual bug, there was a tensor_load instruction that took a sgpr128 with all four sublanes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I include that as another test function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added the other test as well, Both are similar in terms of the bug,the second is just a synthetic test case I wrote. The first one is a reduced test case from the bug. This commit in both cases doesn't change the outcome of the IR but just the liveness info which was corrupted prior to this commit.

; CHECK-NEXT: S_BRANCH %bb.2
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2:
; CHECK-NEXT: S_ENDPGM 0
bb.0:
liveins: $sgpr4_sgpr5
%5:sgpr_64(p4) = COPY killed $sgpr4_sgpr5
%8:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM %5(p4), 0, 0 :: (dereferenceable invariant load (s32) , align 16, addrspace 4)
%10:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM killed %5(p4), 24, 0 :: (dereferenceable invariant load (s32) , align 8, addrspace 4)
%7:sreg_32 = S_MOV_B32 1
undef %20.sub0:sgpr_128 = COPY %7
%0:sgpr_128 = COPY %20
%0.sub1:sgpr_128 = COPY killed %10
%27:sgpr_128 = COPY %0
%27.sub2:sgpr_128 = COPY killed %8
%29:sreg_32 = S_MOV_B32 0
undef %30.sub0:sgpr_256 = COPY %29
%37:sreg_32 = COPY %7
bb.1:
%1:sreg_32 = COPY killed %37
undef %33.sub0:sgpr_128 = COPY %7
%33.sub1:sgpr_128 = COPY killed %1
S_NOP 0, implicit %0, implicit %30
S_BRANCH %bb.2
bb.2:
S_ENDPGM 0
...
4 changes: 2 additions & 2 deletions llvm/test/CodeGen/SystemZ/regcoal_remat_empty_subrange.ll
Original file line number Diff line number Diff line change
Expand Up @@ -22,10 +22,10 @@ define void @main(i16 %in) {
; CHECK-NEXT: locghile %r3, 1
; CHECK-NEXT: o %r0, 0(%r1)
; CHECK-NEXT: larl %r1, g_222
; CHECK-NEXT: lghi %r5, 0
; CHECK-NEXT: dsgfr %r2, %r0
; CHECK-NEXT: lghi %r3, 0
; CHECK-NEXT: stgrl %r2, g_39
; CHECK-NEXT: stc %r5, 19(%r1)
; CHECK-NEXT: stc %r3, 19(%r1)
; CHECK-NEXT: br %r14
%tmp = load i32, ptr @g_151, align 4
%tmp3 = or i32 %tmp, 1
Expand Down