Skip to content

Conversation

@VigneshwarJ
Copy link
Contributor

This is a bugfix in rematerialization where the liveness of subreg mask was incorrectly updated causing crash in scheduler.

This is a bugfix in rematerialization where the liveness of subreg mask
was incorrectly updated causing crash in scheduler.
@llvmbot
Copy link
Member

llvmbot commented Oct 30, 2025

@llvm/pr-subscribers-backend-systemz
@llvm/pr-subscribers-llvm-regalloc

@llvm/pr-subscribers-backend-amdgpu

Author: Vigneshwar Jayakumar (VigneshwarJ)

Changes

This is a bugfix in rematerialization where the liveness of subreg mask was incorrectly updated causing crash in scheduler.


Full diff: https://github.com/llvm/llvm-project/pull/165662.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/RegisterCoalescer.cpp (+1)
  • (added) llvm/test/CodeGen/AMDGPU/reg-coalescer-subreg-liveness.mir (+55)
diff --git a/llvm/lib/CodeGen/RegisterCoalescer.cpp b/llvm/lib/CodeGen/RegisterCoalescer.cpp
index e17a214b9a27d..acd189e4b1a8d 100644
--- a/llvm/lib/CodeGen/RegisterCoalescer.cpp
+++ b/llvm/lib/CodeGen/RegisterCoalescer.cpp
@@ -1625,6 +1625,7 @@ bool RegisterCoalescer::reMaterializeDef(const CoalescerPair &CP,
           // dead def so that the interferences are properly modeled.
           if (!SR.liveAt(DefIndex))
             SR.createDeadDef(DefIndex, Alloc);
+          SR.LaneMask = DstMask & SR.LaneMask;
         }
       }
       if (UpdatedSubRanges)
diff --git a/llvm/test/CodeGen/AMDGPU/reg-coalescer-subreg-liveness.mir b/llvm/test/CodeGen/AMDGPU/reg-coalescer-subreg-liveness.mir
new file mode 100644
index 0000000000000..3afc1d343a728
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/reg-coalescer-subreg-liveness.mir
@@ -0,0 +1,55 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 6
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx950 -run-pass=register-coalescer -verify-coalescing -o - %s | FileCheck %s
+
+# This test is to check fix for failure with "Bad machine code: Defining instruction does not modify register" due to corrupt lane mask.
+
+---
+name:            reg_coalescer_subreg_liveness
+tracksRegLiveness: true
+liveins:
+body:             |
+  ; CHECK-LABEL: name: reg_coalescer_subreg_liveness
+  ; CHECK: bb.0:
+  ; CHECK-NEXT:   successors: %bb.1(0x80000000)
+  ; CHECK-NEXT:   liveins: $sgpr4_sgpr5
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   [[COPY:%[0-9]+]]:sgpr_64(p4) = COPY $sgpr4_sgpr5
+  ; CHECK-NEXT:   undef [[S_LOAD_DWORD_IMM:%[0-9]+]].sub2:sgpr_128 = S_LOAD_DWORD_IMM [[COPY]](p4), 0, 0 :: (dereferenceable invariant load (s32), align 16, addrspace 4)
+  ; CHECK-NEXT:   [[S_LOAD_DWORD_IMM:%[0-9]+]].sub1:sgpr_128 = S_LOAD_DWORD_IMM [[COPY]](p4), 24, 0 :: (dereferenceable invariant load (s32), align 8, addrspace 4)
+  ; CHECK-NEXT:   [[S_LOAD_DWORD_IMM:%[0-9]+]].sub0:sgpr_128 = S_MOV_B32 1
+  ; CHECK-NEXT:   undef [[S_MOV_B32_:%[0-9]+]].sub0:sgpr_256 = S_MOV_B32 0
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.1:
+  ; CHECK-NEXT:   successors: %bb.2(0x80000000)
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   S_NOP 0, implicit [[S_LOAD_DWORD_IMM]], implicit [[S_MOV_B32_]]
+  ; CHECK-NEXT:   S_BRANCH %bb.2
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.2:
+  ; CHECK-NEXT:   S_ENDPGM 0
+  bb.0:
+    liveins: $sgpr4_sgpr5
+
+    %5:sgpr_64(p4) = COPY killed $sgpr4_sgpr5
+    %8:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM %5(p4), 0, 0 :: (dereferenceable invariant load (s32) , align 16, addrspace 4)
+    %10:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM killed %5(p4), 24, 0 :: (dereferenceable invariant load (s32) , align 8, addrspace 4)
+    %7:sreg_32 = S_MOV_B32 1
+    undef %20.sub0:sgpr_128 = COPY %7
+    %0:sgpr_128 = COPY %20
+    %0.sub1:sgpr_128 = COPY killed %10
+    %27:sgpr_128 = COPY %0
+    %27.sub2:sgpr_128 = COPY killed %8
+    %29:sreg_32 = S_MOV_B32 0
+    undef %30.sub0:sgpr_256 = COPY %29
+    %37:sreg_32 = COPY %7
+  bb.1:
+
+    %1:sreg_32 = COPY killed %37
+    undef %33.sub0:sgpr_128 = COPY %7
+    %33.sub1:sgpr_128 = COPY killed %1
+    S_NOP 0, implicit %0, implicit %30
+    S_BRANCH %bb.2
+
+  bb.2:
+    S_ENDPGM 0
+...

Copy link
Member

@uweigand uweigand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SystemZ test change is OK.

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm with test cleanups

; CHECK-NEXT: bb.1:
; CHECK-NEXT: successors: %bb.2(0x80000000)
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: S_NOP 0, implicit [[S_LOAD_DWORD_IMM]], implicit [[S_MOV_B32_]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks a little suspicious:

Previously, this operand was %0 which had the following ultimate definitions:
.sub0 = S_MOV_B32 1
.sub1 = S_LOAD_DWORD_IMM %5, 24
.sub2 = undef
.sub3 = undef

Now, the [[S_LOAD_DWORD_IMM]] has the following ultimate defs:
.sub0 = S_MOV_B32 1
.sub1 = S_LOAD_DWORD_IMM %5, 24
.sub2 = S_LOAD_DWORD_IMM %5, 0
.sub3 = undef

The .sub2 def has switched from undef to S_LOAD_DWORD_IMM

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Jeff, this behavior looks right. The modification to [[S_LOAD_DWORD_IMM]] (the COPY into .sub2) only affected a sub-register that corresponded to an undefined sub-register in the original %0. The compiler still correctly preserves the necessary live values, as sub2 was an undef.

The original issue had all sublanes defined. The main issue, this PR is trying to fix is when the copy got rematerialized with the move, the lane masks were not getting updated correctly.

I can modify the mir with all sublanes with some define if this mir seems confusing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey --

I agree that impact that it has had on the behavior of this test is fine -- as you say, it is just changing the bits in a (previously) undef subreg. My concern was that it was suggestive of a deeper issue -- maybe some earlier logic was using the stale lanemasks. I looked a bit at this and it doesn't seem to be related to this remat handling.

That said, it still seems a bit fishy to me -- we are unnecessaily doing that S_LOAD and are changing the liveness state of the subregs at the use, presumably, though, the subregs won't change behavior as the were originally undef. Maybe something to look into separately.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be correct, but we still do not want to introduce new definitions for values that were previously undef. It's de-optimizing to introduce a new dependency

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree we don't want to introduce new defs in previously undefs. Yes, this is a deeper issue that this patch is not addressing. I will modify the mir to show the issue it is addressing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I updated the test mir with the actual case where I was seeing the bug.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference? Can you include both versions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first commit had S_NOP (with implicit to just verify liveness), which was a reduced case. I thought that was not a clear test case. In the actual bug, there was a tensor_load instruction that took a sgpr128 with all four sublanes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I include that as another test function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added the other test as well, Both are similar in terms of the bug,the second is just a synthetic test case I wrote. The first one is a reduced test case from the bug. This commit in both cases doesn't change the outcome of the IR but just the liveness info which was corrupted prior to this commit.

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second look modifying the mask of the subrange doesn't look like a well formed transform

Comment on lines 1587 to 1594
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment mentions the same, but it is not being done right when there's an overlap in (SR.LaneMask & DstMask)

@VigneshwarJ
Copy link
Contributor Author

VigneshwarJ commented Nov 3, 2025

the mir I wrote was ambiguous and as mentioned above has other problems. Updated the mir test.
The issue this ticket tries to address is

48B	  %2:sreg_32 = S_MOV_B32 1
64B	  undef %3.sub0:sgpr_128 = COPY %2:sreg_32
..
128B	  %6:sgpr_128 = COPY killed %3:sgpr_128 (lanemask 0xF3  [128r, 464B:0))
144B	  %6.sub1:sgpr_128 = COPY killed %1:sreg_32_xm0_xexec (lanemask 0x0C [128r,128d:1))
..
464B	bb.2:

Here the coalescer remats to undef %3.sub0 = S_MOV_B32 1
And then it remats 128B
updated: 128B undef %6.sub0:sgpr_128 = S_MOV_B32 1
updated: 144B %6.sub1:sgpr_128 = COPY %1:sreg_32_xm0_xexec
But the lanemask is still previous (lanemask 0xF3 [128r, 464B:0)) . This causes problem later in coalescing.

Because this is a new def with undef for subreg, the lanemask should be (lanemask 0x03 [128r, 464B:0)), removing other subranges that are not yet defined in the new instruction; this is what this ticket fixes.

@VigneshwarJ VigneshwarJ merged commit b5f2001 into llvm:main Nov 5, 2025
8 of 9 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Nov 5, 2025

LLVM Buildbot has detected a new failure on builder ppc64le-mlir-rhel-clang running on ppc64le-mlir-rhel-test while building llvm at step 6 "test-build-check-mlir-build-only-check-mlir".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/129/builds/32477

Here is the relevant piece of the build log for the reference
Step 6 (test-build-check-mlir-build-only-check-mlir) failure: 1200 seconds without output running [b'ninja', b'check-mlir'], attempting to kill
11.292 [0/1/0] Running the MLIR regression tests
command timed out: 1200 seconds without output running [b'ninja', b'check-mlir'], attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=1212.256291

antiagainst added a commit to triton-lang/triton that referenced this pull request Nov 10, 2025
khasanovaa pushed a commit to openxla/triton that referenced this pull request Nov 12, 2025
khasanovaa pushed a commit to openxla/triton that referenced this pull request Nov 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants