[CodeGen] Do not use subsituteRegister to update implicit def #148068

PeimingLiu · 2025-07-10T22:09:40Z

It seems subsituteRegister checks FromReg == ToReg instead of TRI->isSubRegisterEq.

This PR simply reverts the original PR (#131361) to its initial implementation (without using subsituteRegister).

Not sure whether it is a desired fix (and by no means that I am an expert on LLVM backend), but it does fix a numeric error on our internal workload.

Original author: @sdesmalen-arm

PeimingLiu · 2025-07-10T22:15:24Z

gentle ping @sdesmalen-arm and @arsenm

arsenm · 2025-07-10T22:46:56Z

Needs test but could also be argued is just a substituteRegister bug

PeimingLiu · 2025-07-10T23:10:07Z

Needs test but could also be argued is just a substituteRegister bug

Let me try to dig the pattern that cause the issue......

PeimingLiu · 2025-07-11T00:39:15Z

I think the issue is that substituteRegister subsititues every occurrence of if reg == getReg(0) then reg -> Reg0, for the following MI

%49:vr256 = contract nofpexcept VFMADD231PSYr %49:vr256(tied-def 0), %49:vr256, %548:vr256, implicit $mxcsr

The wrong fixes leads to

%548:vr256 = contract nofpexcept VFMADD231PSYr 
                                  %548:vr256(tied-def 0), 
                                  %548:vr256,  # got substituted by mistake
                                  %49:vr256, 
                     implicit $mxcsr

(This happens since VFMADD231PSYr has 3 operands, so that the swap between Reg1 and Reg2 later won't reset the second register).

Since the original implementation only tries to update implicit-def, this won't happen:

%548:vr256 = contract nofpexcept VFMADD231PSYr 
                                 %548:vr256(tied-def 0), 
                                 %49:vr256, 
                                 %49:vr256, 
                        implicit $mxcsr

IIUC, the later one matches the behaviour of the previous implementation (i.e, without the implicit-def fixes at all). A test has been added.

llvmbot · 2025-07-11T02:11:14Z

@llvm/pr-subscribers-backend-x86

Author: Peiming Liu (PeimingLiu)

Changes

It seems subsituteRegister checks FromReg == ToReg instead of TRI->isSubRegisterEq.

This PR simply reverts the original PR (#131361) to its initial implementation (without using subsituteRegister).

Not sure whether it is a desired fix (and by no means that I am an expert on LLVM backend), but it does fix a numeric error on our internal workload.

Original author: @sdesmalen-arm

Full diff: https://github.com/llvm/llvm-project/pull/148068.diff

2 Files Affected:

(modified) llvm/lib/CodeGen/TargetInstrInfo.cpp (+21-8)
(modified) llvm/test/CodeGen/X86/coalesce-commutative-implicit-def.mir (+18-1)

diff --git a/llvm/lib/CodeGen/TargetInstrInfo.cpp b/llvm/lib/CodeGen/TargetInstrInfo.cpp
index 660a1a4d7ec47..ce4234bf8a007 100644
--- a/llvm/lib/CodeGen/TargetInstrInfo.cpp
+++ b/llvm/lib/CodeGen/TargetInstrInfo.cpp
@@ -228,6 +228,24 @@ MachineInstr *TargetInstrInfo::commuteInstructionImpl(MachineInstr &MI,
     SubReg0 = SubReg1;
   }
 
+  // For a case like this:
+  //   %0.sub = INST %0.sub(tied), %1.sub, implicit-def %0
+  // we need to update the implicit-def after commuting to result in:
+  //   %1.sub = INST %1.sub(tied), %0.sub, implicit-def %1
+  SmallVector<unsigned> UpdateImplicitDefIdx;
+  if (HasDef && MI.hasImplicitDef() && MI.getOperand(0).getReg() != Reg0) {
+    const TargetRegisterInfo *TRI =
+        MI.getMF()->getSubtarget().getRegisterInfo();
+    Register OrigReg0 = MI.getOperand(0).getReg();
+    for (auto [OpNo, MO] : llvm::enumerate(MI.implicit_operands())) {
+      Register ImplReg = MO.getReg();
+      if ((ImplReg.isVirtual() && ImplReg == OrigReg0) ||
+          (ImplReg.isPhysical() && OrigReg0.isPhysical() &&
+           TRI->isSubRegisterEq(ImplReg, OrigReg0)))
+        UpdateImplicitDefIdx.push_back(OpNo + MI.getNumExplicitOperands());
+    }
+  }
+
   MachineInstr *CommutedMI = nullptr;
   if (NewMI) {
     // Create a new instruction.
@@ -238,15 +256,10 @@ MachineInstr *TargetInstrInfo::commuteInstructionImpl(MachineInstr &MI,
   }
 
   if (HasDef) {
-    // Use `substituteRegister` so that for a case like this:
-    //   %0.sub = INST %0.sub(tied), %1.sub, implicit-def %0
-    // the implicit-def is also updated, to result in:
-    //   %1.sub = INST %1.sub(tied), %0.sub, implicit-def %1
-    const TargetRegisterInfo &TRI =
-        *MI.getMF()->getSubtarget().getRegisterInfo();
-    Register FromReg = CommutedMI->getOperand(0).getReg();
-    CommutedMI->substituteRegister(FromReg, Reg0, /*SubRegIdx=*/0, TRI);
+    CommutedMI->getOperand(0).setReg(Reg0);
     CommutedMI->getOperand(0).setSubReg(SubReg0);
+    for (unsigned Idx : UpdateImplicitDefIdx)
+      CommutedMI->getOperand(Idx).setReg(Reg0);
   }
   CommutedMI->getOperand(Idx2).setReg(Reg1);
   CommutedMI->getOperand(Idx1).setReg(Reg2);
diff --git a/llvm/test/CodeGen/X86/coalesce-commutative-implicit-def.mir b/llvm/test/CodeGen/X86/coalesce-commutative-implicit-def.mir
index fe1235fe94f85..8de2beb6ba75a 100644
--- a/llvm/test/CodeGen/X86/coalesce-commutative-implicit-def.mir
+++ b/llvm/test/CodeGen/X86/coalesce-commutative-implicit-def.mir
@@ -10,7 +10,7 @@ body: |
     ; CHECK-LABEL: name: implicit_def_dst
     ; CHECK: [[MOV64rm:%[0-9]+]]:gr64_with_sub_8bit = MOV64rm $noreg, 1, $noreg, 0, $noreg :: (load (s64) from `ptr null`)
     ; CHECK-NEXT: [[MOV64rm1:%[0-9]+]]:gr64_with_sub_8bit = MOV64rm $noreg, 1, $noreg, 0, $noreg :: (load (s64) from `ptr null`)
-    ; CHECK-NEXT: [[MOV64rm:%[0-9]+]].sub_32bit:gr64_with_sub_8bit = AND32rr [[MOV64rm]].sub_32bit, [[MOV64rm1]].sub_32bit, implicit-def dead $eflags, implicit-def [[MOV64rm]]
+    ; CHECK-NEXT: [[MOV64rm:%[0-9]+]].sub_32bit:gr64_with_sub_8bit = AND32rr [[MOV64rm]].sub_32bit, [[MOV64rm1]].sub_32bit, implicit-def dead $eflags, implicit-def [[MOV64rm1]]
     ; CHECK-NEXT: RET 0, implicit [[MOV64rm]]
     %0:gr64_with_sub_8bit = MOV64rm $noreg, 1, $noreg, 0, $noreg :: (load (s64) from `ptr null`)
     %1:gr64_with_sub_8bit = MOV64rm $noreg, 1, $noreg, 0, $noreg :: (load (s64) from `ptr null`)
@@ -35,3 +35,20 @@ body: |
     %0:gr64_with_sub_8bit = COPY %1:gr64_with_sub_8bit
     RET 0, implicit %0
 ...
+# Commuting instruction with 3 ops is handled correctly.
+---
+name: commuting_3_ops
+tracksRegLiveness: true
+body: |
+  bb.0:
+    ; CHECK-LABEL: name: commuting_3_ops
+    ; CHECK: [[DEF:%[0-9]+]]:vr256 = IMPLICIT_DEF
+    ; CHECK-NEXT: [[DEF1:%[0-9]+]]:vr256 = IMPLICIT_DEF
+    ; CHECK-NEXT: [[DEF:%[0-9]+]]:vr256 = contract nofpexcept VFMADD231PSYr [[DEF]], [[DEF]], [[DEF1]], implicit $mxcsr
+    ; CHECK-NEXT: RET 0, implicit [[DEF]]
+    %0:vr256 = IMPLICIT_DEF
+    %1:vr256 = IMPLICIT_DEF
+    %0:vr256 = contract nofpexcept VFMADD231PSYr %0:vr256(tied-def 0), %0:vr256, %1:vr256, implicit $mxcsr
+    %1:vr256 = COPY %0:vr256
+    RET 0, implicit %0
+...

sdesmalen-arm

Thanks for raising this @PeimingLiu!

llvm/lib/CodeGen/TargetInstrInfo.cpp

llvm-ci · 2025-07-11T18:36:33Z

LLVM Buildbot has detected a new failure on builder mlir-nvidia-gcc7 running on mlir-nvidia while building llvm at step 7 "test-build-check-mlir-build-only-check-mlir".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/116/builds/15471

Here is the relevant piece of the build log for the reference

Step 7 (test-build-check-mlir-build-only-check-mlir) failure: test (failure)
******************** TEST 'MLIR :: Integration/GPU/CUDA/async.mlir' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 1
/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -gpu-kernel-outlining  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -pass-pipeline='builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-nvvm),nvvm-attach-target)'  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -gpu-async-region -gpu-to-llvm -reconcile-unrealized-casts -gpu-module-to-binary="format=fatbin"  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -async-to-async-runtime -async-runtime-ref-counting  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -convert-async-to-llvm -convert-func-to-llvm -convert-arith-to-llvm -convert-cf-to-llvm -reconcile-unrealized-casts  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-runner    --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_cuda_runtime.so    --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_async_runtime.so    --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_runner_utils.so    --entry-point-result=void -O0  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/FileCheck /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -gpu-kernel-outlining
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt '-pass-pipeline=builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-nvvm),nvvm-attach-target)'
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -gpu-async-region -gpu-to-llvm -reconcile-unrealized-casts -gpu-module-to-binary=format=fatbin
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -async-to-async-runtime -async-runtime-ref-counting
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -convert-async-to-llvm -convert-func-to-llvm -convert-arith-to-llvm -convert-cf-to-llvm -reconcile-unrealized-casts
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-runner --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_cuda_runtime.so --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_async_runtime.so --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_runner_utils.so --entry-point-result=void -O0
# .---command stderr------------
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventSynchronize(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# `-----------------------------
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/FileCheck /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# .---command stderr------------
# | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir:68:12: error: CHECK: expected string not found in input
# |  // CHECK: [84, 84]
# |            ^
# | <stdin>:1:1: note: scanning from here
# | Unranked Memref base@ = 0x598b40931c20 rank = 1 offset = 0 sizes = [2] strides = [1] data = 
# | ^
# | <stdin>:2:1: note: possible intended match here
# | [42, 42]
# | ^
# | 
# | Input file: <stdin>
# | Check file: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |             1: Unranked Memref base@ = 0x598b40931c20 rank = 1 offset = 0 sizes = [2] strides = [1] data =  
# | check:68'0     X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
# |             2: [42, 42] 
# | check:68'0     ~~~~~~~~~
# | check:68'1     ?         possible intended match
...

[CodeGen] Do not use subsituteRegister to update implicit def

db3674a

llvmbot added the llvm:codegen label Jul 10, 2025

PeimingLiu requested review from arsenm and sdesmalen-arm July 10, 2025 22:10

npanchen requested a review from topperc July 10, 2025 22:38

llvmbot added the backend:X86 label Jul 11, 2025

PeimingLiu force-pushed the users/peimingliu/partial-revert branch 3 times, most recently from 18a3e70 to c038190 Compare July 11, 2025 02:31

add test

7a347d8

PeimingLiu force-pushed the users/peimingliu/partial-revert branch from c038190 to 7a347d8 Compare July 11, 2025 03:08

sdesmalen-arm reviewed Jul 11, 2025

View reviewed changes

llvm/lib/CodeGen/TargetInstrInfo.cpp Outdated Show resolved Hide resolved

llvm/lib/CodeGen/TargetInstrInfo.cpp Show resolved Hide resolved

PeimingLiu requested a review from sdesmalen-arm July 11, 2025 15:59

address comment

5950e82

PeimingLiu force-pushed the users/peimingliu/partial-revert branch from b6740c7 to 5950e82 Compare July 11, 2025 16:02

remove useless check

df33a79

sdesmalen-arm approved these changes Jul 11, 2025

View reviewed changes

PeimingLiu merged commit 4b73838 into main Jul 11, 2025
9 checks passed

PeimingLiu deleted the users/peimingliu/partial-revert branch July 11, 2025 18:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[CodeGen] Do not use subsituteRegister to update implicit def #148068

[CodeGen] Do not use subsituteRegister to update implicit def #148068

Uh oh!

PeimingLiu commented Jul 10, 2025 •

edited

Loading

Uh oh!

PeimingLiu commented Jul 10, 2025

Uh oh!

arsenm commented Jul 10, 2025

Uh oh!

PeimingLiu commented Jul 10, 2025

Uh oh!

PeimingLiu commented Jul 11, 2025 •

edited

Loading

Uh oh!

llvmbot commented Jul 11, 2025

Uh oh!

sdesmalen-arm left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

llvm-ci commented Jul 11, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

[CodeGen] Do not use subsituteRegister to update implicit def #148068

[CodeGen] Do not use subsituteRegister to update implicit def #148068

Uh oh!

Conversation

PeimingLiu commented Jul 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

PeimingLiu commented Jul 10, 2025

Uh oh!

arsenm commented Jul 10, 2025

Uh oh!

PeimingLiu commented Jul 10, 2025

Uh oh!

PeimingLiu commented Jul 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Jul 11, 2025

Uh oh!

sdesmalen-arm left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

llvm-ci commented Jul 11, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

PeimingLiu commented Jul 10, 2025 •

edited

Loading

PeimingLiu commented Jul 11, 2025 •

edited

Loading