Skip to content

Conversation

@PeimingLiu
Copy link
Member

@PeimingLiu PeimingLiu commented Jul 10, 2025

It seems subsituteRegister checks FromReg == ToReg instead of TRI->isSubRegisterEq.

This PR simply reverts the original PR (#131361) to its initial implementation (without using subsituteRegister).

Not sure whether it is a desired fix (and by no means that I am an expert on LLVM backend), but it does fix a numeric error on our internal workload.

Original author: @sdesmalen-arm

@PeimingLiu
Copy link
Member Author

gentle ping @sdesmalen-arm and @arsenm

@npanchen npanchen requested a review from topperc July 10, 2025 22:38
@arsenm
Copy link
Contributor

arsenm commented Jul 10, 2025

Needs test but could also be argued is just a substituteRegister bug

@PeimingLiu
Copy link
Member Author

Needs test but could also be argued is just a substituteRegister bug

Let me try to dig the pattern that cause the issue......

@PeimingLiu
Copy link
Member Author

PeimingLiu commented Jul 11, 2025

I think the issue is that substituteRegister subsititues every occurrence of if reg == getReg(0) then reg -> Reg0, for the following MI

%49:vr256 = contract nofpexcept VFMADD231PSYr %49:vr256(tied-def 0), %49:vr256, %548:vr256, implicit $mxcsr

The wrong fixes leads to

%548:vr256 = contract nofpexcept VFMADD231PSYr 
                                  %548:vr256(tied-def 0), 
                                  %548:vr256,  # got substituted by mistake
                                  %49:vr256, 
                     implicit $mxcsr

(This happens since VFMADD231PSYr has 3 operands, so that the swap between Reg1 and Reg2 later won't reset the second register).

Since the original implementation only tries to update implicit-def, this won't happen:

%548:vr256 = contract nofpexcept VFMADD231PSYr 
                                 %548:vr256(tied-def 0), 
                                 %49:vr256, 
                                 %49:vr256, 
                        implicit $mxcsr

IIUC, the later one matches the behaviour of the previous implementation (i.e, without the implicit-def fixes at all). A test has been added.

@llvmbot
Copy link
Member

llvmbot commented Jul 11, 2025

@llvm/pr-subscribers-backend-x86

Author: Peiming Liu (PeimingLiu)

Changes

It seems subsituteRegister checks FromReg == ToReg instead of TRI->isSubRegisterEq.

This PR simply reverts the original PR (#131361) to its initial implementation (without using subsituteRegister).

Not sure whether it is a desired fix (and by no means that I am an expert on LLVM backend), but it does fix a numeric error on our internal workload.

Original author: @sdesmalen-arm


Full diff: https://github.com/llvm/llvm-project/pull/148068.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/TargetInstrInfo.cpp (+21-8)
  • (modified) llvm/test/CodeGen/X86/coalesce-commutative-implicit-def.mir (+18-1)
diff --git a/llvm/lib/CodeGen/TargetInstrInfo.cpp b/llvm/lib/CodeGen/TargetInstrInfo.cpp
index 660a1a4d7ec47..ce4234bf8a007 100644
--- a/llvm/lib/CodeGen/TargetInstrInfo.cpp
+++ b/llvm/lib/CodeGen/TargetInstrInfo.cpp
@@ -228,6 +228,24 @@ MachineInstr *TargetInstrInfo::commuteInstructionImpl(MachineInstr &MI,
     SubReg0 = SubReg1;
   }
 
+  // For a case like this:
+  //   %0.sub = INST %0.sub(tied), %1.sub, implicit-def %0
+  // we need to update the implicit-def after commuting to result in:
+  //   %1.sub = INST %1.sub(tied), %0.sub, implicit-def %1
+  SmallVector<unsigned> UpdateImplicitDefIdx;
+  if (HasDef && MI.hasImplicitDef() && MI.getOperand(0).getReg() != Reg0) {
+    const TargetRegisterInfo *TRI =
+        MI.getMF()->getSubtarget().getRegisterInfo();
+    Register OrigReg0 = MI.getOperand(0).getReg();
+    for (auto [OpNo, MO] : llvm::enumerate(MI.implicit_operands())) {
+      Register ImplReg = MO.getReg();
+      if ((ImplReg.isVirtual() && ImplReg == OrigReg0) ||
+          (ImplReg.isPhysical() && OrigReg0.isPhysical() &&
+           TRI->isSubRegisterEq(ImplReg, OrigReg0)))
+        UpdateImplicitDefIdx.push_back(OpNo + MI.getNumExplicitOperands());
+    }
+  }
+
   MachineInstr *CommutedMI = nullptr;
   if (NewMI) {
     // Create a new instruction.
@@ -238,15 +256,10 @@ MachineInstr *TargetInstrInfo::commuteInstructionImpl(MachineInstr &MI,
   }
 
   if (HasDef) {
-    // Use `substituteRegister` so that for a case like this:
-    //   %0.sub = INST %0.sub(tied), %1.sub, implicit-def %0
-    // the implicit-def is also updated, to result in:
-    //   %1.sub = INST %1.sub(tied), %0.sub, implicit-def %1
-    const TargetRegisterInfo &TRI =
-        *MI.getMF()->getSubtarget().getRegisterInfo();
-    Register FromReg = CommutedMI->getOperand(0).getReg();
-    CommutedMI->substituteRegister(FromReg, Reg0, /*SubRegIdx=*/0, TRI);
+    CommutedMI->getOperand(0).setReg(Reg0);
     CommutedMI->getOperand(0).setSubReg(SubReg0);
+    for (unsigned Idx : UpdateImplicitDefIdx)
+      CommutedMI->getOperand(Idx).setReg(Reg0);
   }
   CommutedMI->getOperand(Idx2).setReg(Reg1);
   CommutedMI->getOperand(Idx1).setReg(Reg2);
diff --git a/llvm/test/CodeGen/X86/coalesce-commutative-implicit-def.mir b/llvm/test/CodeGen/X86/coalesce-commutative-implicit-def.mir
index fe1235fe94f85..8de2beb6ba75a 100644
--- a/llvm/test/CodeGen/X86/coalesce-commutative-implicit-def.mir
+++ b/llvm/test/CodeGen/X86/coalesce-commutative-implicit-def.mir
@@ -10,7 +10,7 @@ body: |
     ; CHECK-LABEL: name: implicit_def_dst
     ; CHECK: [[MOV64rm:%[0-9]+]]:gr64_with_sub_8bit = MOV64rm $noreg, 1, $noreg, 0, $noreg :: (load (s64) from `ptr null`)
     ; CHECK-NEXT: [[MOV64rm1:%[0-9]+]]:gr64_with_sub_8bit = MOV64rm $noreg, 1, $noreg, 0, $noreg :: (load (s64) from `ptr null`)
-    ; CHECK-NEXT: [[MOV64rm:%[0-9]+]].sub_32bit:gr64_with_sub_8bit = AND32rr [[MOV64rm]].sub_32bit, [[MOV64rm1]].sub_32bit, implicit-def dead $eflags, implicit-def [[MOV64rm]]
+    ; CHECK-NEXT: [[MOV64rm:%[0-9]+]].sub_32bit:gr64_with_sub_8bit = AND32rr [[MOV64rm]].sub_32bit, [[MOV64rm1]].sub_32bit, implicit-def dead $eflags, implicit-def [[MOV64rm1]]
     ; CHECK-NEXT: RET 0, implicit [[MOV64rm]]
     %0:gr64_with_sub_8bit = MOV64rm $noreg, 1, $noreg, 0, $noreg :: (load (s64) from `ptr null`)
     %1:gr64_with_sub_8bit = MOV64rm $noreg, 1, $noreg, 0, $noreg :: (load (s64) from `ptr null`)
@@ -35,3 +35,20 @@ body: |
     %0:gr64_with_sub_8bit = COPY %1:gr64_with_sub_8bit
     RET 0, implicit %0
 ...
+# Commuting instruction with 3 ops is handled correctly.
+---
+name: commuting_3_ops
+tracksRegLiveness: true
+body: |
+  bb.0:
+    ; CHECK-LABEL: name: commuting_3_ops
+    ; CHECK: [[DEF:%[0-9]+]]:vr256 = IMPLICIT_DEF
+    ; CHECK-NEXT: [[DEF1:%[0-9]+]]:vr256 = IMPLICIT_DEF
+    ; CHECK-NEXT: [[DEF:%[0-9]+]]:vr256 = contract nofpexcept VFMADD231PSYr [[DEF]], [[DEF]], [[DEF1]], implicit $mxcsr
+    ; CHECK-NEXT: RET 0, implicit [[DEF]]
+    %0:vr256 = IMPLICIT_DEF
+    %1:vr256 = IMPLICIT_DEF
+    %0:vr256 = contract nofpexcept VFMADD231PSYr %0:vr256(tied-def 0), %0:vr256, %1:vr256, implicit $mxcsr
+    %1:vr256 = COPY %0:vr256
+    RET 0, implicit %0
+...

@PeimingLiu PeimingLiu force-pushed the users/peimingliu/partial-revert branch 3 times, most recently from 18a3e70 to c038190 Compare July 11, 2025 02:31
@PeimingLiu PeimingLiu force-pushed the users/peimingliu/partial-revert branch from c038190 to 7a347d8 Compare July 11, 2025 03:08
Copy link
Collaborator

@sdesmalen-arm sdesmalen-arm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for raising this @PeimingLiu!

@PeimingLiu PeimingLiu requested a review from sdesmalen-arm July 11, 2025 15:59
@PeimingLiu PeimingLiu force-pushed the users/peimingliu/partial-revert branch from b6740c7 to 5950e82 Compare July 11, 2025 16:02
@PeimingLiu PeimingLiu merged commit 4b73838 into main Jul 11, 2025
9 checks passed
@PeimingLiu PeimingLiu deleted the users/peimingliu/partial-revert branch July 11, 2025 18:28
@llvm-ci
Copy link
Collaborator

llvm-ci commented Jul 11, 2025

LLVM Buildbot has detected a new failure on builder mlir-nvidia-gcc7 running on mlir-nvidia while building llvm at step 7 "test-build-check-mlir-build-only-check-mlir".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/116/builds/15471

Here is the relevant piece of the build log for the reference
Step 7 (test-build-check-mlir-build-only-check-mlir) failure: test (failure)
******************** TEST 'MLIR :: Integration/GPU/CUDA/async.mlir' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 1
/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -gpu-kernel-outlining  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -pass-pipeline='builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-nvvm),nvvm-attach-target)'  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -gpu-async-region -gpu-to-llvm -reconcile-unrealized-casts -gpu-module-to-binary="format=fatbin"  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -async-to-async-runtime -async-runtime-ref-counting  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -convert-async-to-llvm -convert-func-to-llvm -convert-arith-to-llvm -convert-cf-to-llvm -reconcile-unrealized-casts  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-runner    --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_cuda_runtime.so    --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_async_runtime.so    --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_runner_utils.so    --entry-point-result=void -O0  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/FileCheck /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -gpu-kernel-outlining
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt '-pass-pipeline=builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-nvvm),nvvm-attach-target)'
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -gpu-async-region -gpu-to-llvm -reconcile-unrealized-casts -gpu-module-to-binary=format=fatbin
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -async-to-async-runtime -async-runtime-ref-counting
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -convert-async-to-llvm -convert-func-to-llvm -convert-arith-to-llvm -convert-cf-to-llvm -reconcile-unrealized-casts
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-runner --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_cuda_runtime.so --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_async_runtime.so --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_runner_utils.so --entry-point-result=void -O0
# .---command stderr------------
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventSynchronize(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# `-----------------------------
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/FileCheck /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# .---command stderr------------
# | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir:68:12: error: CHECK: expected string not found in input
# |  // CHECK: [84, 84]
# |            ^
# | <stdin>:1:1: note: scanning from here
# | Unranked Memref base@ = 0x598b40931c20 rank = 1 offset = 0 sizes = [2] strides = [1] data = 
# | ^
# | <stdin>:2:1: note: possible intended match here
# | [42, 42]
# | ^
# | 
# | Input file: <stdin>
# | Check file: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |             1: Unranked Memref base@ = 0x598b40931c20 rank = 1 offset = 0 sizes = [2] strides = [1] data =  
# | check:68'0     X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
# |             2: [42, 42] 
# | check:68'0     ~~~~~~~~~
# | check:68'1     ?         possible intended match
...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants