Skip to content

Conversation

@AbhayKanhere
Copy link
Contributor

Currently MachineVerifier is missing verifying early-clobber operand constraint.
The only other machine operand constraint - TiedTo is already verified.

@AbhayKanhere
Copy link
Contributor Author

Test failures on linux and windows are the same test
CodeGen/AMDGPU/GlobalISel/inst-select-mad_64_32.mir

@AbhayKanhere
Copy link
Contributor Author

@ mbrkusanin I see you have a PR on AMDGPU global Isel. I tried locating the bug (failure with early-clobber not set when G_AMDGPU_MAD_U64_U32 is emitted) and I tried fixing in legalizer llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp line 4091. but that is not sufficient. I could not locate (with quick searches) where this is emitted in Global ISel. any help appreciated.

@AbhayKanhere AbhayKanhere force-pushed the eng/abhay/machine-verify-early-clobber branch from 984f262 to a0a0504 Compare August 4, 2025 17:36
@llvmbot
Copy link
Member

llvmbot commented Aug 4, 2025

@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-amdgpu

Author: Abhay Kanhere (AbhayKanhere)

Changes

Currently MachineVerifier is missing verifying early-clobber operand constraint.
The only other machine operand constraint - TiedTo is already verified.


Full diff: https://github.com/llvm/llvm-project/pull/151421.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/MachineVerifier.cpp (+7)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp (+1)
diff --git a/llvm/lib/CodeGen/MachineVerifier.cpp b/llvm/lib/CodeGen/MachineVerifier.cpp
index 01703fe09b79a..ebef1c9034f4a 100644
--- a/llvm/lib/CodeGen/MachineVerifier.cpp
+++ b/llvm/lib/CodeGen/MachineVerifier.cpp
@@ -2325,6 +2325,13 @@ void MachineVerifier::visitMachineInstrBefore(const MachineInstr *MI) {
       report("Missing mayStore flag", MI);
   }
 
+  // Verify earlyClobber def operand
+  if (MCID.getOperandConstraint(0, MCOI::EARLY_CLOBBER) != -1) {
+    if (!MI->getOperand(0).isReg())
+      report("Early clobber must be a register", MI);
+    if (!MI->getOperand(0).isEarlyClobber())
+      report("Missing earlyClobber flag", MI);
+  }
   // Debug values must not have a slot index.
   // Other instructions must have one, unless they are inside a bundle.
   if (LiveInts) {
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
index 1a63c48e3666c..8a6ef89a4a062 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
@@ -584,6 +584,7 @@ bool AMDGPUInstructionSelector::selectG_AMDGPU_MAD_64_32(
   I.setDesc(TII.get(Opc));
   I.addOperand(*MF, MachineOperand::CreateImm(0));
   I.addImplicitDefUseOperands(*MF);
+  I.getOperand(0).setIsEarlyClobber(true);
   return constrainSelectedInstRegOperands(I, TII, TRI, RBI);
 }
 

@AbhayKanhere
Copy link
Contributor Author

This exposed more test fails someone with AMDGPU knowledge should look into whether this is expected. !
@mbrkusanin @amd-arsuresh @arsenm ? Who can help me naviagte these AMD GPU tests updates.

2025-08-04T18:08:20.5815070Z Failed Tests (10):
LLVM :: CodeGen/AMDGPU/GlobalISel/mul-known-bits.i64.ll
LLVM :: CodeGen/AMDGPU/GlobalISel/mul.ll
LLVM :: CodeGen/AMDGPU/GlobalISel/sdiv.i64.ll
LLVM :: CodeGen/AMDGPU/GlobalISel/sdivrem.ll
LLVM :: CodeGen/AMDGPU/GlobalISel/srem.i64.ll
LLVM :: CodeGen/AMDGPU/GlobalISel/udivrem.ll
LLVM :: CodeGen/AMDGPU/div_v2i128.ll
LLVM :: CodeGen/AMDGPU/fptoi.i128.ll
LLVM :: CodeGen/AMDGPU/integer-mad-patterns.ll
LLVM :: CodeGen/AMDGPU/vector-reduce-mul.ll

@AbhayKanhere
Copy link
Contributor Author

FYI I fixed these 10 tests locally and found 11 more. .. Wanted some feedback if this is worth fixing or not.
these are 11 more fails that show up
CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.mfma.gfx90a.ll CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.set.inactive.ll CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.wwm.ll CodeGen/AMDGPU/amdgpu-cs-chain-cc.ll CodeGen/AMDGPU/amdgpu-cs-chain-preserve-cc.ll CodeGen/AMDGPU/llvm.amdgcn.init.whole.wave-w32.ll CodeGen/AMDGPU/llvm.amdgcn.init.whole.wave-w64.ll CodeGen/AMDGPU/llvm.amdgcn.mfma.gfx90a.ll CodeGen/AMDGPU/llvm.amdgcn.mfma.ll CodeGen/AMDGPU/llvm.amdgcn.set.inactive.chain.arg.ll CodeGen/AMDGPU/mfma-loop.ll

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs tests in test/MachineVerifier

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a property of the instruction definition and not require manual setting

Copy link
Contributor Author

@AbhayKanhere AbhayKanhere Aug 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

earlyClobber and TiedTo are two of the OperandConstraint -- we check for tiedTo but not earlyClobber in machine verifier. on CPU targets register allocators strictly look at earlyClobber on operand. Recently we discovered that when MI.setDesc is used to modify the opcode, this constraint on Operand is not updated. However if new instruction is built and operand added using addOperand() , then earlyClobber is set correctly.. Looks like AMDGPU backend makes heavy use of setDesc (since the earlySelect expects in-place change to opcode) but does not update earlyClobber. I found many instances of this.

Copy link
Contributor Author

@AbhayKanhere AbhayKanhere Aug 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Internally we discussed a variant for MI.setDesc(MCID, bool updateOpConstraint=false) that updates OperandConstraints if passed true, but setDesc is often part of a set of update operations on operands and not the last in the sequence so it is not always possible to update operand constraints there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem here is that MI.setDesc which only updates opcode, and MI.addOperand following MI.buildInstr/ that create new instruction behave differently in how Operand constraints are updated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, still needs tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arsenm It appears there is no way to write machine verifier test for this early clobber property (except by adding a fake test pass that generates bad IR without earlyclobber constraint set on machine operand) since when MIR is loaded, this constraint gets set -- so there is no way to feed in a MIR with unset early-clobber. This verifier has proved its usefulness with errors it has flagged as seen in these uses of MI.setDesc here and in our internal downstream code. I'd like to get this approved to merge in.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use inline asm? It is also possible to write unit tests, but it's more annoying

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inline asm will not work either - early clobber is set in IR. inline asm are passed on to assembler intact.

@AbhayKanhere AbhayKanhere force-pushed the eng/abhay/machine-verify-early-clobber branch from a0a0504 to f70f3ba Compare August 6, 2025 22:32
@AbhayKanhere
Copy link
Contributor Author

I tried machine verifier test to check early-clobber is present. but any test .mir where earlyclobber is not set on instruction when MIR is read-in, that attribute gets set correctly via addOperands. So I am not able to test missing early-clobber crash!

@AbhayKanhere AbhayKanhere requested a review from arsenm August 7, 2025 21:57
Currently MachineVerifier is missing verifying early-clobber operand
constraint. The only other machine operand constraint -  TiedTo is
already verified.

Fix the test failure with this PR, update AMDGPU code that did not set earlyclobber
@AbhayKanhere AbhayKanhere force-pushed the eng/abhay/machine-verify-early-clobber branch from 5624107 to 20a897a Compare October 27, 2025 20:06
@AbhayKanhere
Copy link
Contributor Author

updated tests, modifications based on review comments.

@AbhayKanhere AbhayKanhere requested a review from arsenm October 31, 2025 17:23
@arsenm arsenm merged commit d998f92 into llvm:main Nov 5, 2025
10 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Nov 5, 2025

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-expensive-checks-ubuntu running on as-builder-4 while building llvm at step 7 "test-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/187/builds/13307

Here is the relevant piece of the build log for the reference
Step 7 (test-check-all) failure: Test just built components: check-all completed (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.mfma.gfx90a.ll' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 2
/home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/llc -global-isel -mtriple=amdgcn -mcpu=gfx90a < /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.mfma.gfx90a.ll | /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/FileCheck --check-prefixes=GCN /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.mfma.gfx90a.ll
# executed command: /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/llc -global-isel -mtriple=amdgcn -mcpu=gfx90a
# .---command stderr------------
# | 
# | # After SI Fold Operands
# | # Machine code for function test_mfma_f64_16x16x4f64_splat_imm: IsSSA, TracksLiveness, Legalized, RegBankSelected, Selected
# | Function Live Ins: $sgpr0_sgpr1 in %3, $sgpr2_sgpr3 in %4, $sgpr4_sgpr5 in %5, $sgpr6_sgpr7 in %6, $vgpr0 in %7, $sgpr8 in %8, $sgpr9 in %9, $sgpr10 in %10, $sgpr11 in %11
# | 
# | bb.1.bb:
# |   liveins: $sgpr4_sgpr5
# |   %5:sreg_64 = COPY $sgpr4_sgpr5
# |   %25:sreg_64 = S_MOV_B64 0
# |   %24:sgpr_256 = REG_SEQUENCE %25:sreg_64, %subreg.sub0_sub1, %25:sreg_64, %subreg.sub2_sub3, %25:sreg_64, %subreg.sub4_sub5, %25:sreg_64, %subreg.sub6_sub7
# |   early-clobber %34:sgpr_128 = S_LOAD_DWORDX4_IMM_ec %5:sreg_64, 36, 0 :: (dereferenceable invariant load (<2 x s64>) from %ir.arg.kernarg.offset, align 4, addrspace 4)
# |   early-clobber %36:sreg_64_xexec = S_LOAD_DWORDX2_IMM_ec %5:sreg_64, 52, 0 :: (dereferenceable invariant load (s64) from %ir.arg.kernarg.offset + 16, align 4, addrspace 4)
# |   %40:vreg_64_align2 = COPY %34.sub2_sub3:sgpr_128
# |   %41:vreg_64_align2 = COPY %36:sreg_64_xexec
# |   %51:sgpr_64 = COPY %25:sreg_64
# |   %42:areg_256_align2 = REG_SEQUENCE %51:sgpr_64, %subreg.sub0_sub1, %51:sgpr_64, %subreg.sub2_sub3, %51:sgpr_64, %subreg.sub4_sub5, %51:sgpr_64, %subreg.sub6_sub7
# |   %23:areg_256_align2 = V_MFMA_F64_16X16X4F64_e64 %40:vreg_64_align2, %41:vreg_64_align2, 0, 0, 0, 0, implicit $mode, implicit $exec
# |   %26:areg_256_align2 = V_MFMA_F64_16X16X4F64_mac_e64 %40:vreg_64_align2, %41:vreg_64_align2, %23:areg_256_align2(tied-def 0), 1, 2, 3, implicit $mode, implicit $exec
# |   %50:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
# |   GLOBAL_STORE_DWORDX4_SADDR %50:vgpr_32, %26.sub0_sub1_sub2_sub3:areg_256_align2, %34.sub0_sub1:sgpr_128, 0, 0, implicit $exec :: (store (<2 x s64>) into %ir.1, align 32, addrspace 1)
# |   GLOBAL_STORE_DWORDX4_SADDR %50:vgpr_32, %26.sub4_sub5_sub6_sub7:areg_256_align2, %34.sub0_sub1:sgpr_128, 16, 0, implicit $exec :: (store (<2 x s64>) into %ir.1 + 16, basealign 32, addrspace 1)
# |   S_ENDPGM 0
# | 
# | # End machine code for function test_mfma_f64_16x16x4f64_splat_imm.
# | 
# | *** Bad machine code: Missing earlyClobber flag ***
# | - function:    test_mfma_f64_16x16x4f64_splat_imm
# | - basic block: %bb.1 bb (0x5626a6d08058)
# | - instruction: %23:areg_256_align2 = V_MFMA_F64_16X16X4F64_e64 %40:vreg_64_align2, %41:vreg_64_align2, 0, 0, 0, 0, implicit $mode, implicit $exec
# | LLVM ERROR: Found 1 machine code errors.
# | PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace and instructions to reproduce the bug.
# | Stack dump:
# | 0.	Program arguments: /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/llc -global-isel -mtriple=amdgcn -mcpu=gfx90a
# | 1.	Running pass 'CallGraph Pass Manager' on module '<stdin>'.
# | 2.	Running pass 'Verify generated machine code' on function '@test_mfma_f64_16x16x4f64_splat_imm'
# |  #0 0x000056268bcb9368 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/llc+0x7f71368)
# |  #1 0x000056268bcb6a75 llvm::sys::RunSignalHandlers() (/home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/llc+0x7f6ea75)
# |  #2 0x000056268bcba131 SignalHandler(int, siginfo_t*, void*) Signals.cpp:0:0
# |  #3 0x00007c1d6a445330 (/lib/x86_64-linux-gnu/libc.so.6+0x45330)
# |  #4 0x00007c1d6a49eb2c pthread_kill (/lib/x86_64-linux-gnu/libc.so.6+0x9eb2c)
# |  #5 0x00007c1d6a44527e raise (/lib/x86_64-linux-gnu/libc.so.6+0x4527e)
...

@llvm-ci
Copy link
Collaborator

llvm-ci commented Nov 5, 2025

LLVM Buildbot has detected a new failure on builder ppc64le-flang-rhel-clang running on ppc64le-flang-rhel-test while building llvm at step 6 "test-build-unified-tree-check-flang".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/157/builds/41946

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-flang) failure: 1200 seconds without output running [b'ninja', b'check-flang'], attempting to kill
...
PASS: Flang :: Transforms/stack-arrays.f90 (3960 of 3969)
PASS: Flang :: Driver/mcmodel.f90 (3961 of 3969)
PASS: Flang :: Driver/use-module-error.f90 (3962 of 3969)
PASS: flang-OldUnit :: Evaluate/real.test (3963 of 3969)
PASS: Flang :: Driver/use-module.f90 (3964 of 3969)
PASS: Flang :: Driver/fopenmp.f90 (3965 of 3969)
PASS: Flang :: Driver/omp-driver-offload.f90 (3966 of 3969)
PASS: Flang :: Driver/linker-options.f90 (3967 of 3969)
PASS: flang-Unit :: Optimizer/./FlangOptimizerTests/147/150 (3968 of 3969)
PASS: Flang :: Intrinsics/math-codegen.fir (3969 of 3969)
command timed out: 1200 seconds without output running [b'ninja', b'check-flang'], attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=2042.823861

@AbhayKanhere
Copy link
Contributor Author

AbhayKanhere commented Nov 5, 2025

Based on fails. I think I need to add changes removed in SIFoldOperands on AMDGPU. But I cannot reproduce these locally? I will file another PR with suggested fix.
LLVM :: CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.mfma.gfx90a.ll
LLVM :: CodeGen/AMDGPU/llvm.amdgcn.mfma.gfx90a.ll
LLVM :: CodeGen/AMDGPU/llvm.amdgcn.mfma.ll
LLVM :: CodeGen/AMDGPU/mfma-loop.ll
LLVM :: CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr.ll

AbhayKanhere added a commit to AbhayKanhere/llvm-project that referenced this pull request Nov 5, 2025
After PR:llvm#151421 merged
following fails in SIFoldOperands showed up.

LLVM :: CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.mfma.gfx90a.ll
LLVM :: CodeGen/AMDGPU/llvm.amdgcn.mfma.gfx90a.ll
LLVM :: CodeGen/AMDGPU/llvm.amdgcn.mfma.ll
LLVM :: CodeGen/AMDGPU/mfma-loop.ll
LLVM :: CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr.ll

In Folding code, if folded operand is register ensure earlyClobber
is set.
@AbhayKanhere
Copy link
Contributor Author

FYI #166600 to fix test failures seen

@vvereschaka
Copy link
Contributor

@AbhayKanhere ,

the expensive check builder (https://lab.llvm.org/buildbot/#/builders/187) is still "red" for 2 days already because of PR. Please fix it ASAP or revert your changes.

@AbhayKanhere
Copy link
Contributor Author

AbhayKanhere commented Nov 7, 2025

@vvereschaka Yes the fix PR is under review. Thanks. #166600 once merged will fix the tests failing. thanks!

arsenm added a commit that referenced this pull request Nov 8, 2025
After PR:#151421 merged
following fails in SIFoldOperands showed up.

LLVM :: CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.mfma.gfx90a.ll
LLVM :: CodeGen/AMDGPU/llvm.amdgcn.mfma.gfx90a.ll
LLVM :: CodeGen/AMDGPU/llvm.amdgcn.mfma.ll
LLVM :: CodeGen/AMDGPU/mfma-loop.ll
LLVM :: CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr.ll

In Folding code, if folded operand is register ensure earlyClobber is
set.

---------

Co-authored-by: Matt Arsenault <[email protected]>
Co-authored-by: Shilei Tian <[email protected]>
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Nov 8, 2025
…(#166600)

After PR:llvm/llvm-project#151421 merged
following fails in SIFoldOperands showed up.

LLVM :: CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.mfma.gfx90a.ll
LLVM :: CodeGen/AMDGPU/llvm.amdgcn.mfma.gfx90a.ll
LLVM :: CodeGen/AMDGPU/llvm.amdgcn.mfma.ll
LLVM :: CodeGen/AMDGPU/mfma-loop.ll
LLVM :: CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr.ll

In Folding code, if folded operand is register ensure earlyClobber is
set.

---------

Co-authored-by: Matt Arsenault <[email protected]>
Co-authored-by: Shilei Tian <[email protected]>
vinay-deshmukh pushed a commit to vinay-deshmukh/llvm-project that referenced this pull request Nov 8, 2025
After PR:llvm#151421 merged
following fails in SIFoldOperands showed up.

LLVM :: CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.mfma.gfx90a.ll
LLVM :: CodeGen/AMDGPU/llvm.amdgcn.mfma.gfx90a.ll
LLVM :: CodeGen/AMDGPU/llvm.amdgcn.mfma.ll
LLVM :: CodeGen/AMDGPU/mfma-loop.ll
LLVM :: CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr.ll

In Folding code, if folded operand is register ensure earlyClobber is
set.

---------

Co-authored-by: Matt Arsenault <[email protected]>
Co-authored-by: Shilei Tian <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants