Skip to content

Conversation

@s-barannikov
Copy link
Contributor

@s-barannikov s-barannikov commented Dec 18, 2024

This change reduces the number of skipped patterns as follows:

AArch64:   8548 ->  8547  (-1)
AMDGPU:   11245 -> 11204  (-41)
BPF:        197 ->   194  (-3)
M68k:       840 ->   838  (-2)
X86:      25857 -> 25856  (-1)

It appears that some patterns were already importable, but the generated code was invalid (the result was discarded).
This used to happen with "standalone" patterns in which the source DAG has more results that the destination DAG, and the destination DAG's root instruction has implicit defs. The most common case is that the source DAG produces one result and the root instruction has no explicit defs (e.g., setcc patterns).
This doesn't error out because we assume that PatternToMatch::getDstRegs() returns non-empty set if any physical register defs are involved (and issue an error if this is the case), but the set is always empty for standalone patterns.

While this patch correctly imports patterns with implicit defs, backends don't seem to be prepared to deal with them.
If, for example, the result of a generic instruction is copied into a physical register, we may end up with something like this:

DEF_MI implicit-def $some_reg1
%vreg = COPY $some_reg1
$some_reg2 = COPY %vreg

The issue with this code is summarized here. In short, we are not constraining %vreg to a proper register class, which causes verification failure "VReg has no regclass after selection".
This can be observed on X86/GlobalISel/mul-scalar.ll test, which I "fixed" by disabling importing the pattern.

@s-barannikov s-barannikov force-pushed the tablegen/gisel/physreg-patterns branch 3 times, most recently from 16c07cc to c565ea4 Compare December 21, 2024 05:27
@s-barannikov s-barannikov changed the title [WIP][TableGen][GISel] Learn to import patterns with optional/physreg defs [WIP][TableGen][GISel] Learn to import patterns with physreg defs Dec 21, 2024
@github-actions
Copy link

github-actions bot commented Dec 21, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@s-barannikov s-barannikov force-pushed the tablegen/gisel/physreg-patterns branch 3 times, most recently from 8f8b945 to 1ca4d60 Compare December 21, 2024 06:59
@s-barannikov s-barannikov changed the title [WIP][TableGen][GISel] Learn to import patterns with physreg defs [TableGen][GISel] Learn to import patterns with physreg defs Dec 21, 2024
@s-barannikov
Copy link
Contributor Author

s-barannikov commented Dec 21, 2024

I'd like some help with "VReg has no regclass after selection" issue, I'm not sure how to fix it.

  • Insert ConstrainOperandToRegClass to an arbitrary register class? Probably wrong.
  • Insert ConstrainSelectedInstOperands for the virtreg <-- physreg COPY in the hope that backends will constrain the virtual register? They currently don't.
  • Insert some new ConstrainXXX designed specifically for virtreg <-- physreg copies (and add a new virtual method that target will need to implement)?
  • Fix the linked FIXME somehow? Probably needs a target hook one way or another, and I don't see a way of how to plug it in.
  • Let it be for now? Hopefully :)

@s-barannikov s-barannikov force-pushed the tablegen/gisel/physreg-patterns branch from 1ca4d60 to 54a0395 Compare December 22, 2024 15:14
@arsenm
Copy link
Contributor

arsenm commented Dec 23, 2024

I'd expect an arbitrary copy from a physical register to be copied to/from a virtual register with class getMinimalPhysRegClass (or whatever the equivalent is in CodeGenRegisters. getRegClassForRegister?).

ConstrainSelectedInstOperands won't work because that just uses the classes from the instruction definition, where generic pseudos like COPY do not have any available register class information. These need manual selection

@s-barannikov
Copy link
Contributor Author

s-barannikov commented Dec 23, 2024

I'd expect an arbitrary copy from a physical register to be copied to/from a virtual register with class getMinimalPhysRegClass (or whatever the equivalent is in CodeGenRegisters. getRegClassForRegister?).

That's what I thought, too. It appears that getMinimalPhysRegClass may return an unallocatable register class and we aren't allowed to create virtual registers with unallocatable classes. This is the case on my target and I believe ARM would suffer from this too as its "condition codes" register class is unallocatable as well. getRegClassForRegister looks better, but it returns a register class that falls under category "some arbitrary register class".

In SelectionDAG, the choice of the class is done by InstrEmitter, and the logic is rather complicated. In the simplest case it just calls TLI->getRegClassFor(), which returns the class registered by a call to addRegisterClass() in target's TargetLowering constructor. At TableGen level, we don't know what register classes are going to be legal for a value type, so we probably need to emit something to do a runtime check. Maybe something like GIR_Constrain[PhysReg]Copy that would call a target hook?

ConstrainSelectedInstOperands won't work because that just uses the classes from the instruction definition, where generic pseudos like COPY do not have any available register class information. These need manual selection

I had to figure it out the hard way :(

@arsenm
Copy link
Contributor

arsenm commented Dec 23, 2024

That's what I thought, too. It appears that getMinimalPhysRegClass may return an unallocatable register class and we aren't allowed to create virtual registers with unallocatable classes.

You can find the smallest allocatable class with with getAllocatableClass(getMinimalPhysRegClass()), and there's presumably a CodeGenRegisters equivalent too

we don't know what register classes are going to be legal for a value type, so we probably need to emit something to do a runtime check.

No, there should be no need for a runtime or type check. The type based class logic in InstrEmitter is one of the issues with SelectionDAG we're trying to avoid.

@arsenm
Copy link
Contributor

arsenm commented Dec 23, 2024

You can find the smallest allocatable class with with getAllocatableClass(getMinimalPhysRegClass()), and there's presumably a CodeGenRegisters equivalent too

I think this is iterate through getRegisterClass(RegRecord)->getSuperClasses() until you find an allocatable one

@s-barannikov
Copy link
Contributor Author

I'll give it a try, thanks.

@s-barannikov
Copy link
Contributor Author

So I tried.

The register class containing the register is the only register class containing this register, and it is not allocatable. That is, there are no sub- or super-classes containing this register to iterate over.

I seem to need a "cross copy" register class. There is a TargetRegisterInfo::getCrossCopyRegClass, which I thought may be lifted to TableGen level, but some targets that override it (AMDGPU in particular) choose a register class based on dynamic values such as subtarget features.

This may not be a big issue, we could still have a notion of "cross copy" register class at TableGen level, which could be used in cases like this.

The current idea is to first try getMinimalPhysRegClass and resort to cross-copy register class if it fails to find an allocatable class. Does this sound like the way to go or should I look for another solution?

@arsenm
Copy link
Contributor

arsenm commented Dec 26, 2024

The register class containing the register is the only register class containing this register, and it is not allocatable. That is, there are no sub- or super-classes containing this register to iterate over.

I seem to need a "cross copy" register class. There is a TargetRegisterInfo::getCrossCopyRegClass, which I thought may be lifted to TableGen level, but some targets that override it (AMDGPU in particular) choose a register class based on dynamic values such as subtarget features.

Both of the AMDGPU cases look like hacks to me. In particular the SCC_CLASS handling does not do what we would want to do in globalisel. (I should probably try removing this but will likely never get around to it)

The current idea is to first try getMinimalPhysRegClass and resort to cross-copy register class if it fails to find an allocatable class. Does this sound like the way to go or should I look for another solution?

Some kind of fallback process makes sense to me. In tablegen getRegClassToRegister, find an allocatable superclass, and if one isn't available just find something you can copy to. The allocation processes later can adjust the exact classes anyway,=

@s-barannikov s-barannikov force-pushed the tablegen/gisel/physreg-patterns branch from 54a0395 to 5d3e267 Compare December 26, 2024 18:35
@s-barannikov
Copy link
Contributor Author

CC @e-kud

def SCC_CLASS : SIRegisterClass<"AMDGPU", [i1], 1, (add SCC)> {
let CopyCost = -1;
let isAllocatable = 0;
let CrossCopyRegClass = SReg_32_XM0_XEXEC;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this is correct. The class is the one returned by getCrossCopyRegClass in Wave32 case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this should use one of the flavors of SReg_32. SReg_32 itself should be fine

@s-barannikov
Copy link
Contributor Author

s-barannikov commented Dec 28, 2024

@arsenm Could you take a look at the failing test?
The change allows importing S_CMP_* patterns, and there is one test where this causes machine verifier errors on vgpr->sgpr copy (the argument of S_CMP_*). IIUC this should be fixed by SIFixSGPRCopies pass, but adding it to GCNPassConfig::addGlobalInstructionSelect (and disabling machine verifier in between) didn't help.

  %43:vgpr_32 = IMAGE_SAMPLE_LZ_V1_V1_gfx12 %205:vgpr_32, %39.sub8_sub9_sub10_sub11_sub12_sub13_sub14_sub15:sgpr_512, %39.sub0_sub1_sub2_sub3:sgpr_512, 1, 1, 0, 0, 0, -1, 0, 0, 0, implicit $exec :: (dereferenceable load (s32), addrspace 8)
  %224:sreg_32 = COPY killed %43:vgpr_32
  nofpexcept S_CMP_EQ_F32 killed %224:sreg_32, 0, implicit-def $scc, implicit $mode
  %46:sreg_32_xm0_xexec = COPY killed $scc

@s-barannikov s-barannikov force-pushed the tablegen/gisel/physreg-patterns branch from 0f169db to f99334d Compare January 5, 2025 11:42
@arsenm
Copy link
Contributor

arsenm commented Jan 6, 2025

@arsenm Could you take a look at the failing test? The change allows importing S_CMP_* patterns, and there is one test where this causes machine verifier errors on vgpr->sgpr copy (the argument of S_CMP_*).

IIUC this should be fixed by pass,

Not in GlobalISel. SIFixSGPRCopies is a giant hack for the DAG, half the reason we want globalisel is to delete it. It's probably a regbankselect bug

  %43:vgpr_32 = IMAGE_SAMPLE_LZ_V1_V1_gfx12 %205:vgpr_32, %39.sub8_sub9_sub10_sub11_sub12_sub13_sub14_sub15:sgpr_512, %39.sub0_sub1_sub2_sub3:sgpr_512, 1, 1, 0, 0, 0, -1, 0, 0, 0, implicit $exec :: (dereferenceable load (s32), addrspace 8)
  %224:sreg_32 = COPY killed %43:vgpr_32
  nofpexcept S_CMP_EQ_F32 killed %224:sreg_32, 0, implicit-def $scc, implicit $mode
  %46:sreg_32_xm0_xexec = COPY killed $scc

What was the pre-select MIR?

def SCC_CLASS : SIRegisterClass<"AMDGPU", [i1], 1, (add SCC)> {
let CopyCost = -1;
let isAllocatable = 0;
let CrossCopyRegClass = SReg_32_XM0_XEXEC;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this should use one of the flavors of SReg_32. SReg_32 itself should be fine

// value means copying is extremely expensive or impossible.
int CopyCost = 1;

RegisterClass CrossCopyRegClass = ?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs documentation. Also this doesn't seem like the right terminology. It's more of a virtual register class to use for copies with an unallocatable physical register, not a general cross copy. Maybe AllocatableRegClass?

@s-barannikov
Copy link
Contributor Author

What was the pre-select MIR?

Before regbank-select:

  %52:_(s32) = G_AMDGPU_INTRIN_IMAGE_LOAD intrinsic(@llvm.amdgcn.image.sample.lz.2d), 1, %71:_(<2 x s16>), $noreg, %37:_(<8 x s32>), %192:_(<4 x s32>), 0, 0, 0, 1 :: (dereferenceable load (s32), addrspace 8)
  %53:_(s32) = G_FCONSTANT float 1.000000e+00
  %54:_(s1) = G_FCMP floatpred(oeq), %52:_(s32), %53:_

Before instruction-select:

  %52:vgpr(s32) = G_AMDGPU_INTRIN_IMAGE_LOAD intrinsic(@llvm.amdgcn.image.sample.lz.2d), 1, %213:vgpr(<2 x s16>), $noreg, %37:sgpr(<8 x s32>), %192:sgpr(<4 x s32>), 0, 0, 0, 1 :: (dereferenceabl
e load (s32), addrspace 8)
  %53:sgpr(s32) = G_FCONSTANT float 1.000000e+00
  %214:vgpr(s32) = COPY %53:sgpr(s32)
  %54:vcc(s1) = G_FCMP floatpred(oeq), %52:vgpr(s32), %214:vgpr

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants