[AMDGPU][True16][CodeGen] S_PACK_XX_B32_B16 lowering for true16 mode #162389
@@ -9115,6 +9115,63 @@ void SIInstrInfo::movePackToVALU(SIInstrWorklist &Worklist,
  MachineOperand &Src1 = Inst.getOperand(2);
  const DebugLoc &DL = Inst.getDebugLoc();

  if (ST.useRealTrue16Insts()) {
    Register SrcReg0 = Src0.getReg();
    Register SrcReg1 = Src1.getReg();

    if (!RI.isVGPR(MRI, SrcReg0)) {
rampitec marked this conversation as resolved.
      SrcReg0 = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
      BuildMI(*MBB, Inst, DL, get(AMDGPU::V_MOV_B32_e32), SrcReg0).add(Src0);
This should use a COPY; it may need a separate path if there are non-register inputs.

Thanks for pointing this out. For the non-register case we should still use v_mov; I created a new condition for it here and also added a new test. For the register case, replacing the v_mov with a COPY reveals a separate problem. This pack instruction creates a … I can certainly lower this COPY in PostRAExpand, but since we don't want to generate sgpr_16 in the pipeline, this might cause unexpected issues in other passes with a larger test case. I haven't figured out how to fix this properly yet. I would like to merge this patch as is to unblock the downstream branch, and create another patch to address this issue, which gives me more time to look into it. I also noticed that some bad code patterns are generated, e.g. "v_mov_b32_e32 v2, v2". These should also be removed after the …
    }
    if (!RI.isVGPR(MRI, SrcReg1)) {
      SrcReg1 = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
      BuildMI(*MBB, Inst, DL, get(AMDGPU::V_MOV_B32_e32), SrcReg1).add(Src1);
    }
    bool isSrc0Reg16 = MRI.constrainRegClass(SrcReg0, &AMDGPU::VGPR_16RegClass);
broxigarchen marked this conversation as resolved.
    bool isSrc1Reg16 = MRI.constrainRegClass(SrcReg1, &AMDGPU::VGPR_16RegClass);
Do we actually get 16-bit sources? What would that mean? The instruction is defined to take two 32-bit sources.

Yes, we do when a SALU16 instruction is used by s_pack_b32_b16. There is a test added in fix-sgpr-copies-f16-true16.mir, "s_pack_ll_b32_b16_use_SALU16". Copying the example here:

That seems like a bug in how moveToVALU handles …

For the dst and operands of S_FMAC_F16, I think they are all f16 in isel, but they are put in an sreg32. Wouldn't it be safe to drop the zeroed top bits when the instruction is moved to a VALU16? We can definitely create a vgpr16 and then build a vgpr32 on top of it with REG_SEQUENCE, but these will eventually be removed anyway.

I think that would be more correct, and it would avoid weird issues like this one, where an instruction that requires a 32-bit input sees a 16-bit input instead.

I see. It does seem a bit confusing. I'll create another patch to clean this up; quite a few instructions are handled with this approach.
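The operand choice this thread is debating can be read as a small decision rule. The sketch below is a standalone C++ model, not LLVM code: `SubRegIdx`, `SrcHalf`, and `pickSrcSubReg` are hypothetical names, and the enum only stands in for `AMDGPU::NoSubRegister`, `AMDGPU::lo16`, and `AMDGPU::hi16`. It assumes `IsReg16` means the `constrainRegClass(..., &AMDGPU::VGPR_16RegClass)` call in the diff succeeded. The `assert` is a sketch-only guard that the diff does not contain.

```cpp
#include <cassert>

// Hypothetical stand-ins for the subregister indices named in the diff
// (AMDGPU::NoSubRegister, AMDGPU::lo16, AMDGPU::hi16); not LLVM's enums.
enum class SubRegIdx { NoSubRegister, Lo16, Hi16 };

// Which 16-bit half of a source an S_PACK_* variant reads
// (the "L" or "H" letter in the opcode name).
enum class SrcHalf { Low, High };

// Models the operand choice in the true16 REG_SEQUENCE lowering:
// - a low-half read uses the whole register if it was constrained to
//   VGPR_16, otherwise the lo16 subregister of the 32-bit register;
// - a high-half read always uses hi16, which presumes a 32-bit source.
SubRegIdx pickSrcSubReg(SrcHalf Half, bool IsReg16) {
  if (Half == SrcHalf::High) {
    // Sketch-only guard (not in the diff): a 16-bit register has no
    // high half to read.
    assert(!IsReg16 && "hi16 requested from a 16-bit source");
    return SubRegIdx::Hi16;
  }
  return IsReg16 ? SubRegIdx::NoSubRegister : SubRegIdx::Lo16;
}
```

The guard makes the reviewer's concern concrete: only the low-half reads have a 16-bit path, so a 16-bit register reaching an "H" operand would have no valid subregister to select.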

    auto NewMI = BuildMI(*MBB, Inst, DL, get(AMDGPU::REG_SEQUENCE), ResultReg);
    switch (Inst.getOpcode()) {
    case AMDGPU::S_PACK_LL_B32_B16: {
broxigarchen marked this conversation as resolved.
      NewMI
          .addReg(SrcReg0, 0,
                  isSrc0Reg16 ? AMDGPU::NoSubRegister : AMDGPU::lo16)
          .addImm(AMDGPU::lo16)
          .addReg(SrcReg1, 0,
                  isSrc1Reg16 ? AMDGPU::NoSubRegister : AMDGPU::lo16)
          .addImm(AMDGPU::hi16);
    } break;
    case AMDGPU::S_PACK_LH_B32_B16: {
      NewMI
          .addReg(SrcReg0, 0,
                  isSrc0Reg16 ? AMDGPU::NoSubRegister : AMDGPU::lo16)
          .addImm(AMDGPU::lo16)
          .addReg(SrcReg1, 0, AMDGPU::hi16)
          .addImm(AMDGPU::hi16);
    } break;
    case AMDGPU::S_PACK_HL_B32_B16: {
      NewMI.addReg(SrcReg0, 0, AMDGPU::hi16)
          .addImm(AMDGPU::lo16)
          .addReg(SrcReg1, 0,
                  isSrc1Reg16 ? AMDGPU::NoSubRegister : AMDGPU::lo16)
          .addImm(AMDGPU::hi16);
    } break;
    case AMDGPU::S_PACK_HH_B32_B16: {
      NewMI.addReg(SrcReg0, 0, AMDGPU::hi16)
          .addImm(AMDGPU::lo16)
          .addReg(SrcReg1, 0, AMDGPU::hi16)
          .addImm(AMDGPU::hi16);
    } break;
    default:
      llvm_unreachable("unhandled s_pack_* instruction");
    }

    MachineOperand &Dest = Inst.getOperand(0);
    MRI.replaceRegWith(Dest.getReg(), ResultReg);
    addUsersToMoveToVALUWorklist(ResultReg, MRI, Worklist);
    return;
  }

  switch (Inst.getOpcode()) {
  case AMDGPU::S_PACK_LL_B32_B16: {
    Register ImmReg = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
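In each true16 case above, the REG_SEQUENCE places the chosen half of the first source in `lo16` of the result and the chosen half of the second source in `hi16`. The opcode's L/H letters select which half of each source is read. The following standalone C++ sketch models that at the bit level on plain 32-bit integers. `PackKind`, `lo16`, `hi16`, and `packB32B16` are hypothetical helper names, and the sketch ignores register classes entirely.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tags mirroring the four S_PACK_*_B32_B16 variants in the diff.
enum class PackKind { LL, LH, HL, HH };

// Low and high 16-bit halves of a 32-bit value.
inline uint16_t lo16(uint32_t V) { return static_cast<uint16_t>(V & 0xFFFFu); }
inline uint16_t hi16(uint32_t V) { return static_cast<uint16_t>(V >> 16); }

// Bit-level model of the value the REG_SEQUENCE assembles:
// the first letter picks the half of Src0 that lands in the result's low
// half, the second letter picks the half of Src1 that lands in its high half.
uint32_t packB32B16(PackKind K, uint32_t Src0, uint32_t Src1) {
  uint16_t Lo = 0, Hi = 0;
  switch (K) {
  case PackKind::LL: Lo = lo16(Src0); Hi = lo16(Src1); break;
  case PackKind::LH: Lo = lo16(Src0); Hi = hi16(Src1); break;
  case PackKind::HL: Lo = hi16(Src0); Hi = lo16(Src1); break;
  case PackKind::HH: Lo = hi16(Src0); Hi = hi16(Src1); break;
  }
  return (static_cast<uint32_t>(Hi) << 16) | Lo;
}
```

For example, with `Src0 = 0xAAAA1111` and `Src1 = 0xBBBB2222`, `PackKind::LH` yields `0xBBBB1111` and `PackKind::HL` yields `0x2222AAAA`.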