[AMDGPU] Delete redundant s_or_b32 #165261
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -10160,7 +10160,7 @@ static bool followSubRegDef(MachineInstr &MI, | |
| } | ||
|
|
||
| MachineInstr *llvm::getVRegSubRegDef(const TargetInstrInfo::RegSubRegPair &P, | ||
| - MachineRegisterInfo &MRI) { | ||
| + const MachineRegisterInfo &MRI) { | ||
| assert(MRI.isSSA()); | ||
| if (!P.Reg.isVirtual()) | ||
| return nullptr; | ||
|
|
@@ -10689,6 +10689,32 @@ bool SIInstrInfo::optimizeCompareInstr(MachineInstr &CmpInstr, Register SrcReg, | |
| if (!optimizeSCC(Def, &CmpInstr, RI)) | ||
| return false; | ||
|
|
||
| // If s_or_b32 result, sY, is unused (i.e. it is effectively a 64-bit | ||
| // s_cmp_lg of a register pair) and the inputs are the hi and lo-halves of a | ||
| // 64-bit foldableSelect then delete s_or_b32 in the sequence: | ||
| // sX = s_cselect_b64 (non-zero imm), 0 | ||
| // sLo = copy sX.sub0 | ||
| // sHi = copy sX.sub1 | ||
| // sY = s_or_b32 sLo, sHi | ||
| if (Def->getOpcode() == AMDGPU::S_OR_B32 && | ||
| MRI->use_nodbg_empty(Def->getOperand(0).getReg())) { | ||
| const MachineOperand &OrOpnd1 = Def->getOperand(1); | ||
| const MachineOperand &OrOpnd2 = Def->getOperand(2); | ||
| if (OrOpnd1.isReg() && OrOpnd2.isReg()) { | ||
| MachineInstr *Def1 = MRI->getUniqueVRegDef(OrOpnd1.getReg()); | ||
| MachineInstr *Def2 = MRI->getUniqueVRegDef(OrOpnd2.getReg()); | ||
|
||
| if (Def1->getOpcode() == AMDGPU::COPY && | ||
| Def2->getOpcode() == AMDGPU::COPY && Def1->getOperand(1).isReg() && | ||
| Def2->getOperand(1).isReg() && | ||
| Def1->getOperand(1).getSubReg() == AMDGPU::sub0 && | ||
| Def2->getOperand(1).getSubReg() == AMDGPU::sub1 && | ||
| Def1->getOperand(1).getReg() == Def2->getOperand(1).getReg() && | ||
| foldableSelect( | ||
| *MRI->getUniqueVRegDef(Def1->getOperand(1).getReg()))) { | ||
|
||
| optimizeSCC(Def1, Def, RI); | ||
| } | ||
| } | ||
| } | ||
| return true; | ||
| }; | ||
This pattern is very specific. If sY is dead, won't all its operands be dead automatically (if there are no other uses)? Why do we need special handling here?
s_or_b32 may still be live because it defines scc. This optimization establishes that the scc def is redundant, because the same scc value is already available after the s_cselect_b64.
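To illustrate the redundancy argument, here is a minimal standalone sketch. It is a toy model, not LLVM code: `selectB64`, `sccAfterOr`, and `orSccIsRedundant` are hypothetical names introduced only for this example. It assumes that s_or_b32 sets scc when its 32-bit result is non-zero and that s_cselect_b64 leaves scc unchanged.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of "sX = s_cselect_b64 (non-zero imm), 0":
// yields NonZeroImm when SCC is set and 0 otherwise. SCC is not modified.
inline std::uint64_t selectB64(bool SCC, std::uint64_t NonZeroImm) {
  assert(NonZeroImm != 0 && "the pattern requires a non-zero immediate");
  return SCC ? NonZeroImm : 0;
}

// Hypothetical model of the SCC def produced by:
//   sLo = copy sX.sub0
//   sHi = copy sX.sub1
//   sY  = s_or_b32 sLo, sHi
// Assumption: s_or_b32 sets SCC when its 32-bit result is non-zero.
inline bool sccAfterOr(std::uint64_t X) {
  const std::uint32_t Lo = static_cast<std::uint32_t>(X);
  const std::uint32_t Hi = static_cast<std::uint32_t>(X >> 32);
  return (Lo | Hi) != 0;
}

// True when the SCC written by the s_or_b32 equals the SCC that fed the
// select, i.e. when the s_or_b32 adds no new information.
inline bool orSccIsRedundant(bool SCC, std::uint64_t NonZeroImm) {
  return sccAfterOr(selectB64(SCC, NonZeroImm)) == SCC;
}
```

Under these assumptions `orSccIsRedundant` holds for either scc value and any non-zero immediate, including one whose low half is zero, which is why both halves have to feed the s_or_b32 for the match to be valid.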
It is intended to deal with redundant s_or_b32s that are generated when lowering 64-bit add/sub on R600 targets. R600 does not have a 64-bit s_cmp, so the s_or_b32 is used in the lowering.
See llvm/test/CodeGen/AMDGPU/carryout-selection.ll for an example of why this optimization is needed.