-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[AMDGPU] Add commute for some VOP3 inst #121326
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
f0877bb
ae5f1e8
df6903c
ada83d6
c599af0
3a83ae5
5c9d065
d3f00dc
0e60298
707474f
79219d1
288ead2
4becc58
53b370a
0b746a8
785921f
6955a86
5e15e72
004a82d
dc2739f
378c02c
49ae569
161a2b9
4d569ae
1689c1e
328e566
9faf423
19b8ad4
cc3a125
0a89dc9
d8e6cb7
3c7bd89
bf1da57
789092d
6c35280
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -2749,7 +2749,7 @@ static MachineInstr *swapRegAndNonRegOperand(MachineInstr &MI, | |||||||||
| return &MI; | ||||||||||
| } | ||||||||||
|
|
||||||||||
| static MachineInstr *swapInlineConstOperands(MachineInstr &MI, | ||||||||||
| static MachineInstr *swapImmOperands(MachineInstr &MI, | ||||||||||
| MachineOperand &NonRegOp1, | ||||||||||
| MachineOperand &NonRegOp2) { | ||||||||||
| unsigned TargetFlags = NonRegOp1.getTargetFlags(); | ||||||||||
|
|
@@ -2762,6 +2762,54 @@ static MachineInstr *swapInlineConstOperands(MachineInstr &MI, | |||||||||
| return &MI; | ||||||||||
| } | ||||||||||
|
|
||||||||||
| bool SIInstrInfo::isLegalToSwap(const MachineInstr &MI, | ||||||||||
| unsigned OpIdx0, const MachineOperand *MO0, | ||||||||||
| unsigned OpIdx1, const MachineOperand *MO1) const { | ||||||||||
| const MCInstrDesc &InstDesc = MI.getDesc(); | ||||||||||
| const MCOperandInfo &OpInfo0 = InstDesc.operands()[OpIdx0]; | ||||||||||
| const MCOperandInfo &OpInfo1 = InstDesc.operands()[OpIdx1]; | ||||||||||
| const TargetRegisterClass *DefinedRC1 = | ||||||||||
| OpInfo1.RegClass != -1 ? RI.getRegClass(OpInfo1.RegClass) : nullptr; | ||||||||||
| const TargetRegisterClass *DefinedRC0 = | ||||||||||
| OpInfo1.RegClass != -1 ? RI.getRegClass(OpInfo0.RegClass) : nullptr; | ||||||||||
|
|
||||||||||
| unsigned Opc = MI.getOpcode(); | ||||||||||
| int Src0Idx = AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::src0); | ||||||||||
| if (Src0Idx == -1) { | ||||||||||
| // VOPD V_DUAL_* instructions use different operand names. | ||||||||||
| Src0Idx = AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::src0X); | ||||||||||
| } | ||||||||||
|
|
||||||||||
| // Swap doesn't breach constantbus or literal limits | ||||||||||
|
||||||||||
| // Swap doesn't breach constantbus or literal limits | |
| // Swap doesn't breach constant bus or literal limits |
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // FIX-ME: After gfx9, literal can be in place other than Src0 | |
| // FIXME: After gfx9, literal can be in place other than Src0 |
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // No need to check 64bit literals since swapping does not bring new | |
| // 64bit literals into current instruction to fold to 32bit | |
| // No need to check 64-bit literals since swapping does not bring new | |
| // 64-bit literals into current instruction to fold to 32-bit |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -7,6 +7,6 @@ name: not_shrink_icmp | |
| body: | | ||
| bb.0: | ||
| ; GCN-LABEL: name: not_shrink_icmp | ||
| ; GCN: S_CMP_GT_I32 1, 65, implicit-def $scc | ||
| ; GCN: S_CMP_LT_I32 65, 1, implicit-def $scc | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Pointless swap but probably not this patch's problem |
||
| S_CMP_GT_I32 1, 65, implicit-def $scc | ||
| ... | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,9 +1,9 @@ | ||
| # RUN: llc -mtriple=amdgcn -mcpu=gfx1030 -run-pass=machine-cse -verify-machineinstrs %s -o - 2>&1 | FileCheck --check-prefix=GCN %s | ||
|
|
||
| --- | ||
| name: test_machine_cse_op_sel | ||
| name: test_machine_cse_op_sel_v_add_nc_u16 | ||
| body: | | ||
| ; GCN-LABEL: name: test_machine_cse_op_sel | ||
| ; GCN-LABEL: name: test_machine_cse_op_sel_v_add_nc_u16 | ||
| ; GCN: %2:vgpr_32 = V_ADD_NC_U16_e64 0, %0, 0, %1, 1, 0, implicit $mode, implicit $exec | ||
| ; GCN-NEXT: DS_WRITE2_B32_gfx9 undef %4:vgpr_32, %2, %2, 0, 1, 0, implicit $exec | ||
| bb.0: | ||
|
|
@@ -15,13 +15,45 @@ body: | | |
| ... | ||
|
|
||
| --- | ||
| name: test_machine_cse_op_inline_const | ||
| name: test_machine_cse_op_const_v_add_nc_u16 | ||
| body: | | ||
| ; GCN-LABEL: name: test_machine_cse_op_inline_const | ||
| ; GCN-LABEL: name: test_machine_cse_op_const_v_add_nc_u16 | ||
| ; GCN: %0:vgpr_32 = V_ADD_NC_U16_e64 0, 64, 0, -3, 1, 0, implicit $mode, implicit $exec | ||
| ; GCN-NEXT: DS_WRITE2_B32_gfx9 undef %2:vgpr_32, %0, %0, 0, 1, 0, implicit $exec | ||
| bb.0: | ||
| %1:vgpr_32 = V_ADD_NC_U16_e64 0, 64, 0, -3, 1, 0, implicit $mode, implicit $exec | ||
| %2:vgpr_32 = V_ADD_NC_U16_e64 0, -3, 0, 64, 1, 0, implicit $mode, implicit $exec | ||
| DS_WRITE2_B32_gfx9 undef %4:vgpr_32, %1, %2, 0, 1, 0, implicit $exec | ||
| ... | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There should be a dedicated commute test for every changed opcode like this. What about the _e32 forms?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hi, for |
||
|
|
||
| --- | ||
| name: test_machine_cse_op_v_fma_f16 | ||
| tracksRegLiveness: true | ||
| body: | | ||
| ; GCN-LABEL: name: test_machine_cse_op_v_fma_f16 | ||
| ; GCN: %3:vgpr_32 = nofpexcept V_FMA_F16_e64 0, %0, 0, %1, 0, %2, 0, 0, implicit $mode, implicit $exec | ||
| ; GCN-NEXT: DS_WRITE2_B32_gfx9 undef %5:vgpr_32, %3, %3, 0, 1, 0, implicit $exec | ||
| bb.0: | ||
| liveins: $vgpr0, $vgpr1, $vgpr2 | ||
| %0:vgpr_32 = COPY $vgpr0 | ||
| %1:vgpr_32 = COPY $vgpr1 | ||
| %2:vgpr_32 = COPY $vgpr2 | ||
| %3:vgpr_32 = nofpexcept V_FMA_F16_e64 0, %0, 0, %1, 0, %2, 0, 0, implicit $mode, implicit $exec | ||
| %4:vgpr_32 = nofpexcept V_FMA_F16_e64 0, %1, 0, %0, 0, %2, 0, 0, implicit $mode, implicit $exec | ||
| DS_WRITE2_B32_gfx9 undef %5:vgpr_32, %3, %4, 0, 1, 0, implicit $exec | ||
| ... | ||
|
|
||
| --- | ||
| name: test_machine_cse_op_const_v_fma_f16 | ||
| tracksRegLiveness: true | ||
| body: | | ||
| ; GCN-LABEL: name: test_machine_cse_op_const_v_fma_f16 | ||
| ; GCN: %1:vgpr_32 = nofpexcept V_FMA_F16_e64 0, 3481272320, 0, 1, 0, %0, 0, 0, implicit $mode, implicit $exec | ||
| ; GCN-NEXT: DS_WRITE2_B32_gfx9 undef %3:vgpr_32, %1, %1, 0, 1, 0, implicit $exec | ||
| bb.0: | ||
| liveins: $vgpr0, $vgpr1, $vgpr2 | ||
| %0:vgpr_32 = COPY $vgpr0 | ||
| %1:vgpr_32 = nofpexcept V_FMA_F16_e64 0, 1, 0, 3481272320, 0, %0, 0, 0, implicit $mode, implicit $exec | ||
| %2:vgpr_32 = nofpexcept V_FMA_F16_e64 0, 3481272320, 0, 1, 0, %0, 0, 0, implicit $mode, implicit $exec | ||
| DS_WRITE2_B32_gfx9 undef %3:vgpr_32, %1, %2, 0, 1, 0, implicit $exec | ||
| ... | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Test the src0X cases? I'm not familiar with those
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hi @arsenm , if I'm not wrong, there seems no commutable instruction with input operand named
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That would be the VOPD forms of instructions, not freestanding instruction definitions |
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This VOPD case is still missing a test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @arsenm , I re-searched all generated instruction, but cannot find a VOPD instruction that is commutable with operand names as $src0X...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then remove this handling?