-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[AMDGPU] selecting v_sat_pk instruction, version 2 #123297
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
0b90a83
b2e29ec
4d4871c
7066fd1
3487eef
0cd2733
c7c53ab
801bf90
d89e739
8c51db5
c575ca5
b167a9b
ebca12d
bebeb48
ef61cfb
d178e51
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -461,6 +461,7 @@ enum NodeType : unsigned { | |
| FMED3, | ||
| SMED3, | ||
| UMED3, | ||
| SAT_PK_CAST, | ||
| FMAXIMUM3, | ||
| FMINIMUM3, | ||
| FDOT2, | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -332,6 +332,9 @@ def AMDGPUumed3 : SDNode<"AMDGPUISD::UMED3", AMDGPUDTIntTernaryOp, | |
| [] | ||
| >; | ||
|
|
||
| // Special node to pack v2i8 into i16 for v_sat_pk lowering. | ||
| def AMDGPUsat_pk_cast : SDNode<"AMDGPUISD::SAT_PK_CAST", SDTUnaryOp, []>; | ||
|
||
|
|
||
| def AMDGPUfmed3_impl : SDNode<"AMDGPUISD::FMED3", SDTFPTernaryOp, []>; | ||
|
|
||
| def AMDGPUfdot2_impl : SDNode<"AMDGPUISD::FDOT2", | ||
|
|
||
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -824,6 +824,25 @@ SITargetLowering::SITargetLowering(const TargetMachine &TM, | |||||
| {MVT::v4f32, MVT::v8f32, MVT::v16f32, MVT::v32f32}, | ||||||
| Custom); | ||||||
| } | ||||||
|
|
||||||
| // True 16 instruction is current not supported | ||||||
| // FIXME: Add support for true 16 when supported | ||||||
|
||||||
| // FIXME: Add support for true 16 when supported | |
| // FIXME: Add support |
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs predicate for has the instruction
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You shouldn't have to override every single type that could decompose. Ideally the combiner should be able to figure it out based on the legalizer rules
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @arsenm , for truncate_ssat_u to be folded TLI.isOperationLegalOrCustom function has pass.
We didn't hook this function, so it goes default and will check getOperationAction(Op, SrcVT) == Custom, which will look up OpActions[(unsigned)VT.getSimpleVT().SimpleTy][Op], and this is set here.
If we do not set every vNi16 (source type), the related truncat_ssat_u will not be created.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For step 1 I wouldn't do this. The fix for this kind of issue is in the combiner forming them, not the legalizer rules for a specific operation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @arsenm , I agree, but changing the combiner will relate to changing llvm side code. currently I'm not planning on changing llvm side's code in this PR.
So I think we may either do this, or hook the TLI.isOperationLegalOrCustom function??
But I think hooking TLI.isOperationLegalOrCustom will make it strange, since the input variable is op code and SrcVT....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handle the basic case in the first step, and don't worry about handling every vector perfectly right away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should still remove everything other than the basic case. Leave the vector splitting for a different patch
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't think you should need anything in isTypeDesirableForOp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @arsenm , this function checks the destination type, while TLI.isOperationLegalOrCustom checks the source type.
The Dst type are vNi8, if we didn't return true here, it goes to the default function to check isTypeLegal(DstVT)
The backend haven't add register class for vNi8. We maybe can add the relevant register class, but makeing vNi8 legal for register class may cause unpredictable result.
Personally I think we could add the register class for relevant type when it is formally legal in the backend. So I decide to handle it here for special case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @arsenm , this function checks the destination type, while TLI.isOperationLegalOrCustom checks the source type.
This just sounds buggy. The interpretation of which type is the one that matters for the opcode needs to be globally consistent
The backend haven't add register class for vNi8. We maybe can add the relevant register class,
Please no, this is a huge amount of work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @arsenm , yeah it is kind of weird logic. And what made it more strange is that to get into the ReplaceNodeResults function, it will check TLI.getOperationAction(Opc, DstVT) == Custom........
But if we want to change this, I think we also need to modify AArch64 backend....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, bugs cause other bugs and all the use points need to be fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still do not think this is necessary in this patch and should be dropped
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also should assert the element type is i8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leave this for second patch, I would hope the legalization process ends up with this in the easy form
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to insert a cast to the original type, does this not assert as-is?
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -3382,6 +3382,21 @@ def : GCNPat < | |
| (v2i16 (V_LSHL_OR_B32_e64 $src1, (i32 16), (i32 (V_AND_B32_e64 (i32 (V_MOV_B32_e32 (i32 0xffff))), $src0)))) | ||
| >; | ||
|
|
||
| multiclass V_SAT_PK_Pat<Instruction inst> { | ||
| def : GCNPat< | ||
|
||
| (i16 (AMDGPUsat_pk_cast v2i16:$src)), | ||
| (inst VRegSrc_32:$src) | ||
| >; | ||
| } | ||
|
|
||
| let OtherPredicates = [NotHasTrue16BitInsts] in { | ||
| defm : V_SAT_PK_Pat<V_SAT_PK_U8_I16_e64>; | ||
| } // End OtherPredicates = [NotHasTrue16BitInsts] | ||
|
|
||
| let True16Predicate = UseFakeTrue16Insts in { | ||
| defm : V_SAT_PK_Pat<V_SAT_PK_U8_I16_fake16_e64>; | ||
| } // End True16Predicate = UseFakeTrue16Insts | ||
|
|
||
| // With multiple uses of the shift, this will duplicate the shift and | ||
| // increase register pressure. | ||
| def : GCNPat < | ||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.