-
Notifications
You must be signed in to change notification settings - Fork 15.4k
[AMDGPU][True16][CodeGen] true16 codegen for bswap #122849
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -3030,6 +3030,8 @@ def : GCNPat < | |||||
|
|
||||||
| // Magic number: 1 | (0 << 8) | (12 << 16) | (12 << 24) | ||||||
| // The 12s emit 0s. | ||||||
| foreach p = [NotHasTrue16BitInsts, UseFakeTrue16Insts] in | ||||||
| let True16Predicate = p in { | ||||||
| def : GCNPat < | ||||||
| (i16 (bswap i16:$a)), | ||||||
| (V_PERM_B32_e64 (i32 0), VSrc_b32:$a, (S_MOV_B32 (i32 0x0c0c0001))) | ||||||
|
|
@@ -3039,6 +3041,19 @@ def : GCNPat < | |||||
| (i32 (zext (bswap i16:$a))), | ||||||
| (V_PERM_B32_e64 (i32 0), VSrc_b32:$a, (S_MOV_B32 (i32 0x0c0c0001))) | ||||||
| >; | ||||||
| } | ||||||
|
|
||||||
| let True16Predicate = UseRealTrue16Insts in { | ||||||
| def : GCNPat < | ||||||
| (i16 (bswap i16:$a)), | ||||||
| (EXTRACT_SUBREG (V_PERM_B32_e64 (i32 0), (COPY VGPR_16:$a), (S_MOV_B32 (i32 0x0c0c0001))), lo16) | ||||||
| >; | ||||||
|
|
||||||
| def : GCNPat < | ||||||
| (i32 (zext (bswap i16:$a))), | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. should this handle zext or any ext
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hi Matt. From the pattern this is zero filling so it should be zext. This is following the existing pattern here https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/SIInstructions.td#L3039 I don't know if we need to handle sext. It seems rare that if we need to handle a byte swap and at the mean time do a sext
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't mean sext, I mean anyext
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I see. I think we can have both zext and anyext here like https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/SIInstructions.td#L2355-L2356 But if there is only one then it should be using zext. The reason would be i.e. for example llvm-project/llvm/test/CodeGen/AMDGPU/bswap.ll Lines 567 to 568 in 98da375
and operation if we replace zext to anyext
|
||||||
| (V_PERM_B32_e64 (i32 0), (COPY VGPR_16:$a), (S_MOV_B32 (i32 0x0c0c0001))) | ||||||
| >; | ||||||
| } | ||||||
|
|
||||||
| // Magic number: 1 | (0 << 8) | (3 << 16) | (2 << 24) | ||||||
| def : GCNPat < | ||||||
|
|
||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume UseRealTrue16Insts implies has perm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Matt. Yes UseRealTrue16Insts implies GFX11Plus
These patterns are also guarded by
llvm-project/llvm/lib/Target/AMDGPU/SIInstructions.td
Line 3008 in 98da375