[SDAG][NVPTX] Add TLI check for preferring custom FP_TO_SINT operations to FP_TO_UINT #132470
Changes from 2 commits
| Original file line number | Diff line number | Diff line change | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -6179,6 +6179,33 @@ Instruction *NVPTXTargetLowering::emitTrailingFence(IRBuilderBase &Builder, | |||||||||||
| return nullptr; | ||||||||||||
| } | ||||||||||||
|
|
||||||||||||
| // Rather than default to SINT when both UINT and SINT are custom, we only | ||||||||||||
| // change the opcode when UINT is not legal and SINT is. UINT is preferred when | ||||||||||||
| // both are custom since unsigned CVT instructions can lead to slightly better | ||||||||||||
| // SASS code with fewer instructions. | ||||||||||||
| unsigned NVPTXTargetLowering::getFPToXIntOpcode(unsigned Op, EVT FromVT, | ||||||||||||
|
||||||||||||
| for (MVT VT : {MVT::i1, MVT::i16, MVT::i32, MVT::i64}) { | |
| setOperationAction( | |
| {ISD::SINT_TO_FP, ISD::UINT_TO_FP, ISD::FP_TO_SINT, ISD::FP_TO_UINT}, | |
| VT, Custom); | |
| } |
In this current iteration we would still need it as the default hook is just implementing the hook-less behavior that was in PromoteIntRes_FP_TO_XINT. I feel like the default hook should be what NVPTX has (keep opcode when both are custom) and targets like PPC should override it, but I don't know what other targets require so my opinion isn't strong.
Your point about supported conversions seems good though, so maybe we don't need any kind of hook change for the NVPTX case. I'll look into it more.
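The two policies being compared can be sketched as a small standalone model. This is only an illustration under stated assumptions: `Action`, `Opcode`, `defaultPolicy`, and `keepUnsignedPolicy` are hypothetical names rather than LLVM types or the hook's real signature (which is truncated in the diff above), and the default condition is assumed to match the older behavior in `PromoteIntRes_FP_TO_XINT`.

```cpp
// Toy stand-ins for LLVM's legalize actions and ISD opcodes.
// These are hypothetical names for illustration, not the real llvm:: types.
enum class Action { Legal, Custom, Expand };
enum class Opcode { FpToSint, FpToUint };

// Assumed default policy: when promoting FP_TO_UINT, switch to FP_TO_SINT
// whenever the unsigned form is not Legal and the signed form is Legal or
// Custom. With both Custom, this picks SINT.
constexpr Opcode defaultPolicy(Opcode op, Action uintAct, Action sintAct) {
  if (op == Opcode::FpToUint && uintAct != Action::Legal &&
      (sintAct == Action::Legal || sintAct == Action::Custom))
    return Opcode::FpToSint;
  return op;
}

// Policy described in the NVPTX code comment: change the opcode only when
// UINT is not Legal and SINT is Legal. With both Custom, this keeps UINT.
constexpr Opcode keepUnsignedPolicy(Opcode op, Action uintAct, Action sintAct) {
  if (op == Opcode::FpToUint && uintAct != Action::Legal &&
      sintAct == Action::Legal)
    return Opcode::FpToSint;
  return op;
}
```

The only case where the two functions disagree is an unsigned conversion whose signed counterpart is Custom rather than Legal, which is the configuration NVPTX sets up for `i1`, `i16`, `i32`, and `i64`.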
I think that because the i8 needs to be promoted, using custom requires that we support these operations in ReplaceNodeResults. Doing that seems like re-implementing PromoteIntRes_FP_TO_XINT, so I'm not sure it's worth it if we can use the hook instead. Thoughts?
Otherwise, the flow looks like:
PromoteIntegerResult->CustomLowerNode->ReplaceNodeResults->"LLVM ERROR: Unhandled custom legalization"
I don't have a strong preference either way. The patch does improve things for NVPTX, however marginally.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,134 @@ | ||
| ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 | ||
|
Member
What does this change buy us? It's possible that I've missed something. Can you point me at the suboptimal code LLVM generates now?
Contributor
Author
The first two test cases should have different PTX with the change. You can see the SASS differences here: https://godbolt.org/z/GjqYzejPz
Member
Does it matter in practice? The change affects only conversions to If the goal is to have upper bits of the result always be 0, even for out-of-range inputs, then this patch is insufficient. It will still fill in upper bits for some out-of-range input values, because we're converting to a 16-bit int. E.g., converting Perhaps I'm still missing something. Can you elaborate on what motivates this change and what exactly is the issue it is intended to solve? Simply using a PTX instruction with a better matching name but no effect on the valid inputs is not worth plumbing a special case into generic LLVM code, IMO.
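The point about out-of-range inputs can be illustrated with a small host-side model. This is a sketch, not PTX semantics verified against hardware: it assumes the float-to-integer `cvt.rzi` conversions round toward zero, clamp to the 16-bit destination range, and map NaN to 0. The helper names `cvtRziSat` and `widen` are hypothetical, and the input values used below are arbitrary choices for illustration, not the example from the review.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Model of a float -> 16-bit integer conversion that rounds toward zero and
// clamps to the destination range (assumed to mirror cvt.rzi.{u16,s16}.f32).
// NaN is mapped to 0 in this model.
template <typename Int16>
Int16 cvtRziSat(float x) {
  if (std::isnan(x)) return 0;
  const float lo = static_cast<float>(std::numeric_limits<Int16>::min());
  const float hi = static_cast<float>(std::numeric_limits<Int16>::max());
  const float t = std::trunc(x);
  if (t <= lo) return std::numeric_limits<Int16>::min();
  if (t >= hi) return std::numeric_limits<Int16>::max();
  return static_cast<Int16>(t);
}

// Zero-extend the 16-bit result into a 32-bit value, modeling cvt.u32.u16.
template <typename Int16>
std::uint32_t widen(Int16 v) {
  return static_cast<std::uint32_t>(static_cast<std::uint16_t>(v));
}
```

Under these assumptions, an in-range input such as `200.0f` produces the same bits through either the `std::uint16_t` or `std::int16_t` path. An input of `300.0f`, which does not fit in 8 bits, still leaves bit 8 set after the unsigned conversion, so the unsigned opcode alone does not guarantee zeroed upper bits. The two paths differ for negative inputs: `-1.0f` clamps to 0 in the unsigned model and becomes `0xFFFF` in the signed one.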
Contributor
Author
I generally agree with your reasoning as to why this seems irrelevant on the PTX level. However...
Can you provide your opinion on the SASS diff I linked? With change: It's definitely something of a corner case, but it does seem better. The biggest thing for me is that this implies that the signed/unsigned distinction is relevant for ptxas and it is not just a better name.
Member
There's obviously one less instruction. However, considering that float to On the other hand, it adds non-zero complexity to LLVM which will need to be maintained ~forever. Considering near-zero benefit vs non-zero maintenance cost, I'd say it's not worth it. The impact is minor either way, so if the change could be plausibly used for other targets, I'd be fine with it. @arsenm -- would AMDGPU benefit from the distinction in signedness in FP-to-int conversions?
Contributor
In the cases that require larger expansions, unsigned is always preferable.
||
| ; RUN: llc < %s -march=nvptx64 -mcpu=sm_70 | FileCheck %s | ||
| ; RUN: llc < %s -march=nvptx64 -mcpu=sm_80 | FileCheck %s | ||
| ; RUN: llc < %s -march=nvptx64 -mcpu=sm_90 | FileCheck %s | ||
|
|
||
| define i8 @cvt_u8_f32(float %x) { | ||
| ; CHECK-LABEL: cvt_u8_f32( | ||
| ; CHECK: { | ||
| ; CHECK-NEXT: .reg .b16 %rs<2>; | ||
| ; CHECK-NEXT: .reg .b32 %r<2>; | ||
| ; CHECK-NEXT: .reg .f32 %f<2>; | ||
| ; CHECK-EMPTY: | ||
| ; CHECK-NEXT: // %bb.0: | ||
| ; CHECK-NEXT: ld.param.f32 %f1, [cvt_u8_f32_param_0]; | ||
| ; CHECK-NEXT: cvt.rzi.u16.f32 %rs1, %f1; | ||
| ; CHECK-NEXT: cvt.u32.u16 %r1, %rs1; | ||
| ; CHECK-NEXT: st.param.b32 [func_retval0], %r1; | ||
| ; CHECK-NEXT: ret; | ||
| %a = fptoui float %x to i8 | ||
| ret i8 %a | ||
| } | ||
|
|
||
| define i8 @cvt_u8_f64(double %x) { | ||
| ; CHECK-LABEL: cvt_u8_f64( | ||
| ; CHECK: { | ||
| ; CHECK-NEXT: .reg .b16 %rs<2>; | ||
| ; CHECK-NEXT: .reg .b32 %r<2>; | ||
| ; CHECK-NEXT: .reg .f64 %fd<2>; | ||
| ; CHECK-EMPTY: | ||
| ; CHECK-NEXT: // %bb.0: | ||
| ; CHECK-NEXT: ld.param.f64 %fd1, [cvt_u8_f64_param_0]; | ||
| ; CHECK-NEXT: cvt.rzi.u16.f64 %rs1, %fd1; | ||
| ; CHECK-NEXT: cvt.u32.u16 %r1, %rs1; | ||
| ; CHECK-NEXT: st.param.b32 [func_retval0], %r1; | ||
| ; CHECK-NEXT: ret; | ||
| %a = fptoui double %x to i8 | ||
| ret i8 %a | ||
| } | ||
|
|
||
| define float @cvt_f32_i8(i8 %x) { | ||
| ; CHECK-LABEL: cvt_f32_i8( | ||
| ; CHECK: { | ||
| ; CHECK-NEXT: .reg .b16 %rs<2>; | ||
| ; CHECK-NEXT: .reg .f32 %f<2>; | ||
| ; CHECK-EMPTY: | ||
| ; CHECK-NEXT: // %bb.0: | ||
| ; CHECK-NEXT: ld.param.u8 %rs1, [cvt_f32_i8_param_0]; | ||
| ; CHECK-NEXT: cvt.rn.f32.u16 %f1, %rs1; | ||
| ; CHECK-NEXT: st.param.f32 [func_retval0], %f1; | ||
| ; CHECK-NEXT: ret; | ||
| %a = uitofp i8 %x to float | ||
| ret float %a | ||
| } | ||
|
|
||
| define double @cvt_f64_i8(i8 %x) { | ||
| ; CHECK-LABEL: cvt_f64_i8( | ||
| ; CHECK: { | ||
| ; CHECK-NEXT: .reg .b16 %rs<2>; | ||
| ; CHECK-NEXT: .reg .f64 %fd<2>; | ||
| ; CHECK-EMPTY: | ||
| ; CHECK-NEXT: // %bb.0: | ||
| ; CHECK-NEXT: ld.param.u8 %rs1, [cvt_f64_i8_param_0]; | ||
| ; CHECK-NEXT: cvt.rn.f64.u16 %fd1, %rs1; | ||
| ; CHECK-NEXT: st.param.f64 [func_retval0], %fd1; | ||
| ; CHECK-NEXT: ret; | ||
| %a = uitofp i8 %x to double | ||
| ret double %a | ||
| } | ||
|
|
||
| define float @cvt_f32_s8(i8 %x) { | ||
| ; CHECK-LABEL: cvt_f32_s8( | ||
| ; CHECK: { | ||
| ; CHECK-NEXT: .reg .b16 %rs<2>; | ||
| ; CHECK-NEXT: .reg .f32 %f<2>; | ||
| ; CHECK-EMPTY: | ||
| ; CHECK-NEXT: // %bb.0: | ||
| ; CHECK-NEXT: ld.param.s8 %rs1, [cvt_f32_s8_param_0]; | ||
| ; CHECK-NEXT: cvt.rn.f32.s16 %f1, %rs1; | ||
| ; CHECK-NEXT: st.param.f32 [func_retval0], %f1; | ||
| ; CHECK-NEXT: ret; | ||
| %a = sitofp i8 %x to float | ||
| ret float %a | ||
| } | ||
|
|
||
| define double @cvt_f64_s8(i8 %x) { | ||
| ; CHECK-LABEL: cvt_f64_s8( | ||
| ; CHECK: { | ||
| ; CHECK-NEXT: .reg .b16 %rs<2>; | ||
| ; CHECK-NEXT: .reg .f64 %fd<2>; | ||
| ; CHECK-EMPTY: | ||
| ; CHECK-NEXT: // %bb.0: | ||
| ; CHECK-NEXT: ld.param.s8 %rs1, [cvt_f64_s8_param_0]; | ||
| ; CHECK-NEXT: cvt.rn.f64.s16 %fd1, %rs1; | ||
| ; CHECK-NEXT: st.param.f64 [func_retval0], %fd1; | ||
| ; CHECK-NEXT: ret; | ||
| %a = sitofp i8 %x to double | ||
| ret double %a | ||
| } | ||
|
|
||
| define i8 @cvt_s8_f32(float %x) { | ||
| ; CHECK-LABEL: cvt_s8_f32( | ||
| ; CHECK: { | ||
| ; CHECK-NEXT: .reg .b16 %rs<2>; | ||
| ; CHECK-NEXT: .reg .b32 %r<3>; | ||
| ; CHECK-NEXT: .reg .f32 %f<2>; | ||
| ; CHECK-EMPTY: | ||
| ; CHECK-NEXT: // %bb.0: | ||
| ; CHECK-NEXT: ld.param.f32 %f1, [cvt_s8_f32_param_0]; | ||
| ; CHECK-NEXT: cvt.rzi.s16.f32 %rs1, %f1; | ||
| ; CHECK-NEXT: cvt.u32.u16 %r1, %rs1; | ||
| ; CHECK-NEXT: and.b32 %r2, %r1, 255; | ||
| ; CHECK-NEXT: st.param.b32 [func_retval0], %r2; | ||
| ; CHECK-NEXT: ret; | ||
| %a = fptosi float %x to i8 | ||
| ret i8 %a | ||
| } | ||
|
|
||
| define i8 @cvt_s8_f64(double %x) { | ||
| ; CHECK-LABEL: cvt_s8_f64( | ||
| ; CHECK: { | ||
| ; CHECK-NEXT: .reg .b16 %rs<2>; | ||
| ; CHECK-NEXT: .reg .b32 %r<3>; | ||
| ; CHECK-NEXT: .reg .f64 %fd<2>; | ||
| ; CHECK-EMPTY: | ||
| ; CHECK-NEXT: // %bb.0: | ||
| ; CHECK-NEXT: ld.param.f64 %fd1, [cvt_s8_f64_param_0]; | ||
| ; CHECK-NEXT: cvt.rzi.s16.f64 %rs1, %fd1; | ||
| ; CHECK-NEXT: cvt.u32.u16 %r1, %rs1; | ||
| ; CHECK-NEXT: and.b32 %r2, %r1, 255; | ||
| ; CHECK-NEXT: st.param.b32 [func_retval0], %r2; | ||
| ; CHECK-NEXT: ret; | ||
| %a = fptosi double %x to i8 | ||
| ret i8 %a | ||
| } | ||
Name could use work. getPreferredFPToIntOpcode?