[SDAG][NVPTX] Add TLI check for preferring custom FP_TO_SINT operations to FP_TO_UINT #132470
Changes from 1 commit
@@ -0,0 +1,134 @@ (new test file)

    ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5

Member
What does this change buy us? It's possible that I've missed something. Can you point me at the suboptimal code LLVM generates now?
Contributor
Author
The first two test cases should have different PTX with the change. You can see the SASS differences here: https://godbolt.org/z/GjqYzejPz
Member
Does it matter in practice? The change affects only conversions to If the goal is to have the upper bits of the result always be 0, even for out-of-range inputs, then this patch is insufficient. It will still fill in upper bits for some out-of-range input values, because we're converting to a 16-bit int. E.g. converting Perhaps I'm still missing something. Can you elaborate on what motivates this change and what exactly the issue is that it is intended to solve? Simply using a PTX instruction with a better-matching name, but with no effect on valid inputs, is not worth plumbing a special case into generic LLVM code, IMO.
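The reviewer's point about upper bits can be made concrete with a small model. This is a hypothetical C++ sketch, not LLVM or PTX source. `cvt_rzi_u16` and `cvt_rzi_s16` approximate PTX `cvt.rzi.u16.f32` and `cvt.rzi.s16.f32`, assuming the PTX rule that float-to-integer conversions truncate toward zero, clamp to the destination range, and map NaN to 0. The two `fptoui_i8_*` helpers (illustrative names) model the 32-bit return register of the `cvt_u8_f32` test. The first goes through the unsigned 16-bit path shown in the test. The second goes through a signed 16-bit convert instead. Treating that signed path as the pre-patch lowering is an assumption.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical model of PTX cvt.rzi.u16.f32: truncate toward zero,
// clamp to [0, 65535], NaN -> 0 (assumed PTX float-to-int semantics).
uint16_t cvt_rzi_u16(float x) {
  if (std::isnan(x)) return 0;
  float t = std::trunc(x);
  if (t <= 0.0f) return 0;
  if (t >= 65535.0f) return UINT16_MAX;
  return static_cast<uint16_t>(t);
}

// Hypothetical model of PTX cvt.rzi.s16.f32: clamp to [-32768, 32767], NaN -> 0.
int16_t cvt_rzi_s16(float x) {
  if (std::isnan(x)) return 0;
  float t = std::trunc(x);
  if (t <= -32768.0f) return INT16_MIN;
  if (t >= 32767.0f) return INT16_MAX;
  return static_cast<int16_t>(t);
}

// 32-bit return register for `fptoui float %x to i8` lowered as in the
// cvt_u8_f32 test: cvt.rzi.u16.f32, then cvt.u32.u16 (zero-extend).
uint32_t fptoui_i8_via_u16(float x) {
  return static_cast<uint32_t>(cvt_rzi_u16(x));
}

// Same, but using a signed 16-bit conversion; the 16-bit pattern is then
// zero-extended by cvt.u32.u16.
uint32_t fptoui_i8_via_s16(float x) {
  return static_cast<uint32_t>(static_cast<uint16_t>(cvt_rzi_s16(x)));
}
```

For every in-range input (0 to 255), both helpers agree. They differ only for inputs that are out of range for `i8`, where LLVM's `fptoui` yields poison anyway. For `-1.0f`, the signed path produces `0xFFFF` while the unsigned path produces 0. For `300.0f`, both produce 300, so the unsigned path also leaves bits above bit 7 set. This matches the reviewer's argument that switching to `u16` alone does not guarantee zero upper bits.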
Contributor
Author
I generally agree with your reasoning as to why this seems irrelevant at the PTX level. However...
Can you give your opinion on the SASS diff I linked? With change: It's definitely something of a corner case, but it does seem better. The biggest thing for me is that this implies the signed/unsigned distinction is relevant to ptxas and that it is not just a better name.
Member
There's obviously one fewer instruction. However, considering that float to On the other hand, it adds non-zero complexity to LLVM, which will need to be maintained ~forever. Given near-zero benefit versus non-zero maintenance cost, I'd say it's not worth it. The impact is minor either way, so if the change could plausibly be used by other targets, I'd be fine with it. @arsenm, would AMDGPU benefit from the signedness distinction in FP-to-int conversions?
Contributor
In the cases that require larger expansions, unsigned is always preferable.

    ; RUN: llc < %s -march=nvptx64 -mcpu=sm_70 | FileCheck %s
    ; RUN: llc < %s -march=nvptx64 -mcpu=sm_80 | FileCheck %s
    ; RUN: llc < %s -march=nvptx64 -mcpu=sm_90 | FileCheck %s

    define i8 @cvt_u8_f32(float %x) {
    ; CHECK-LABEL: cvt_u8_f32(
    ; CHECK: {
    ; CHECK-NEXT: .reg .b16 %rs<2>;
    ; CHECK-NEXT: .reg .b32 %r<2>;
    ; CHECK-NEXT: .reg .f32 %f<2>;
    ; CHECK-EMPTY:
    ; CHECK-NEXT: // %bb.0:
    ; CHECK-NEXT: ld.param.f32 %f1, [cvt_u8_f32_param_0];
    ; CHECK-NEXT: cvt.rzi.u16.f32 %rs1, %f1;
    ; CHECK-NEXT: cvt.u32.u16 %r1, %rs1;
    ; CHECK-NEXT: st.param.b32 [func_retval0], %r1;
    ; CHECK-NEXT: ret;
      %a = fptoui float %x to i8
      ret i8 %a
    }

    define i8 @cvt_u8_f64(double %x) {
    ; CHECK-LABEL: cvt_u8_f64(
    ; CHECK: {
    ; CHECK-NEXT: .reg .b16 %rs<2>;
    ; CHECK-NEXT: .reg .b32 %r<2>;
    ; CHECK-NEXT: .reg .f64 %fd<2>;
    ; CHECK-EMPTY:
    ; CHECK-NEXT: // %bb.0:
    ; CHECK-NEXT: ld.param.f64 %fd1, [cvt_u8_f64_param_0];
    ; CHECK-NEXT: cvt.rzi.u16.f64 %rs1, %fd1;
    ; CHECK-NEXT: cvt.u32.u16 %r1, %rs1;
    ; CHECK-NEXT: st.param.b32 [func_retval0], %r1;
    ; CHECK-NEXT: ret;
      %a = fptoui double %x to i8
      ret i8 %a
    }

    define float @cvt_f32_i8(i8 %x) {
    ; CHECK-LABEL: cvt_f32_i8(
    ; CHECK: {
    ; CHECK-NEXT: .reg .b16 %rs<2>;
    ; CHECK-NEXT: .reg .f32 %f<2>;
    ; CHECK-EMPTY:
    ; CHECK-NEXT: // %bb.0:
    ; CHECK-NEXT: ld.param.u8 %rs1, [cvt_f32_i8_param_0];
    ; CHECK-NEXT: cvt.rn.f32.u16 %f1, %rs1;
    ; CHECK-NEXT: st.param.f32 [func_retval0], %f1;
    ; CHECK-NEXT: ret;
      %a = uitofp i8 %x to float
      ret float %a
    }

    define double @cvt_f64_i8(i8 %x) {
    ; CHECK-LABEL: cvt_f64_i8(
    ; CHECK: {
    ; CHECK-NEXT: .reg .b16 %rs<2>;
    ; CHECK-NEXT: .reg .f64 %fd<2>;
    ; CHECK-EMPTY:
    ; CHECK-NEXT: // %bb.0:
    ; CHECK-NEXT: ld.param.u8 %rs1, [cvt_f64_i8_param_0];
    ; CHECK-NEXT: cvt.rn.f64.u16 %fd1, %rs1;
    ; CHECK-NEXT: st.param.f64 [func_retval0], %fd1;
    ; CHECK-NEXT: ret;
      %a = uitofp i8 %x to double
      ret double %a
    }

    define float @cvt_f32_s8(i8 %x) {
    ; CHECK-LABEL: cvt_f32_s8(
    ; CHECK: {
    ; CHECK-NEXT: .reg .b16 %rs<2>;
    ; CHECK-NEXT: .reg .f32 %f<2>;
    ; CHECK-EMPTY:
    ; CHECK-NEXT: // %bb.0:
    ; CHECK-NEXT: ld.param.s8 %rs1, [cvt_f32_s8_param_0];
    ; CHECK-NEXT: cvt.rn.f32.s16 %f1, %rs1;
    ; CHECK-NEXT: st.param.f32 [func_retval0], %f1;
    ; CHECK-NEXT: ret;
      %a = sitofp i8 %x to float
      ret float %a
    }

    define double @cvt_f64_s8(i8 %x) {
    ; CHECK-LABEL: cvt_f64_s8(
    ; CHECK: {
    ; CHECK-NEXT: .reg .b16 %rs<2>;
    ; CHECK-NEXT: .reg .f64 %fd<2>;
    ; CHECK-EMPTY:
    ; CHECK-NEXT: // %bb.0:
    ; CHECK-NEXT: ld.param.s8 %rs1, [cvt_f64_s8_param_0];
    ; CHECK-NEXT: cvt.rn.f64.s16 %fd1, %rs1;
    ; CHECK-NEXT: st.param.f64 [func_retval0], %fd1;
    ; CHECK-NEXT: ret;
      %a = sitofp i8 %x to double
      ret double %a
    }

    define i8 @cvt_s8_f32(float %x) {
    ; CHECK-LABEL: cvt_s8_f32(
    ; CHECK: {
    ; CHECK-NEXT: .reg .b16 %rs<2>;
    ; CHECK-NEXT: .reg .b32 %r<3>;
    ; CHECK-NEXT: .reg .f32 %f<2>;
    ; CHECK-EMPTY:
    ; CHECK-NEXT: // %bb.0:
    ; CHECK-NEXT: ld.param.f32 %f1, [cvt_s8_f32_param_0];
    ; CHECK-NEXT: cvt.rzi.s16.f32 %rs1, %f1;
    ; CHECK-NEXT: cvt.u32.u16 %r1, %rs1;
    ; CHECK-NEXT: and.b32 %r2, %r1, 255;
    ; CHECK-NEXT: st.param.b32 [func_retval0], %r2;
    ; CHECK-NEXT: ret;
      %a = fptosi float %x to i8
      ret i8 %a
    }

    define i8 @cvt_s8_f64(double %x) {
    ; CHECK-LABEL: cvt_s8_f64(
    ; CHECK: {
    ; CHECK-NEXT: .reg .b16 %rs<2>;
    ; CHECK-NEXT: .reg .b32 %r<3>;
    ; CHECK-NEXT: .reg .f64 %fd<2>;
    ; CHECK-EMPTY:
    ; CHECK-NEXT: // %bb.0:
    ; CHECK-NEXT: ld.param.f64 %fd1, [cvt_s8_f64_param_0];
    ; CHECK-NEXT: cvt.rzi.s16.f64 %rs1, %fd1;
    ; CHECK-NEXT: cvt.u32.u16 %r1, %rs1;
    ; CHECK-NEXT: and.b32 %r2, %r1, 255;
    ; CHECK-NEXT: st.param.b32 [func_retval0], %r2;
    ; CHECK-NEXT: ret;
      %a = fptosi double %x to i8
      ret i8 %a
    }

This condition is now much more complicated and difficult to follow. Something should directly return the opcode to prefer.
Changed the hook to return an opcode.
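A hook that directly returns the opcode to prefer can be illustrated with a standalone model of the decision made when a narrow `FP_TO_UINT` is promoted to a wider integer type. This is a hypothetical C++ sketch, not the patch itself or LLVM's API. `Action`, `TargetModel`, `preferredPromotedOpcode`, `promotedOpcode`, and `prefersUnsigned` are illustrative names. The sketch assumes the generic rule that applies when `FP_TO_UINT` is not Legal on the promoted type but `FP_TO_SINT` is Legal or Custom. In that case the legalizer switches to `FP_TO_SINT`. This is safe because every in-range value of the narrow unsigned result fits in the wider signed type. The PR title suggests the case of interest is the one where both conversions are Custom.

```cpp
enum class Opcode { FP_TO_SINT, FP_TO_UINT };
enum class Action { Legal, Custom, Expand };

// Hypothetical stand-in for a target's legalization actions on the
// promoted (wider) integer type.
struct TargetModel {
  Action fpToUint;
  Action fpToSint;
  // Hypothetical per-target preference: keep FP_TO_UINT when it is Custom.
  bool prefersUnsigned;

  // Hypothetical hook: returns the opcode to use for a promoted FP_TO_UINT,
  // so the generic code needs no compound condition of its own.
  Opcode preferredPromotedOpcode() const {
    if (fpToUint == Action::Legal)
      return Opcode::FP_TO_UINT;  // Already legal; nothing to change.
    bool sintUsable = fpToSint == Action::Legal || fpToSint == Action::Custom;
    if (!sintUsable)
      return Opcode::FP_TO_UINT;  // No signed alternative available.
    if (prefersUnsigned && fpToUint == Action::Custom)
      return Opcode::FP_TO_UINT;  // Target opts out of the signed rewrite.
    return Opcode::FP_TO_SINT;    // Generic default: switch to signed.
  }
};

// Generic side of the model: FP_TO_SINT requests are never rewritten.
Opcode promotedOpcode(Opcode requested, const TargetModel &target) {
  if (requested == Opcode::FP_TO_SINT)
    return Opcode::FP_TO_SINT;
  return target.preferredPromotedOpcode();
}
```

Keeping the decision in one target-overridable function lets the generic promotion code reduce to a single call. The `prefersUnsigned` flag is only one hypothetical way a target such as NVPTX could express its choice. The actual hook in this PR may differ in name and signature.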