[NVPTX] Add intrinsics for the szext instruction #139126
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -568,6 +568,99 @@ to left-shift the found bit into the most-significant bit position, otherwise | |
| the result is the shift amount needed to right-shift the found bit into the | ||
| least-significant bit position. 0xffffffff is returned if no 1 bit is found. | ||
|
|
||
| '``llvm.nvvm.zext.inreg.clamp``' Intrinsic | ||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
|
||
| Syntax: | ||
| """"""" | ||
|
|
||
| .. code-block:: llvm | ||
|
|
||
| declare i32 @llvm.nvvm.zext.inreg.clamp(i32 %a, i32 %b) | ||
|
|
||
| Overview: | ||
| """"""""" | ||
|
|
||
| The '``llvm.nvvm.zext.inreg.clamp``' intrinsic extracts the low bits of the | ||
| input value, and zero-extends them back to the original width. | ||
|
|
||
| Semantics: | ||
| """""""""" | ||
|
|
||
| The '``llvm.nvvm.zext.inreg.clamp``' returns the zero-extension of N lowest bits | ||
| of operand %a. N is the value of operand %b clamped to the range [0, 32]. If N | ||
| is 0, the result is 0. | ||
|
|
||
| '``llvm.nvvm.zext.inreg.wrap``' Intrinsic | ||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
|
||
| Syntax: | ||
| """"""" | ||
|
|
||
| .. code-block:: llvm | ||
|
|
||
| declare i32 @llvm.nvvm.zext.inreg.wrap(i32 %a, i32 %b) | ||
|
|
||
| Overview: | ||
| """"""""" | ||
|
|
||
| The '``llvm.nvvm.zext.inreg.wrap``' intrinsic extracts the low bits of the | ||
| input value, and zero-extends them back to the original width. | ||
|
|
||
| Semantics: | ||
| """""""""" | ||
|
|
||
| The '``llvm.nvvm.zext.inreg.wrap``' returns the zero-extension of N lowest bits | ||
| of operand %a. N is the value of operand %b modulo 32. If N is 0, the result | ||
| is 0. | ||
|
|
||
| '``llvm.nvvm.sext.inreg.clamp``' Intrinsic | ||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
|
||
| Syntax: | ||
| """"""" | ||
|
|
||
| .. code-block:: llvm | ||
|
|
||
| declare i32 @llvm.nvvm.sext.inreg.clamp(i32 %a, i32 %b) | ||
|
|
||
| Overview: | ||
| """"""""" | ||
|
|
||
| The '``llvm.nvvm.sext.inreg.clamp``' intrinsic extracts the low bits of the | ||
| input value, and sign-extends them back to the original width. | ||
|
|
||
| Semantics: | ||
| """""""""" | ||
|
|
||
| The '``llvm.nvvm.sext.inreg.clamp``' returns the sign-extension of N lowest bits | ||
| of operand %a. N is the value of operand %b clamped to the range [0, 32]. If N | ||
| is 0, the result is 0. | ||
|
|
||
|
|
||
| '``llvm.nvvm.sext.inreg.wrap``' Intrinsic | ||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
|
||
| Syntax: | ||
| """"""" | ||
|
|
||
| .. code-block:: llvm | ||
|
|
||
| declare i32 @llvm.nvvm.sext.inreg.wrap(i32 %a, i32 %b) | ||
|
|
||
| Overview: | ||
| """"""""" | ||
|
|
||
| The '``llvm.nvvm.sext.inreg.wrap``' intrinsic extracts the low bits of the | ||
| input value, and sign-extends them back to the original width. | ||
|
|
||
| Semantics: | ||
| """""""""" | ||
|
|
||
| The '``llvm.nvvm.sext.inreg.wrap``' returns the sign-extension of N lowest bits | ||
| of operand %a. N is the value of operand %b modulo 32. If N is 0, the result | ||
| is 0. | ||
|
||
|
|
||
| TMA family of Intrinsics | ||
| ------------------------ | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,65 @@ | ||
| ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 | ||
| ; RUN: llc -o - < %s -mcpu=sm_70 -mattr=+ptx76 | FileCheck %s | ||
|
|
||
| target triple = "nvptx-unknown-cuda" | ||
|
|
||
| define i32 @szext_wrap_u32(i32 %a, i32 %b) { | ||
| ; CHECK-LABEL: szext_wrap_u32( | ||
| ; CHECK: { | ||
| ; CHECK-NEXT: .reg .b32 %r<4>; | ||
| ; CHECK-EMPTY: | ||
| ; CHECK-NEXT: // %bb.0: | ||
| ; CHECK-NEXT: ld.param.u32 %r1, [szext_wrap_u32_param_0]; | ||
| ; CHECK-NEXT: ld.param.u32 %r2, [szext_wrap_u32_param_1]; | ||
| ; CHECK-NEXT: szext.wrap.u32 %r3, %r1, %r2; | ||
| ; CHECK-NEXT: st.param.b32 [func_retval0], %r3; | ||
| ; CHECK-NEXT: ret; | ||
| %c = call i32 @llvm.nvvm.zext.inreg.wrap(i32 %a, i32 %b) | ||
|
||
| ret i32 %c | ||
| } | ||
|
|
||
| define i32 @szext_clamp_u32(i32 %a, i32 %b) { | ||
| ; CHECK-LABEL: szext_clamp_u32( | ||
| ; CHECK: { | ||
| ; CHECK-NEXT: .reg .b32 %r<4>; | ||
| ; CHECK-EMPTY: | ||
| ; CHECK-NEXT: // %bb.0: | ||
| ; CHECK-NEXT: ld.param.u32 %r1, [szext_clamp_u32_param_0]; | ||
| ; CHECK-NEXT: ld.param.u32 %r2, [szext_clamp_u32_param_1]; | ||
| ; CHECK-NEXT: szext.clamp.u32 %r3, %r1, %r2; | ||
| ; CHECK-NEXT: st.param.b32 [func_retval0], %r3; | ||
| ; CHECK-NEXT: ret; | ||
| %c = call i32 @llvm.nvvm.zext.inreg.clamp(i32 %a, i32 %b) | ||
| ret i32 %c | ||
| } | ||
|
|
||
| define i32 @szext_wrap_s32(i32 %a, i32 %b) { | ||
| ; CHECK-LABEL: szext_wrap_s32( | ||
| ; CHECK: { | ||
| ; CHECK-NEXT: .reg .b32 %r<4>; | ||
| ; CHECK-EMPTY: | ||
| ; CHECK-NEXT: // %bb.0: | ||
| ; CHECK-NEXT: ld.param.u32 %r1, [szext_wrap_s32_param_0]; | ||
| ; CHECK-NEXT: ld.param.u32 %r2, [szext_wrap_s32_param_1]; | ||
| ; CHECK-NEXT: szext.wrap.s32 %r3, %r1, %r2; | ||
| ; CHECK-NEXT: st.param.b32 [func_retval0], %r3; | ||
| ; CHECK-NEXT: ret; | ||
| %c = call i32 @llvm.nvvm.sext.inreg.wrap(i32 %a, i32 %b) | ||
| ret i32 %c | ||
| } | ||
|
|
||
| define i32 @szext_clamp_s32(i32 %a, i32 %b) { | ||
| ; CHECK-LABEL: szext_clamp_s32( | ||
| ; CHECK: { | ||
| ; CHECK-NEXT: .reg .b32 %r<4>; | ||
| ; CHECK-EMPTY: | ||
| ; CHECK-NEXT: // %bb.0: | ||
| ; CHECK-NEXT: ld.param.u32 %r1, [szext_clamp_s32_param_0]; | ||
| ; CHECK-NEXT: ld.param.u32 %r2, [szext_clamp_s32_param_1]; | ||
| ; CHECK-NEXT: szext.clamp.s32 %r3, %r1, %r2; | ||
| ; CHECK-NEXT: st.param.b32 [func_retval0], %r3; | ||
| ; CHECK-NEXT: ret; | ||
| %c = call i32 @llvm.nvvm.sext.inreg.clamp(i32 %a, i32 %b) | ||
| ret i32 %c | ||
| } | ||
|
|
||
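The semantics documented in the diff above can be sketched as a small reference model. This is only an illustration of the prose description, not the backend's lowering: the names `zext_model`, `sext_model`, and `_width` are hypothetical, and the model assumes `%b` is read as an unsigned 32-bit value.

```python
MASK32 = 0xFFFFFFFF


def _width(b, mode):
    """Compute N from %b for the given mode ("clamp" or "wrap")."""
    b &= MASK32  # assumption: %b is interpreted as unsigned 32-bit
    if mode == "clamp":
        return min(b, 32)  # N clamped to the range [0, 32]
    if mode == "wrap":
        return b % 32      # N is %b modulo 32
    raise ValueError("mode must be 'clamp' or 'wrap'")


def zext_model(a, b, mode):
    """Zero-extend the N lowest bits of a to 32 bits."""
    n = _width(b, mode)
    return (a & MASK32) & ((1 << n) - 1)


def sext_model(a, b, mode):
    """Sign-extend the N lowest bits of a to 32 bits."""
    n = _width(b, mode)
    if n == 0:
        return 0  # documented special case: N == 0 gives 0
    low_mask = (1 << n) - 1
    low = (a & MASK32) & low_mask
    if low & (1 << (n - 1)):        # sign bit of the N-bit field is set
        low |= MASK32 & ~low_mask   # fill the upper bits with ones
    return low
```

Under this model the two modes differ only once `%b` reaches 32: with `%b = 32`, the clamp variants return `%a` unchanged, while the wrap variants see N = 0 and return 0.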
Do we need the `inreg` part? It sounds like an implementation detail at best, and misleading at worst. I.e., I'd assume that it implies an in-place (i.e. in the same register) conversion, which is not the case.

I'll remove it. The intent was to match the convention of the `ISD::SEXT_INREG` node, which performs an operation similar to this if `%b` were a constant.
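
For readers unfamiliar with the comparison, sign-extending from a fixed bit width within a 32-bit value is commonly expressed as a left shift followed by an arithmetic right shift. The sketch below models that pattern for a constant width; the function name `sext_inreg_const` is hypothetical, and the code illustrates the general technique rather than LLVM's actual expansion.

```python
def sext_inreg_const(a, n):
    """Sign-extend the low n bits of a 32-bit value, for a fixed 1 <= n <= 32."""
    if not 1 <= n <= 32:
        raise ValueError("n must be in the range [1, 32]")
    shift = 32 - n
    # Move the field's sign bit into bit 31.
    shifted = (a << shift) & 0xFFFFFFFF
    # Reinterpret as a signed 32-bit integer so that >> is an arithmetic shift.
    if shifted & 0x80000000:
        shifted -= 1 << 32
    # Shift back down and present the result as an unsigned 32-bit pattern.
    return (shifted >> shift) & 0xFFFFFFFF
```

The intrinsics in this PR differ in that the width comes from a runtime operand and is either clamped or wrapped before use.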