-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[NVPTX] Annotate CUDA kernel pointer arguments with .ptr .space .align attributes. #79646
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write If you have received no comments on your PR for a week, you can request a review If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
|
@Artem-B could you review this? |
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
Clang formatting reverted.
|
@Artem-B could you review this? Thanks. |
Artem-B
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM in general, but there's a puzzle.
Artem-B
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall.
Artem-B
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just in case -- do all other LLVM tests pass?
If they do, you're good to go.
| // CUDA kernels assume that pointers are in global address space | ||
| // See: | ||
| // https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parameter-state-space | ||
| assert(addrSpace == 0 && "Invalid address space"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this assertion is still (partially?) valid. The only case when we want to emit .global is if the pointer is generic or explicitly in the global AS.
| ; CHECK-LABEL: func_align | ||
| ; CHECK: .param .u64 .ptr .global .align 16 func_align_param_0, | ||
| ; CHECK: .param .u64 .ptr .global func_align_param_1, | ||
| ; CHECK: .param .u32 .ptr .global func_align_param_2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still wrong. addrspace(3) is not a global pointer. It's a shared AS pointer.
Speaking of which, please also add a test for explicitly annotated const and global AS pointers.
|
I've been asked to pick up the upstreaming of this patch, so I've created a new PR in #114874 to build on the existing commits here, and address the review feedback so far. I'll close this PR, and we can continue the discussion over on #114874 to see whether the latest comments are suitably addressed now. |
…114874) Emit .ptr, .address-space, and .align attributes for kernel args in CUDA (previously handled only for OpenCL). This allows for more vectorization opportunities if the PTX consumer is able to know about the pointer alignments. If no alignment is explicitly specified, .align 1 will be emitted to match the LLVM IR semantics in this case. PTX ISA doc - https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#kernel-parameter-attribute-ptr This is a rework of the original patch proposed in #79646 --------- Co-authored-by: Vandana <[email protected]>
The current issue is PTX doesn't vectorise load and stores that can be vectorized.
We noticed that we were missing vectorization for sin, cos and power operations from LLVM resulting in lesser speedup. The reason is currently we don't generate any .ptr and .align attributes for kernel parameters in CUDA and the required alignment information is missing. This results in missing out on vectorization opportunities.
The change enables adding .align attribute for alignment information and .ptr attribute for kernel pointers in kernel parameters under the assumption that all kernel parameters pointers point to global memory space. This results in vectorization and boosting the speedup by ~2x.
.align is enabled only when the pointer has explicit alignment specifier.
PTX ISA doc - https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#kernel-parameter-attribute-ptr