Skip to content

Conversation

@Vandana2896
Copy link
Contributor

The current issue is PTX doesn't vectorise load and stores that can be vectorized.

We noticed that we were missing vectorization for sin, cos and power operations from LLVM resulting in lesser speedup. The reason is currently we don't generate any .ptr and .align attributes for kernel parameters in CUDA and the required alignment information is missing. This results in missing out on vectorization opportunities.
The change enables adding .align attribute for alignment information and .ptr attribute for kernel pointers in kernel parameters under the assumption that all kernel parameters pointers point to global memory space. This results in vectorization and boosting the speedup by ~2x.
.align is enabled only when the pointer has explicit alignment specifier.

PTX ISA doc - https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#kernel-parameter-attribute-ptr

@github-actions
Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@Vandana2896 Vandana2896 changed the title Enable .ptr .global .align attributes for kernel attributes for CUDA [NVPTX] Enable .ptr .global .align attributes for kernel attributes for CUDA Jan 26, 2024
@Vandana2896
Copy link
Contributor Author

@Artem-B could you review this?

@Artem-B Artem-B changed the title [NVPTX] Enable .ptr .global .align attributes for kernel attributes for CUDA [NVPTX] Annotate CUDA kernel pointer arguments with .ptr .space .align attributes. Jan 26, 2024
@github-actions
Copy link

github-actions bot commented Feb 2, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@Vandana2896
Copy link
Contributor Author

@Artem-B could you review this? Thanks.

Copy link
Member

@Artem-B Artem-B left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in general, but there's a puzzle.

Copy link
Member

@Artem-B Artem-B left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall.

Copy link
Member

@Artem-B Artem-B left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just in case -- do all other LLVM tests pass?

If they do, you're good to go.

// CUDA kernels assume that pointers are in global address space
// See:
// https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parameter-state-space
assert(addrSpace == 0 && "Invalid address space");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this assertion is still (partially?) valid. The only case when we want to emit .global is if the pointer is generic or explicitly in the global AS.

; CHECK-LABEL: func_align
; CHECK: .param .u64 .ptr .global .align 16 func_align_param_0,
; CHECK: .param .u64 .ptr .global func_align_param_1,
; CHECK: .param .u32 .ptr .global func_align_param_2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still wrong. addrspace(3) is not a global pointer. It's a shared AS pointer.

Speaking of which, please also add a test for explicitly annotated const and global AS pointers.

@LewisCrawford
Copy link
Contributor

I've been asked to pick up the upstreaming of this patch, so I've created a new PR in #114874 to build on the existing commits here, and address the review feedback so far.

I'll close this PR, and we can continue the discussion over on #114874 to see whether the latest comments are suitably addressed now.

LewisCrawford added a commit that referenced this pull request Nov 15, 2024
…114874)

Emit .ptr, .address-space, and .align attributes for kernel
args in CUDA (previously handled only for OpenCL).

This allows for more vectorization opportunities if the PTX consumer
is able to know about the pointer alignments.

If no alignment is explicitly specified, .align 1 will be emitted
to match the LLVM IR semantics in this case.

PTX ISA doc -
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#kernel-parameter-attribute-ptr

This is a rework of the original patch proposed in #79646

---------

Co-authored-by: Vandana <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants