Trial support for thread-block clusters #2825

Draft
wants to merge 1 commit into base: master

kshyatt (Member) commented Jul 26, 2025

Trying to attack #1989

This should let users pass a clusters= kwarg at kernel launch. On supported hardware the launch goes through cuLaunchKernelEx; on unsupported hardware it throws an error if clusters > 1 in any dimension. There are some tests, but more coverage is probably needed. I also wrapped the PTX instructions needed to query the cluster index and size from within a kernel.
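The dispatch rule described above can be sketched in a few lines. This is only an illustration in Python, not the code in this PR: the helper name choose_launch_path, the tuple representation of dimensions, and the plain cuLaunchKernel fallback on older hardware are assumptions made for the example. The divisibility check is also not mentioned above; it reflects CUDA's requirement that the grid be a multiple of the cluster size. The 9.0 threshold is the compute capability that introduced thread-block clusters.

```python
from typing import Tuple

Dim3 = Tuple[int, int, int]

# Thread-block clusters were introduced with compute capability 9.0.
MIN_CLUSTER_CAPABILITY = (9, 0)


def choose_launch_path(blocks: Dim3, clusters: Dim3,
                       capability: Tuple[int, int]) -> str:
    """Hypothetical helper: name the driver call a launch would go through."""
    if any(c < 1 for c in clusters):
        raise ValueError(f"cluster dimensions must be >= 1, got {clusters}")

    if capability < MIN_CLUSTER_CAPABILITY:
        # Unsupported hardware: any cluster dimension > 1 is an error.
        if any(c > 1 for c in clusters):
            raise RuntimeError(
                f"clusters={clusters} needs compute capability >= 9.0, "
                f"got {capability[0]}.{capability[1]}"
            )
        # Assumption: fall back to the ordinary launch call.
        return "cuLaunchKernel"

    # Assumption taken from CUDA's rules rather than from the PR text:
    # each grid dimension must be a multiple of the cluster dimension.
    for b, c in zip(blocks, clusters):
        if b % c != 0:
            raise ValueError(
                f"grid {blocks} is not a multiple of cluster size {clusters}"
            )
    return "cuLaunchKernelEx"
```

For example, choose_launch_path((4, 1, 1), (2, 1, 1), (9, 0)) selects cuLaunchKernelEx, while the same request with capability (8, 6) raises an error.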

kshyatt requested a review from maleadt July 26, 2025 15:08
kshyatt added the enhancement, needs documentation, and cuda kernels labels Jul 26, 2025
kshyatt (Member, Author) commented Jul 26, 2025

Still needs support for:

  • [] cluster.sync
  • [] barrier.arrive
  • [] barrier.wait
  • [] query_shared_rank
  • [] map_shared_rank

kshyatt added the needs changes label Jul 26, 2025