kshyatt (Member) commented Jul 26, 2025

Trying to attack #1989

This should allow users to pass a clusters= kwarg at kernel launch. On supported hardware the launch goes through cuLaunchKernelEx; on unsupported hardware it throws an error if clusters > 1 in any dimension. There are some tests, but more could probably be added. I also wrapped the PTX instructions needed to query the cluster index and size from within a kernel.
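
As a rough illustration of the launch-time check described above, here is a minimal Python sketch. It is not the code in this PR: the function name `validate_clusters`, its arguments, and the tuple layout are all hypothetical. It assumes that thread block clusters require compute capability 9.0 or higher, and that the grid size in blocks must be a multiple of the cluster size in each dimension.

```python
from typing import Tuple

Dim3 = Tuple[int, int, int]

# Assumption: thread block clusters need compute capability 9.0 or newer.
MIN_CLUSTER_CAPABILITY = (9, 0)


def validate_clusters(clusters: Dim3, blocks: Dim3,
                      capability: Tuple[int, int]) -> bool:
    """Hypothetical helper mirroring the behaviour described in the PR text.

    Returns True when the device supports clusters (so an extended,
    cuLaunchKernelEx-style launch would be used) and False otherwise.
    Raises ValueError when the request cannot be satisfied.
    """
    if any(c < 1 for c in clusters):
        raise ValueError(f"cluster dimensions must be >= 1, got {clusters}")

    supported = capability >= MIN_CLUSTER_CAPABILITY
    uses_clusters = any(c > 1 for c in clusters)

    # Non-supported hardware: only a trivial cluster shape is acceptable.
    if uses_clusters and not supported:
        raise ValueError(
            f"clusters={clusters} requested, but compute capability "
            f"{capability} does not support thread block clusters"
        )

    # The grid (in blocks) must divide evenly into clusters.
    for b, c in zip(blocks, clusters):
        if b % c != 0:
            raise ValueError(
                f"blocks={blocks} is not a multiple of clusters={clusters}"
            )

    return supported
```

For example, `validate_clusters((2, 1, 1), (4, 1, 1), (9, 0))` accepts the launch, while the same call with capability `(8, 6)` raises, which matches the "error if clusters > 1 on non-supported hardware" behaviour described above.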

kshyatt requested a review from maleadt July 26, 2025 15:08
kshyatt added the enhancement, needs documentation, and cuda kernels labels Jul 26, 2025
kshyatt (Member, Author) commented Jul 26, 2025

Still needs support for:

  • [ ] cluster.sync
  • [ ] barrier.arrive
  • [ ] barrier.wait
  • [ ] query_shared_rank
  • [ ] map_shared_rank

kshyatt added the needs changes label Jul 26, 2025