Eliminate Sendbuffer #19

@Alex2804

Implement a version of the shuffle that sends directly tuple-by-tuple using nvshmem_put.
For this, we need a thread-level histogram to calculate thread-level write offsets on the destination PEs.
We also need to make sure that the input data is in symmetric device memory. If possible, we can register it as symmetric memory; otherwise, we allocate symmetric device memory and copy the data over.
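
A minimal host-side sketch of the offset calculation, to make the thread-level histogram concrete. It simulates the GPU threads with a loop and does not call NVSHMEM at all. Everything in it is an assumption for illustration: the name `thread_write_offsets`, the contiguous chunking of tuples per thread, and the partition function `key % num_pes`. The offsets are relative to the region this PE owns on each destination PE; the base of that region would still have to come from exchanging PE-level histograms, which is not shown.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns offsets[t][p], the first slot (relative to this
// PE's region on destination PE p) that simulated thread t may write to.
// Assumes num_threads > 0 and num_pes > 0.
std::vector<std::vector<std::size_t>> thread_write_offsets(
    const std::vector<std::uint64_t>& keys,
    std::size_t num_threads,
    std::size_t num_pes) {
  std::vector<std::vector<std::size_t>> hist(
      num_threads, std::vector<std::size_t>(num_pes, 0));

  // Step 1: thread-level histogram. Each thread counts, per destination PE,
  // the tuples in its own contiguous chunk of the input.
  const std::size_t chunk = (keys.size() + num_threads - 1) / num_threads;
  for (std::size_t t = 0; t < num_threads; ++t) {
    const std::size_t begin = std::min(t * chunk, keys.size());
    const std::size_t end = std::min(begin + chunk, keys.size());
    for (std::size_t i = begin; i < end; ++i) {
      ++hist[t][keys[i] % num_pes];  // assumed partition function
    }
  }

  // Step 2: exclusive prefix sum over threads, separately per destination PE.
  // This turns each count into the thread's starting write offset.
  for (std::size_t p = 0; p < num_pes; ++p) {
    std::size_t running = 0;
    for (std::size_t t = 0; t < num_threads; ++t) {
      const std::size_t count = hist[t][p];
      hist[t][p] = running;
      running += count;
    }
  }
  return hist;
}
```

With these offsets, each thread could send its tuples one by one to `base[p] + offsets[t][p]` and increment its private offset after every put, so no two threads write to the same slot and no send buffer is needed.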
