This patch adds a new variant of the TMA Bulk Copy
intrinsics, introduced in sm100+. This variant
takes an additional byte_mask operand that selects
which bytes are copied.
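
For illustration only, here is a host-side C++ sketch of how such a byte mask could behave. It is not part of the patch. It assumes the semantics described in the linked PTX spec section: the mask is 16 bits wide, and bit i decides whether byte i of each 16-byte chunk is copied. The function name maskedBulkCopy is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical host-side model of a byte-masked bulk copy.
// Assumption: bit i of the 16-bit byteMask selects byte i of every
// 16-byte chunk of the source; unselected destination bytes are left
// untouched.
void maskedBulkCopy(std::uint8_t *dst, const std::uint8_t *src,
                    std::size_t size, std::uint16_t byteMask) {
  for (std::size_t i = 0; i < size; ++i) {
    const unsigned bit = static_cast<unsigned>(i % 16);
    if ((byteMask >> bit) & 1u)
      dst[i] = src[i];
  }
}
```

With a mask of 0x0003, for example, this model copies only the first two bytes of each 16-byte chunk and leaves the remaining fourteen destination bytes unchanged.
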
* Selection is now done entirely through TableGen,
so this patch removes the corresponding
SelectCpAsyncBulkS2G() function.
* lit tests are verified with a cuda-12.8 ptxas
executable.
PTX Spec link:
https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-bulk-copy
Signed-off-by: Durgadoss R <[email protected]>