Open
Labels
documentation, reviewed
Description
I found a couple of things while looking at the transpose tutorial.
First, the launch and kernel solutions could use block_unchecked policies. This would also let the kernel implementation skip its second thread synchronization call.
Second, the launch solution does not appear to use shared memory as intended: the same thread that reads a value into shared memory also writes that value out. The point of shared memory here is to let one thread read a value and a different thread write it, so that memory accesses to both matrices are coalesced. Fixing this will require the launch solution to add a teamSync call, which it currently lacks.
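
To illustrate the second point, here is a rough CPU-only sketch of the access pattern I have in mind. It is plain C++ rather than the tutorial's RAJA code, and the names `tiled_transpose` and `TILE` are made up for this example. The loops over `(ty, tx)` stand in for the threads of one block, and the local `tile` array stands in for shared memory. The key detail is that "thread" `(ty, tx)` stores into `tile[ty][tx]` but later loads from `tile[tx][ty]`, a value that a different thread stored, so `tx` runs along a row of both the input and the output.

```cpp
#include <cassert>
#include <vector>

constexpr int TILE = 4;  // hypothetical tile (thread block) edge length

// Transposes the row-major rows x cols matrix `in` into the
// row-major cols x rows matrix `out`, one TILE x TILE tile at a time.
void tiled_transpose(const std::vector<int>& in, std::vector<int>& out,
                     int rows, int cols)
{
  assert(in.size() == static_cast<std::size_t>(rows) * cols);
  assert(out.size() == in.size());

  for (int by = 0; by < rows; by += TILE) {
    for (int bx = 0; bx < cols; bx += TILE) {
      int tile[TILE][TILE];  // stands in for the block's shared-memory tile

      // Phase 1: "thread" (ty, tx) reads in(by + ty, bx + tx).
      // tx walks along an input row, so reads would be coalesced.
      for (int ty = 0; ty < TILE; ++ty) {
        for (int tx = 0; tx < TILE; ++tx) {
          const int r = by + ty, c = bx + tx;
          if (r < rows && c < cols) tile[ty][tx] = in[r * cols + c];
        }
      }

      // On a GPU, a block-level barrier (e.g. a teamSync call in the launch
      // solution) is needed here, because phase 2 reads tile entries that
      // other threads wrote. The sequential loops make it implicit.

      // Phase 2: "thread" (ty, tx) writes out(bx + ty, by + tx) using
      // tile[tx][ty]. tx walks along an output row, so writes would be
      // coalesced as well.
      for (int ty = 0; ty < TILE; ++ty) {
        for (int tx = 0; tx < TILE; ++tx) {
          const int r = bx + ty, c = by + tx;  // position in `out`
          if (r < cols && c < rows) out[r * rows + c] = tile[tx][ty];
        }
      }
    }
  }
}
```

The bounds checks are kept so the sketch works for any matrix size; they only matter when the tile size does not divide the matrix dimensions evenly. If phase 2 instead wrote `tile[ty][tx]` to `out(bx + tx, by + ty)`, each thread would write the value it read itself, no barrier would be needed, and the writes would stride through the output, which is what the current launch solution looks like to me.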