Say we have a 16 (M) by 16 (N) tensor A where each element is an f32, and we want to swizzle along the N dim (row).

That is, we want to swizzle the elements within each row when putting them in shared memory. Here is how the parameters control the swizzling behavior:

  • Multiple consecutive rows can share the same swizzling pattern. The number of rows that share a pattern is perPhase, calculated as perPhase = 128 / (elementsPerRow * elementTypeInBytes). In this example, perPhase = 128 / (16*4) = 2, which means every 2 consecutive rows have the same swizzling pattern.
  • maxPhase is the total number of distinct patterns we want. This is usually set according to how shared memory is acces…
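
To make the arithmetic above concrete, here is a minimal Python sketch that computes perPhase for the 16x16 f32 example and assigns each row a pattern index. The helper names (`per_phase`, `phase_of_row`), the `max_phase = 4` value, and the wrap-around with `% max_phase` are assumptions made for illustration; the post only gives the perPhase formula and says maxPhase is the total number of patterns.

```python
def per_phase(elements_per_row: int, element_bytes: int, line_bytes: int = 128) -> int:
    """Number of consecutive rows that share one swizzling pattern."""
    # Formula from the post: perPhase = 128 / (elementsPerRow * elementTypeInBytes).
    # max(1, ...) is an assumption for rows wider than 128 bytes,
    # a case the post does not discuss.
    return max(1, line_bytes // (elements_per_row * element_bytes))


def phase_of_row(row: int, per_phase_value: int, max_phase: int) -> int:
    """Pattern index for a row (hypothetical helper, not a library API)."""
    # Rows are grouped perPhase at a time; the group index wraps around
    # once max_phase distinct patterns have been used (assumed behavior).
    return (row // per_phase_value) % max_phase


pp = per_phase(elements_per_row=16, element_bytes=4)
max_phase = 4  # illustrative value only; the post does not give one for this example
phases = [phase_of_row(r, pp, max_phase) for r in range(16)]

print("perPhase =", pp)
for r, p in enumerate(phases):
    print(f"row {r:2d} -> pattern {p}")
```

With these inputs, rows 0 and 1 get pattern 0, rows 2 and 3 get pattern 1, and so on, which matches the "every 2 rows have the same swizzling pattern" statement.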

Answer selected by chengjunlu