Hi,
I am trying to understand the GmmaDescriptor layouts in SM90. The code comments describe a complex interleaved layout, for example: `LayoutType::B128 : Swizzle<3,4,3> o smem_ptr o ((8,m),(T,2)):((8T,SBO),(1,T))`. The same canonical layouts are also described in the PTX documentation for asynchronous warpgroup-level operations (https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#asynchronous-warpgroup-level-canonical-layouts).
However, when I use TMA (without CuTe) to load a tile with CU_TENSOR_MAP_SWIZZLE_128B and print the raw shared memory values, the data appears to be physically contiguous (row- or column-major, plus the swizzle) rather than interleaved in groups of 8 rows.
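To make the comparison concrete, here is a minimal Python sketch of one way the quoted expression could be evaluated. It rests on several assumptions that are not confirmed by the comment itself: `T` is taken to be the number of elements per 128 bits, elements are 16 bits wide, each row of the tile is exactly 128 bytes, `SBO` is the stride between consecutive 8-row groups (here 8 rows of 128 bytes), and `Swizzle<3,4,3>` acts on byte offsets relative to a 1024-byte-aligned tile base. The function names are hypothetical.

```python
def swizzle(offset, b=3, m=4, s=3):
    """CuTe-style Swizzle<b,m,s> for s > 0 (assumed semantics):
    XOR bits [m+s, m+s+b) of the offset into bits [m, m+b)."""
    mask = ((1 << b) - 1) << (m + s)
    return offset ^ ((offset & mask) >> s)


def canonical_offset(row, k, T, sbo):
    """Evaluate ((8,m),(T,2)):((8T,SBO),(1,T)) at logical (row, k).
    The result is in units of elements, before swizzling."""
    r0, r1 = row % 8, row // 8   # row splits into (8, m)
    k0, k1 = k % T, k // T       # k splits into (T, 2)
    return r0 * (8 * T) + r1 * sbo + k0 * 1 + k1 * T


ELEM_BYTES = 2                    # assumption: 16-bit elements
T = 16 // ELEM_BYTES              # assumption: elements per 128 bits
ROW_ELEMS = 128 // ELEM_BYTES     # assumption: one row spans 128 bytes
SBO = 8 * ROW_ELEMS               # assumption: 8-row groups are back to back

# Compare against plain row-major indexing with 128-byte rows,
# for m = 2 row groups and the 2*T elements covered along K.
for row in range(16):
    for k in range(2 * T):
        assert canonical_offset(row, k, T, SBO) == row * ROW_ELEMS + k

# Example of the swizzle on a byte offset: start of row 1.
print(swizzle(1 * 128))
```

Under these assumptions the pre-swizzle offsets match ordinary row-major indexing, because the row stride `8T` elements equals one 128-byte row and the `(T,2):(1,T)` mode is contiguous. Whether that is the intended reading of the comment is exactly what I would like confirmed.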
So, how should the canonical layout shapes be interpreted?
Thanks!