[QST] How to Interpret Canonical Layout Shapes #2872

Hi,

I am trying to understand the GmmaDescriptor layouts in SM90. The code comments describe a complex interleaved layout, for example: LayoutType::B128 : Swizzle<3,4,3> o smem_ptr o ((8,m),(T,2)):((8T,SBO),(1, T )). The same form also appears in the PTX documentation on asynchronous warpgroup-level canonical layouts (https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#asynchronous-warpgroup-level-canonical-layouts).

However, when I use TMA (without CuTe) to load a tile with CU_TENSOR_MAP_SWIZZLE_128B and print the raw shared memory values, the data appears to be physically contiguous (row- or column-major plus swizzle), not interleaved in groups of 8 rows.
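
To make the comparison concrete, here is a minimal Python sketch of one way to evaluate the quoted B128 layout by hand and compare it against a plain row-major tile with the 128-byte swizzle. It is not taken from CUTLASS. It assumes that T is the number of elements per 128 bits, that the swizzle acts on byte offsets from a 1024-byte-aligned tile base, and that SBO is the byte distance between consecutive 8-row groups (1024 bytes here). The function names are hypothetical.

```python
def swizzle(offset, b=3, m=4, s=3):
    """Swizzle<B,M,S> as read here: XOR bits [M+S, M+S+B) into bits [M, M+B)."""
    mask = ((1 << b) - 1) << (m + s)
    return offset ^ ((offset & mask) >> s)


def canonical_k_sw128(row, k, elem_bytes=2, sbo_bytes=1024):
    """Byte offset of (row, k) under ((8,m),(T,2)):((8T,SBO),(1,T)).

    Assumptions: T = 16 bytes / element size, SBO is given in bytes.
    """
    T = 16 // elem_bytes             # elements per 128 bits
    r0, r1 = row % 8, row // 8       # coordinates in the (8, m) mode
    k0, k1 = k % T, k // T           # coordinates in the (T, 2) mode
    elem_off = r0 * (8 * T) + k0 * 1 + k1 * T
    return swizzle(elem_off * elem_bytes + r1 * sbo_bytes)


def row_major_sw128(row, k, elem_bytes=2):
    """Row-major tile with a 128-byte row pitch, then the same swizzle."""
    return swizzle(row * 128 + k * elem_bytes)


# Compare the two mappings over 16 rows and the 2T = 16 fp16 columns
# covered by the (T,2) mode.
mismatches = [
    (row, k)
    for row in range(16)
    for k in range(16)
    if canonical_k_sw128(row, k) != row_major_sw128(row, k)
]
print("mismatching coordinates:", mismatches)
```

Under these assumptions the two mappings agree for every (row, k) checked, which would suggest that the (8,m) split expresses the 8-row period of the swizzle (with SBO as the stride between groups) rather than a physical interleave. With a different SBO the two would no longer coincide.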


So, how should the canonical layout shapes be interpreted?

Thanks!
