[QST] How to Interpret Canonical Layout Shapes #2872

@ccccjunkang

Hi,

I am trying to understand the GmmaDescriptor layouts in SM90. The code comments describe a complex interleaved layout, for example: LayoutType::B128 : Swizzle<3,4,3> o smem_ptr o ((8,m),(T,2)):((8T,SBO),(1,T)). This is also mentioned in the SM90 PTX documentation for TMA (https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#asynchronous-warpgroup-level-canonical-layouts).
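To make the quoted expression concrete, here is a minimal sketch that evaluates it in plain Python. It is only one possible reading, under assumptions that are not stated in the code comment itself: the strides (including SBO) are counted in elements, the swizzle acts on the byte offset from a suitably aligned base, and T = 8 with 2-byte elements and SBO = 512 elements are illustrative values. The function names are hypothetical, not CuTe APIs.

```python
def swizzle(offset, bits=3, base=4, shift=3):
    """Swizzle<bits, base, shift> with shift >= 0: XOR the `bits` bits that start
    at position base + shift into the `bits` bits that start at position base."""
    mask = ((1 << bits) - 1) << (base + shift)
    return offset ^ ((offset & mask) >> shift)


def canonical_offset(row, col, T, SBO):
    """Evaluate ((8,m),(T,2)):((8T,SBO),(1,T)) at a logical (row, col).

    Assumption: all strides, including SBO, are counted in elements here.
    """
    r0, r1 = row % 8, row // 8   # row mode (8, m), strides (8T, SBO)
    c0, c1 = col % T, col // T   # column mode (T, 2), strides (1, T)
    return r0 * 8 * T + r1 * SBO + c0 + c1 * T


def smem_byte_offset(row, col, T=8, SBO=512, elem_bytes=2):
    """Assumption: the swizzle is applied to the byte offset from an aligned base."""
    return swizzle(canonical_offset(row, col, T, SBO) * elem_bytes)


if __name__ == "__main__":
    # Byte offsets of the two 16-byte chunks (col 0 and col 8) in the first 8 rows.
    for row in range(8):
        print(row, [smem_byte_offset(row, col) for col in (0, 8)])
```

Under these assumptions, the row stride 8T is 64 elements, i.e. 128 bytes, so before swizzling consecutive rows sit 128 bytes apart, and the (T,2):(1,T) mode covers 32 contiguous bytes within a row.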

However, when I use TMA (without CuTe) to load a tile with CU_TENSOR_MAP_SWIZZLE_128B and print the raw shared memory values, the data appears to be physically contiguous (row/column-major plus swizzle) rather than interleaved by 8 rows.
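For reference, this is roughly the model I have in mind when I say "row-major plus swizzle". It is a sketch under assumptions, not a statement of what the hardware does: the tile has rows of exactly 128 bytes, the shared memory buffer is aligned so the pattern repeats every 8 rows, and the swizzle XORs the 16-byte chunk index within a row with the row index modulo 8. The helper names are hypothetical.

```python
def swizzle_128b_offset(row, col_byte):
    """Byte offset of (row, col_byte) in a row-major tile with 128-byte rows,
    with the 16-byte chunk index XORed with row % 8 (assumed swizzle model)."""
    assert 0 <= col_byte < 128
    chunk, within = divmod(col_byte, 16)
    return row * 128 + ((chunk ^ (row % 8)) * 16) + within


def unswizzle_dump(raw, rows):
    """Reorder a raw byte dump of `rows` 128-byte rows back into logical order."""
    return bytes(raw[swizzle_128b_offset(r, c)] for r in range(rows) for c in range(128))


if __name__ == "__main__":
    # Where the eight 16-byte chunks of row 1 land inside that row.
    print([swizzle_128b_offset(1, 16 * chunk) for chunk in range(8)])
```

In this model every byte stays inside its own 128-byte row and only the order of the 16-byte chunks within the row changes, which would match a dump that looks contiguous apart from the swizzle.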

So, how should the canonical layout shapes be interpreted?

Thanks!
