Skip to content

🎉FA2/HGEMM SMEM Swizzle

Compare
Choose a tag to compare
@DefTruth DefTruth released this 25 Dec 05:52
· 212 commits to main since this release
bdd361a

What's Changed

flash_attn_mma_stages_split_q_tiling_qk_swizzle_kernel

void flash_attn_mma_stages_split_q_tiling_qk_swizzle_kernel<512, 16, 8, 16, 8, 1, 8, 1, 1, 16, 1, 64, 2, 0, 0, 8>(__half *, __half *, __half *, __half *, int, int) (8, 48, 1)x(256, 1, 1), Context 1, Stream 7, Device 0, CC 8.9
    Section: Command line profiler metrics
    ------------------------------------------------------------------ ----------- ------------
    Metric Name                                                        Metric Unit Metric Value
    ------------------------------------------------------------------ ----------- ------------
    sm__sass_l1tex_data_bank_conflicts_pipe_lsu_mem_shared_op_ldsm.avg                        0
    sm__sass_l1tex_data_bank_conflicts_pipe_lsu_mem_shared_op_ldsm.max                        0
    sm__sass_l1tex_data_bank_conflicts_pipe_lsu_mem_shared_op_ldsm.min                        0
    sm__sass_l1tex_data_bank_conflicts_pipe_lsu_mem_shared_op_ldsm.sum                        0
    ------------------------------------------------------------------ ----------- ------------

Full Changelog: DefTruth/CUDA-Learn-Notes@v2.6.11...v2.6.12