[QST] [CuTeDSL] Persistent dense gemm on Hopper #2501

@simveit

I successfully implemented a persistent SGEMM kernel in CuTeDSL for H100 using the static scheduler.

However, I now want to apply a similar approach to dense_gemm.py, this time using the pipeline abstraction.
When I run it, I get:

Mismatched elements: 8388608 / 16777216 (50.0%)
Greatest absolute difference: inf at index (0, 9, 0) (up to 0.1 allowed)
Greatest relative difference: inf at index (0, 9, 0) (up to 0.001 allowed)

This looks to me like something is wrong with the way I write out the accumulators.
Could someone take a look and help me figure out what is wrong?
https://gist.github.com/simveit/37b62b563e17ef47785c7fbb07cf8586

I should add that I launch with --tile_shape_mnk 128,256,64 to get an appropriate atom size.
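
One way to narrow down a report like the one above is to check whether the mismatched elements line up with output tile boundaries. The following is a minimal host-side sketch, not part of the kernel or the gist: `mismatch_fraction_per_tile` is a hypothetical helper, it assumes the kernel output and the reference have been copied into 2D NumPy arrays of shape (M, N) (for a batched result, one batch slice at a time), and it uses the 128 x 256 M/N tile extents from the launch flag together with the tolerances shown in the error message.

```python
import numpy as np


def mismatch_fraction_per_tile(actual, expected, tile_m, tile_n,
                               atol=0.1, rtol=0.001):
    """Hypothetical helper: fraction of mismatched elements per output tile.

    Assumes 2D arrays of identical shape (M, N) whose extents are
    multiples of the tile extents.
    """
    assert actual.shape == expected.shape and actual.ndim == 2
    m, n = actual.shape
    assert m % tile_m == 0 and n % tile_n == 0

    # An element is "bad" if it is not within atol + rtol * |expected|.
    # An inf in `actual` against a finite reference counts as bad.
    bad = ~np.isclose(actual, expected, rtol=rtol, atol=atol)

    # Split both axes into (tile index, offset within tile) and average
    # the boolean mask over the within-tile offsets.
    tiles = bad.reshape(m // tile_m, tile_m, n // tile_n, tile_n)
    return tiles.mean(axis=(1, 3))


# Synthetic demonstration only: corrupt one tile of a small reference.
expected = np.ones((256, 512), dtype=np.float32)
actual = expected.copy()
actual[:128, 256:] = np.inf  # pretend tile (0, 1) was written incorrectly
print(mismatch_fraction_per_tile(actual, expected, 128, 256))
```

If every tile comes out as either fully correct or fully wrong, that would point toward scheduling or per-tile write-out; a partial fraction inside each tile would point more toward the layout used within a tile. This is only a heuristic for reading the mismatch pattern.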
