I successfully implemented a persistent SGEMM kernel in CuTeDSL for H100 using the Static scheduler.
However, I now want to apply a similar approach to dense_gemm.py, this time using the pipeline abstraction.
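A minimal pure-Python model of such a persistent loop may help pin down the intended control flow. It assumes a round-robin static scheduler (each CTA starts at its own ID and strides by the grid size) and a multistage pipeline state made of a stage index and a phase bit. All names here (`PipelineState`, `static_schedule`, `tile_coord`, `run_cta`) are hypothetical and are not the CuTeDSL API.

```python
# Hypothetical pure-Python model of a persistent GEMM loop: a static
# round-robin tile scheduler plus a multistage pipeline state that is
# carried across tiles. This is NOT the CuTeDSL API.
from dataclasses import dataclass


@dataclass
class PipelineState:
    """Models the (stage index, phase bit) pair of a circular buffer."""
    num_stages: int
    index: int = 0
    phase: int = 0

    def advance(self) -> None:
        # Move to the next stage; flip the phase bit on wrap-around.
        self.index += 1
        if self.index == self.num_stages:
            self.index = 0
            self.phase ^= 1


def static_schedule(num_tiles: int, num_ctas: int, cta_id: int) -> list[int]:
    """Linear tile indices one CTA visits: start at cta_id, stride by grid size."""
    return list(range(cta_id, num_tiles, num_ctas))


def tile_coord(linear: int, tiles_m: int, tiles_n: int) -> tuple[int, int, int]:
    """Map a linear tile index to (m, n, l) tile coordinates, M fastest."""
    m = linear % tiles_m
    n = (linear // tiles_m) % tiles_n
    l = linear // (tiles_m * tiles_n)
    return m, n, l


def run_cta(num_tiles, num_ctas, cta_id, k_tiles, num_stages):
    """Return (tile, k_tile, stage, phase) for every mainloop step of one CTA."""
    state = PipelineState(num_stages)  # created ONCE, outside the tile loop
    trace = []
    for tile in static_schedule(num_tiles, num_ctas, cta_id):
        for k in range(k_tiles):
            trace.append((tile, k, state.index, state.phase))
            state.advance()
    return trace
```

For example, `run_cta(10, 4, 1, 3, 4)` shows CTA 1 processing tiles 1, 5 and 9, with tile 5 starting at stage 3 rather than stage 0. In an mbarrier-based pipeline the barriers' hardware phase keeps advancing across tiles, so one thing worth checking is that the pipeline state is not recreated per tile, since a reset software phase would no longer match the barriers.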
When I run it, the result check fails with:
Mismatched elements: 8388608 / 16777216 (50.0%)
Greatest absolute difference: inf at index (0, 9, 0) (up to 0.1 allowed)
Greatest relative difference: inf at index (0, 9, 0) (up to 0.001 allowed)
To me this looks like something is wrong with the way I write out the accumulators.
Could someone take a look and help me figure out what is wrong?
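Since exactly 8388608 / 16777216 = 50% of the elements mismatch, it may help to check whether the bad elements cover whole output tiles or only part of every tile (for example some rows or columns of each 128x256 tile). The following is a hypothetical diagnostic helper in plain Python; the function names are illustrative and not part of `dense_gemm.py`.

```python
# Hypothetical diagnostic: group mismatching output elements by output tile
# and by their position inside a tile.
from collections import Counter


def mismatches_per_tile(bad_indices, tile_m, tile_n):
    """Count mismatching (m, n, l) elements per (tile_m_idx, tile_n_idx, l) tile."""
    counts = Counter()
    for m, n, l in bad_indices:
        counts[(m // tile_m, n // tile_n, l)] += 1
    return counts


def local_positions(bad_indices, tile_m, tile_n):
    """Return the sets of tile-local rows and columns that contain mismatches."""
    rows = {m % tile_m for m, _, _ in bad_indices}
    cols = {n % tile_n for _, n, _ in bad_indices}
    return rows, cols
```

Assuming the output `c` and reference `ref` are PyTorch tensors, the mismatching indices could be obtained with `torch.nonzero(~torch.isclose(c, ref, rtol=1e-3, atol=0.1)).tolist()`. If every tile reports exactly half of its `128 * 256` elements, the problem is likely within the epilogue of each tile; if some tiles are fully correct and others fully wrong, the scheduler or tile coordinates are more suspect.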
https://gist.github.com/simveit/37b62b563e17ef47785c7fbb07cf8586
I should add that I launch with --tile_shape_mnk 128,256,64 to get an appropriate atom size.