This change removes all non-batched functors, modularizes
duplicated code, and implements the non-batched functions as calls
to the batched functors with a trivial constexpr batch indexer.
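
A minimal sketch of that pattern in plain C++ may help illustrate it. It is not the code from this change: the names `TrivialBatchIndexer`, `gemm_batch`, and `gemm`, and the serial loops standing in for a device kernel, are assumptions made for illustration only.

```cpp
#include <cstddef>

// Hypothetical indexer for the non-batched case: every batch id maps to
// offset 0, so the compiler can fold the batch offset away entirely.
struct TrivialBatchIndexer {
    constexpr std::size_t operator()(std::size_t) const { return 0; }
};

// Hypothetical batched row-major GEMM: C[b] = A[b] * B[b] for each batch b.
// A[b] is n x k, B[b] is k x m, C[b] is n x m. Each indexer maps a batch id
// to the element offset of that batch's matrix.
template <typename T, typename IdxA, typename IdxB, typename IdxC>
void gemm_batch(const T *a, const T *b, T *c, std::size_t batch,
                std::size_t n, std::size_t k, std::size_t m,
                IdxA idx_a, IdxB idx_b, IdxC idx_c) {
    for (std::size_t bi = 0; bi < batch; ++bi) {
        const T *ab = a + idx_a(bi);
        const T *bb = b + idx_b(bi);
        T *cb = c + idx_c(bi);
        for (std::size_t i = 0; i < n; ++i) {
            for (std::size_t j = 0; j < m; ++j) {
                T acc{};
                for (std::size_t s = 0; s < k; ++s) {
                    acc += ab[i * k + s] * bb[s * m + j];
                }
                cb[i * m + j] = acc;
            }
        }
    }
}

// The non-batched entry point is just the batched one with batch == 1
// and the trivial indexer; no separate non-batched functor is needed.
template <typename T>
void gemm(const T *a, const T *b, T *c,
          std::size_t n, std::size_t k, std::size_t m) {
    constexpr TrivialBatchIndexer idx{};
    gemm_batch(a, b, c, 1, n, k, m, idx, idx, idx);
}
```

Because the indexer is a stateless type with a constexpr call operator, the non-batched path should cost nothing extra over a dedicated non-batched functor, while sharing a single kernel body.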
This change also adds a faster gemm kernel that threads over the (N, M) space
and accumulates the entire range of K in a single work-item.
The dispatch logic changes as well: we dispatch to the thread-K kernel only if
the (n, m) space is sufficiently small.
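
The dispatch rule can be sketched roughly as follows. This is an illustration under stated assumptions, not the logic from this change: the enum `GemmKernel`, the function `choose_kernel`, the helper `work_item_nm`, and the threshold value are all hypothetical, and the sketch assumes "thread-K kernel" refers to the kernel that also parallelizes over K.

```cpp
#include <cstddef>

// Hypothetical labels for the two kernels described above.
enum class GemmKernel { ThreadNM, ThreadK };

// Hypothetical cutoff; the actual threshold is not stated in this message.
constexpr std::size_t small_nm_threshold = 4096;

// Pick the thread-K kernel only when the (n, m) space is small, i.e. when
// threading over (N, M) alone would expose too little parallelism.
constexpr GemmKernel choose_kernel(std::size_t n, std::size_t m) {
    return (n * m < small_nm_threshold) ? GemmKernel::ThreadK
                                        : GemmKernel::ThreadNM;
}

// What one work-item of the (N, M)-threaded kernel computes: the full
// K-range dot product for a single output element C[i][j] (row-major).
template <typename T>
T work_item_nm(const T *a, const T *b, std::size_t i, std::size_t j,
               std::size_t k, std::size_t m) {
    T acc{};
    for (std::size_t s = 0; s < k; ++s) {
        acc += a[i * k + s] * b[s * m + j];
    }
    return acc;
}
```

Accumulating all of K inside one work-item avoids any cross-work-item reduction, which is presumably why it is preferred whenever the (n, m) space alone provides enough work-items.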