Commit a61e00b

Refactoring of gemm, adding faster kernel

This change removes all non-batched functors, modularizes duplicated code, and implements the non-batched functions as calls to batched functors with a trivial constexpr batch indexer. It also adds a faster gemm kernel that threads over the (N, M) space and accumulates the entire range of K in a single work-item. The dispatch logic changed as well: we now dispatch to the thread-K kernel only if the (n, m) space is sufficiently small.
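To make the description concrete, here is a minimal host-side sketch in plain C++ (no SYCL) of the two ideas: a batched routine parameterized by batch indexers, with the non-batched entry point reusing it through a trivial constexpr indexer, and a loop nest in which each (i, j) iteration stands in for one work-item that accumulates the whole K range. All names (`TrivialBatchIndexer`, `StridedBatchIndexer`, `gemm_batch_nm`, `gemm_batch`, `gemm`, `prefer_thread_k`) and the row-major contiguous layout are assumptions made for illustration; they are not taken from gemm.hpp, and the dispatch threshold is left as a caller-supplied parameter because the commit message does not state it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical indexer: every batch id maps to offset 0, so the batched
// routine below degenerates to a single matrix product.
struct TrivialBatchIndexer
{
    constexpr std::size_t operator()(std::size_t) const { return 0; }
};

// Hypothetical indexer: batch b starts b * stride elements into the buffer.
struct StridedBatchIndexer
{
    std::size_t stride;
    constexpr std::size_t operator()(std::size_t b) const { return b * stride; }
};

// Batched product of row-major (n x k) and (k x m) matrices.
// Each (b, i, j) iteration plays the role of one work-item: it owns one
// output element and accumulates the entire K range by itself.
template <typename T, typename LhsIx, typename RhsIx, typename ResIx>
void gemm_batch_nm(const T *lhs, const T *rhs, T *res,
                   std::size_t batch, std::size_t n, std::size_t k,
                   std::size_t m, LhsIx lhs_ix, RhsIx rhs_ix, ResIx res_ix)
{
    for (std::size_t b = 0; b < batch; ++b) {
        const T *a = lhs + lhs_ix(b);
        const T *bm = rhs + rhs_ix(b);
        T *c = res + res_ix(b);
        for (std::size_t i = 0; i < n; ++i) {
            for (std::size_t j = 0; j < m; ++j) {
                T acc{};
                for (std::size_t s = 0; s < k; ++s) {
                    acc += a[i * k + s] * bm[s * m + j];
                }
                c[i * m + j] = acc;
            }
        }
    }
}

// Batched entry point over contiguous batches.
template <typename T>
std::vector<T> gemm_batch(const std::vector<T> &lhs, const std::vector<T> &rhs,
                          std::size_t batch, std::size_t n, std::size_t k,
                          std::size_t m)
{
    assert(lhs.size() == batch * n * k && rhs.size() == batch * k * m);
    std::vector<T> res(batch * n * m);
    gemm_batch_nm(lhs.data(), rhs.data(), res.data(), batch, n, k, m,
                  StridedBatchIndexer{n * k}, StridedBatchIndexer{k * m},
                  StridedBatchIndexer{n * m});
    return res;
}

// Non-batched entry point: a batch of one with the trivial indexer,
// so no separate non-batched loop nest is needed.
template <typename T>
std::vector<T> gemm(const std::vector<T> &lhs, const std::vector<T> &rhs,
                    std::size_t n, std::size_t k, std::size_t m)
{
    assert(lhs.size() == n * k && rhs.size() == k * m);
    std::vector<T> res(n * m);
    constexpr TrivialBatchIndexer ix{};
    gemm_batch_nm(lhs.data(), rhs.data(), res.data(), std::size_t(1), n, k, m,
                  ix, ix, ix);
    return res;
}

// Hypothetical dispatch predicate: prefer a kernel that also threads over K
// only when the (n, m) space is small. The threshold is an assumption.
inline bool prefer_thread_k(std::size_t n, std::size_t m,
                            std::size_t nm_threshold)
{
    return n * m < nm_threshold;
}
```

Because the trivial indexer is constexpr and always returns 0, a compiler can fold the batch offset away, which is presumably why the non-batched path can share the batched code without a dedicated functor.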

1 parent e6d3564, commit a61e00b

1 file changed: +2707 -4132 lines changed

dpctl/tensor/libtensor/include/kernels/linalg_functions/gemm.hpp

Comments (0)
