SYCL: Tuning trees for batch GEMM by nbeams · Pull Request #75 · icl-utk-edu/magma

nbeams · 2025-12-11T19:43:37Z

WIP. A place to test and discuss ideas for implementing decision trees for tuning (starting with batch GEMM and GEMV).

The current decision tree structure requires 4 arrays, like the ones used by the trees in scikit-learn. I modified one of the arrays slightly, as described in the comment documentation for evaluate_gemm_tree, but the others can just be output directly from the scikit-learn tree.
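For discussion purposes, a traversal over such a flattened tree could look roughly like the sketch below. It is not the PR's `evaluate_gemm_tree`: the struct, the function names, and the toy tree are hypothetical. It also assumes that the modified array is `feature` and that its leaf entries hold a kernel configuration index. The actual modification is described only in the `evaluate_gemm_tree` comment documentation and may differ. The parts taken from scikit-learn are the array roles (`children_left`, `children_right`, `feature`, `threshold`), the `-1` child marker at leaves, and the rule that a sample goes left when `x[feature] <= threshold`.

```cpp
#include <cassert>

// Hypothetical flattened tree in the scikit-learn array layout.
// Assumption (not taken from the PR): at a leaf, `feature` stores the
// index of the kernel configuration instead of scikit-learn's -2 marker.
struct TreeSketch {
    const int*    children_left;   // left child index, -1 at a leaf
    const int*    children_right;  // right child index, -1 at a leaf
    const int*    feature;         // tested feature index, or config id at a leaf
    const double* threshold;       // split threshold (unused at leaves)
};

// Walk from the root to a leaf; scikit-learn sends a sample left
// when x[feature] <= threshold.
inline int evaluate_tree_sketch(const TreeSketch& t, const double* x)
{
    int node = 0;
    while (t.children_left[node] != -1) {
        assert(t.children_right[node] != -1);  // internal nodes have two children
        const bool go_left = x[t.feature[node]] <= t.threshold[node];
        node = go_left ? t.children_left[node] : t.children_right[node];
    }
    return t.feature[node];  // leaf: configuration index (assumed encoding)
}

// Toy 5-node tree over the features (m, n); all numbers are made up.
inline int pick_config_sketch(double m, double n)
{
    static const int    left[]   = { 1, -1,  3, -1, -1 };
    static const int    right[]  = { 2, -1,  4, -1, -1 };
    static const int    feat[]   = { 0,  3,  1,  7, 12 };
    static const double thresh[] = { 32.0, -2.0, 64.0, -2.0, -2.0 };
    const TreeSketch tree = { left, right, feat, thresh };
    const double x[] = { m, n };
    return evaluate_tree_sketch(tree, x);
}
```

In this toy tree, a problem with m = 16 lands in the leaf holding configuration 3, while m = 64, n = 128 lands in the leaf holding configuration 12. Keeping the arrays flat means a retrained scikit-learn tree only requires regenerating the tables, not the traversal code.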

To begin discussion, I added an example of a new *gemm_batched_core setup for Z and C. I realized I'll probably have to keep changing things as we develop this (including, maybe, the number of configurations we want to instantiate for each precision+transpose+transpose combo), and I want to limit the number of times I have to change everything for all 4 precisions.

One problem with this approach arises if we want to instantiate different sets of kernels for different architectures: we might end up with a lot of instantiations unless, e.g., we can have compile-time guards (maybe based on GPU_TARGET, which we don't currently do anything with when building for SYCL). Even just for PVC, we may want to cut down from what I have here, since there are 40 configurations for each precision, plus the various conjugate options for the complex types.
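As a rough illustration of the compile-time guard idea, the sketch below selects a per-architecture configuration table with the preprocessor. Everything in it is hypothetical: `SKETCH_TARGET_PVC` is a made-up macro standing in for whatever the build system might define from GPU_TARGET, and the struct fields, table sizes, and tile values are placeholders, not the PR's actual configurations.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical tile description; the PR's real template parameters may differ.
struct ConfigSketch { int blk_m, blk_n, blk_k; };

// SKETCH_TARGET_PVC is a made-up macro; a real build would have to derive
// something like it from GPU_TARGET. All table entries are placeholders.
#if defined(SKETCH_TARGET_PVC)
static constexpr ConfigSketch kConfigs[] = { {32, 32, 8}, {64, 32, 8}, {64, 64, 16} };
#else
static constexpr ConfigSketch kConfigs[] = { {16, 16, 8} };
#endif

// Number of configurations compiled in for the selected target.
constexpr std::size_t num_configs_sketch()
{
    return sizeof(kConfigs) / sizeof(kConfigs[0]);
}

// Bounds-checked lookup of a configuration by the index a tree leaf returns.
static inline const ConfigSketch& config_at_sketch(std::size_t id)
{
    assert(id < num_configs_sketch());
    return kConfigs[id];
}
```

With a guard like this, only the kernels listed for the selected target would need to be instantiated, at the cost of shipping a matching tuning tree per target so that leaf indices stay valid.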

nbeams added 2 commits December 11, 2025 18:21

WIP: a tuning tree possibility (batched zgemm, cgemm)

7d55bde

WIP: add an initial dgemm and sgemm batched config

5d6f79d

nbeams wants to merge 2 commits into dpcpp-port-rebase from sycl-batch-gemm-tuning-trees
