SYCL: Tuning trees for batch GEMM #75

Draft
nbeams wants to merge 2 commits into dpcpp-port-rebase from sycl-batch-gemm-tuning-trees

nbeams commented Dec 11, 2025

WIP. A place to test and discuss ideas for implementing decision trees for tuning (starting with batch GEMM and GEMV).

The current decision tree structure requires four arrays, like the ones scikit-learn uses for its trees. I modified one of the arrays slightly, as described in the comment documentation for evaluate_gemm_tree; the other three can be output directly from the scikit-learn tree.
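For anyone less familiar with the scikit-learn layout, here is a rough sketch of how a tree stored as four flat arrays can be walked. It assumes the four arrays correspond to scikit-learn's `children_left`, `children_right`, `feature`, and `threshold`, with `-1` marking a leaf child and `-2` as the placeholder feature/threshold at leaves, and with samples going left when the feature value is `<=` the threshold. The names `TreeSketch`, `find_leaf_sketch`, and `example_tree_sketch` are hypothetical and are not the ones in this PR; the modification made to one array for evaluate_gemm_tree is not reproduced, so the sketch simply returns the leaf node index. The example thresholds are placeholders, not tuned values.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container mirroring four scikit-learn tree_ arrays.
// This is an illustration only, not the structure used in the PR.
struct TreeSketch {
    std::vector<int>    children_left;   // -1 at leaves
    std::vector<int>    children_right;  // -1 at leaves
    std::vector<int>    feature;         // -2 at leaves
    std::vector<double> threshold;       // -2.0 at leaves
};

// Walk from the root (node 0) to a leaf and return the leaf's node index.
// Mapping a leaf to a kernel configuration is left out on purpose.
inline int find_leaf_sketch(const TreeSketch& t, const std::vector<double>& x) {
    std::size_t node = 0;
    while (t.children_left[node] != -1) {
        const std::size_t f = static_cast<std::size_t>(t.feature[node]);
        assert(f < x.size());
        // scikit-learn convention: go left when x[feature] <= threshold.
        const int next = (x[f] <= t.threshold[node]) ? t.children_left[node]
                                                     : t.children_right[node];
        node = static_cast<std::size_t>(next);
        assert(node < t.children_left.size());
    }
    return static_cast<int>(node);
}

// Tiny made-up tree: split on feature 0 at 64, then on feature 1 at 1000.
inline TreeSketch example_tree_sketch() {
    TreeSketch t;
    t.children_left  = { 1, -1,  3, -1, -1};
    t.children_right = { 2, -1,  4, -1, -1};
    t.feature        = { 0, -2,  1, -2, -2};
    t.threshold      = {64.0, -2.0, 1000.0, -2.0, -2.0};
    return t;
}
```

With this made-up tree, a feature vector whose first entry is at most 64 lands in node 1, and everything else is split by the second entry into nodes 3 and 4.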

To begin discussion, I added an example of a new *gemm_batched_core setup for Z and C only. I realized I'll probably have to keep changing things as we develop this, possibly including the number of configurations we want to instantiate for each precision+transpose+transpose combo, and I want to limit the number of times I have to change everything for all 4 precisions.

One problem with this arises if we want to instantiate different sets of kernels for different architectures: we might end up with a lot of instantiations unless, e.g., we can add compile-time guards (maybe using GPU_TARGET, which we don't currently do anything with when building for SYCL). Even just for PVC, we may want to cut down from what I have here, since there are 40 configurations for each precision, plus the various conjugate options for the complex types...
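As a starting point for the compile-time guard idea, something along these lines might work. This is only a sketch: the macro `SKETCH_TARGET_PVC` is hypothetical (the build would have to define something like it, perhaps derived from GPU_TARGET), `GemmConfigSketch` and its fields are illustrative, and the tile sizes are placeholders rather than tuned values.

```cpp
#include <array>
#include <cstddef>

// Hypothetical kernel configuration; field names are for illustration only.
struct GemmConfigSketch {
    int dim_x;
    int dim_y;
    int blk_m;
    int blk_n;
    int blk_k;
};

#if defined(SKETCH_TARGET_PVC)
// Larger set, instantiated only when the build targets this architecture.
static constexpr std::array<GemmConfigSketch, 3> kConfigsSketch{{
    {16, 16, 32, 32, 8},
    {16, 16, 64, 64, 8},
    {32,  8, 64, 32, 16},
}};
#else
// Fallback: a single generic configuration to keep instantiations small.
static constexpr std::array<GemmConfigSketch, 1> kConfigsSketch{{
    {16, 16, 32, 32, 8},
}};
#endif

// Number of configurations compiled in for the selected target.
constexpr std::size_t num_configs_sketch() { return kConfigsSketch.size(); }
```

The tuning tree for a given target would then only need to produce indices into that target's configuration list, so each architecture pays only for the kernels it can actually select.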
