Description
Hi,
It seems that after int8 dot product support, the next "hot" feature to add could be perf testing for "matrix multiply accelerators",
aka tensor cores on Nvidia, etc.
Searching on opencl.gpuinfo.org, I see ARM and Intel extensions (but none from Nvidia, AMD, or Qualcomm, which is sad to see):
* cl_arm_matrix_multiply 0.4%
* cl_intel_subgroup_matrix_multiply_accumulate 2.2%
* cl_intel_subgroup_matrix_multiply_accumulate_tf32 0.5%
* cl_intel_subgroup_split_matrix_multiply_accumulate 1.6%
Of the Intel ones, the one to test seems to be "cl_intel_subgroup_matrix_multiply_accumulate" (the tf32 one is the TensorFloat-32 variant), which is supported on Intel Arc on Windows and Linux and also..
The ARM one is only reported on Pixel products, which is sad:
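One detail worth noting for detection: "cl_intel_subgroup_matrix_multiply_accumulate" is a prefix of the "_tf32" variant, so a plain substring search on the CL_DEVICE_EXTENSIONS string could report the base extension when only the tf32 one is present. Below is a minimal sketch of a whole-token check, assuming the usual space-separated extension string; `has_extension`, `pick_mma_path`, and the `mma_path` enum are hypothetical names for illustration, not anything from an existing tool.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if `name` appears as a whole space-separated token in
   `ext_list` (e.g. the string queried via CL_DEVICE_EXTENSIONS),
   0 otherwise. Hypothetical helper, for illustration only. */
static int has_extension(const char *ext_list, const char *name)
{
    size_t len = strlen(name);
    const char *p = ext_list;

    if (len == 0)
        return 0;

    while ((p = strstr(p, name)) != NULL) {
        /* The match must start at the beginning or right after a space... */
        int starts_token = (p == ext_list) || (p[-1] == ' ');
        /* ...and end at a space or at the end of the string. */
        int ends_token = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token)
            return 1;
        p++; /* keep scanning past this partial match */
    }
    return 0;
}

/* Hypothetical result type: which matrix multiply path a test could use. */
enum mma_path { MMA_NONE = 0, MMA_INTEL_SUBGROUP = 1, MMA_ARM = 2 };

/* Picks a path based only on the extension names listed above. */
static enum mma_path pick_mma_path(const char *ext_list)
{
    if (has_extension(ext_list, "cl_intel_subgroup_matrix_multiply_accumulate"))
        return MMA_INTEL_SUBGROUP;
    if (has_extension(ext_list, "cl_arm_matrix_multiply"))
        return MMA_ARM;
    return MMA_NONE;
}
```

With this, a device that lists only the "_tf32" extension would not be picked up as supporting the base one, and a test could be skipped cleanly when neither extension is present.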
https://opencl.gpuinfo.org/listreports.php?extension=cl_arm_matrix_multiply
Even sadder, no extension spec seems to be published, although it seems to be supported in the ARM Compute Library, which I think is open source, so maybe there is code to learn from:
https://android.googlesource.com/platform/external/ComputeLibrary/+/refs/heads/android14-qpr1-s2-release%5E1..refs/heads/android14-qpr1-s2-release/
So in brief: do you plan on investigating a cross-vendor "matrix multiply accelerator units" test, at least for Intel and ARM GPUs?
NOTE: vkpeak does the same with the equivalent cooperative matrix extension, but there the HW support is broader (AMD, NV, Intel, etc.).
Thanks.