
Request: support performance testing of matrix multiply "accelerator units" in the benchmark #29

@oscarbg


Hi,
It seems that after dot product (int8) support, the next "hot" feature to add could be performance testing of "matrix multiply accelerators", aka tensor cores on NVIDIA, etc.
Searching on opencl.gpuinfo.org, I see ARM and Intel extensions (none from NVIDIA, AMD, or Qualcomm, which is sad to see), with their coverage:
* cl_arm_matrix_multiply: 0.4%
* cl_intel_subgroup_matrix_multiply_accumulate: 2.2%
* cl_intel_subgroup_matrix_multiply_accumulate_tf32: 0.5%
* cl_intel_subgroup_split_matrix_multiply_accumulate: 1.6%

Of the Intel ones, the one to test seems to be cl_intel_subgroup_matrix_multiply_accumulate (the _tf32 one is the TensorFloat-32 variant), which is supported on Intel Arc on Windows and Linux, and also...
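A cross-vendor test would first need to detect which of these extensions a device actually exposes. Below is a minimal sketch, assuming the device's CL_DEVICE_EXTENSIONS string (a space-separated list of extension names) has already been queried, e.g. via clGetDeviceInfo in C; the helper name `supported_matrix_extensions` is hypothetical and not part of any existing tool.

```python
# Matrix-multiply extensions listed on opencl.gpuinfo.org in this issue.
MATRIX_EXTENSIONS = (
    "cl_arm_matrix_multiply",
    "cl_intel_subgroup_matrix_multiply_accumulate",
    "cl_intel_subgroup_matrix_multiply_accumulate_tf32",
    "cl_intel_subgroup_split_matrix_multiply_accumulate",
)


def supported_matrix_extensions(extensions_string: str) -> list[str]:
    """Return the matrix-multiply extensions present in a device's
    CL_DEVICE_EXTENSIONS string (hypothetical helper, for illustration)."""
    # Compare whole names rather than substrings, so the base Intel
    # extension is not falsely detected when only its _tf32 variant exists.
    available = set(extensions_string.split())
    return [name for name in MATRIX_EXTENSIONS if name in available]


if __name__ == "__main__":
    # Made-up extension string, not taken from a real device report.
    sample = "cl_khr_fp16 cl_intel_subgroup_matrix_multiply_accumulate_tf32"
    print(supported_matrix_extensions(sample))
```

Matching whole names matters here because several of the Intel extension names are prefixes of one another.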

The ARM one is only supported on Pixel products, which is sad:
https://opencl.gpuinfo.org/listreports.php?extension=cl_arm_matrix_multiply
Sadder still, no specification for the extension seems to have been published, although it appears to be supported in the ARM Compute Library, which I think is open source, so maybe there is code to learn from:
https://android.googlesource.com/platform/external/ComputeLibrary/+/refs/heads/android14-qpr1-s2-release%5E1..refs/heads/android14-qpr1-s2-release/

So, in brief: do you plan to investigate adding a cross-vendor "matrix multiply accelerator units" test, at least for Intel and ARM GPUs?

NOTE: vkpeak does the same using the equivalent cooperative matrix extension, but there hardware support is broader (AMD, NVIDIA, Intel, etc.).

Thanks!
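Whichever extension is used, such a test would presumably report operations per second. Below is a minimal sketch of that arithmetic, assuming an M x K by K x N multiply repeated a given number of times and timed as a whole; the function name `matmul_ops_per_second` is illustrative only.

```python
def matmul_ops_per_second(m: int, n: int, k: int, seconds: float,
                          iterations: int = 1) -> float:
    """Throughput of `iterations` (M x K) @ (K x N) matrix multiplies.

    Each of the M*N*K multiply-accumulates is counted as two operations
    (one multiply plus one add), the usual convention for FLOPS/IOPS.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return 2.0 * m * n * k * iterations / seconds


if __name__ == "__main__":
    # Hypothetical timing: 100 multiplies of 1024 x 1024 matrices in 0.5 s.
    ops = matmul_ops_per_second(1024, 1024, 1024, 0.5, iterations=100)
    print(f"{ops / 1e12:.2f} TOPS")
```

Counting a multiply-accumulate as two operations keeps the figure comparable with the usual peak FLOPS/IOPS numbers.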
