7950X results: Intel OpenCL runtime with dot product extension support is much slower than without it #27

@oscarbg


Hi,
this is a similar situation to M4, i.e. an OpenCL runtime that supports cl_khr_integer_dot_product produces slower INT8 results.
The new Intel OpenCL runtime for CPU 2025.1 supports cl_khr_integer_dot_product (https://www.intel.com/content/www/us/en/developer/articles/release-notes/opencl-runtime-release-notes.html).
Results on a 7950X with 2025.1:

| INT8 compute 0.079 TIOPs/s (1/64) |

vs. the older 2024 runtime, which does not support it:

| INT8 compute 0.588 TIOPs/s (1/64) |

full results:
2025.1:

|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | AMD Ryzen 9 7950X 16-Core Processor                        |
| Device Vendor  | Intel(R) Corporation                                       |
| Device Driver  | 2025.19.3.0.17_230222 (Windows)                            |
| OpenCL Version | OpenCL C 3.0                                               |
| Compute Units  | 32 at 0 MHz (16 cores, 0.000 TFLOPs/s)                     |
| Memory, Cache  | 98026 MB RAM, 1024 KB global / 256 KB local                |
| Buffer Limits  | 98026 MB global, 128 KB constant                           |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
| FP64  compute                                         1.100 TFLOPs/s (1/64) |
| FP32  compute                                         1.309 TFLOPs/s (1/64) |
| FP16  compute                                         0.244 TFLOPs/s (1/64) |
| INT64 compute                                         0.538  TIOPs/s (1/64) |
| INT32 compute                                         1.270  TIOPs/s (1/64) |
| INT16 compute                                         2.589  TIOPs/s (1/64) |
| INT8  compute                                         0.079  TIOPs/s (1/64) |
| Memory Bandwidth ( coalesced read      )                         50.96 GB/s |
| Memory Bandwidth ( coalesced      write)                         27.70 GB/s |
| Memory Bandwidth (misaligned read      )                         60.71 GB/s |
| Memory Bandwidth (misaligned      write)                         30.80 GB/s |
|-----------------------------------------------------------------------------|

2024.x:

|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | AMD Ryzen 9 7950X 16-Core Processor                        |
| Device Vendor  | Intel(R) Corporation                                       |
| Device Driver  | 2024.17.3.0.08_160000 (Windows)                            |
| OpenCL Version | OpenCL C 3.0                                               |
| Compute Units  | 32 at 0 MHz (16 cores, 0.000 TFLOPs/s)                     |
| Memory, Cache  | 98026 MB RAM, 1024 KB global / 32 KB local                 |
| Buffer Limits  | 98026 MB global, 128 KB constant                           |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
| FP64  compute                                         0.979 TFLOPs/s (1/64) |
| FP32  compute                                         1.175 TFLOPs/s (1/64) |
| FP16  compute                                          not supported        |
| INT64 compute                                         0.303  TIOPs/s (1/64) |
| INT32 compute                                         1.225  TIOPs/s (1/64) |
| INT16 compute                                         2.323  TIOPs/s (1/64) |
| INT8  compute                                         0.588  TIOPs/s (1/64) |
| Memory Bandwidth ( coalesced read      )                         49.40 GB/s |
| Memory Bandwidth ( coalesced      write)                         27.05 GB/s |
| Memory Bandwidth (misaligned read      )                         59.32 GB/s |
| Memory Bandwidth (misaligned      write)                         30.57 GB/s |
|-----------------------------------------------------------------------------|
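
To put numbers on the difference, here is a quick Python sketch that divides each 2025.1 compute figure by the matching 2024.x figure. The values are copied from the two tables above; the dictionary and function names are just made up for this illustration, and FP16 is left out because the 2024.x runtime reports it as not supported.

```python
# Compute throughput copied from the two result tables above
# (TFLOPs/s for the FP rows, TIOPs/s for the INT rows).
# FP16 is omitted: the 2024.x runtime reports it as not supported.
results_2025 = {
    "FP64": 1.100, "FP32": 1.309,
    "INT64": 0.538, "INT32": 1.270, "INT16": 2.589, "INT8": 0.079,
}
results_2024 = {
    "FP64": 0.979, "FP32": 1.175,
    "INT64": 0.303, "INT32": 1.225, "INT16": 2.323, "INT8": 0.588,
}


def ratios(new, old):
    """Return new/old for every benchmark present in both result sets."""
    return {name: new[name] / old[name] for name in new if name in old}


# Ratio > 1 means 2025.1 is faster, ratio < 1 means it is slower.
speedup = ratios(results_2025, results_2024)

for name, ratio in speedup.items():
    print(f"{name:<5} 2025.1 / 2024.x = {ratio:.2f}x")

# The benchmark with the smallest ratio is the one that regressed the most.
worst = min(speedup, key=speedup.get)
print(f"largest regression: {worst}, {1 / speedup[worst]:.1f}x slower on 2025.1")
```

With these figures every row except INT8 comes out at or above 1.0x on 2025.1, while INT8 drops to roughly 1/7.4 of the 2024.x result.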
