Hi,
This looks similar to the M4 situation: an OpenCL runtime that supports cl_khr_integer_dot_product produces slower INT8 results.
The new Intel OpenCL runtime for CPU, 2025.1, supports cl_khr_integer_dot_product (https://www.intel.com/content/www/us/en/developer/articles/release-notes/opencl-runtime-release-notes.html).
Results on a Ryzen 9 7950X with 2025.1:
| INT8 compute 0.079 TIOPs/s (1/64) |
versus the older 2024 runtime, which does not support it:
| INT8 compute 0.588 TIOPs/s (1/64) |
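For context, here is a minimal Python sketch of what a packed 4x8-bit dot product with saturating accumulate computes, modelled on the `dot_acc_sat_4x8packed_ss_int` built-in that cl_khr_integer_dot_product adds. It is a reference emulation only: the helper names are hypothetical, and which of the extension's built-ins the benchmark kernel actually calls is not stated in this report.

```python
import struct

INT32_MIN = -2**31
INT32_MAX = 2**31 - 1

def unpack_s8x4(word):
    """Split a 32-bit word into four signed 8-bit lanes (hypothetical helper)."""
    return struct.unpack("<4b", struct.pack("<I", word & 0xFFFFFFFF))

def pack_s8x4(a, b, c, d):
    """Pack four signed 8-bit values into one 32-bit word (hypothetical helper)."""
    return struct.unpack("<I", struct.pack("<4b", a, b, c, d))[0]

def dot_acc_sat_ss(a_word, b_word, acc):
    """Emulate a signed x signed packed dot product plus a saturating accumulate.

    The lane order does not affect the result as long as both operands
    are unpacked the same way.
    """
    total = acc + sum(x * y for x, y in zip(unpack_s8x4(a_word), unpack_s8x4(b_word)))
    # Clamp to the 32-bit signed range instead of wrapping around.
    return max(INT32_MIN, min(INT32_MAX, total))

a = pack_s8x4(1, 2, 3, 4)
b = pack_s8x4(5, 6, 7, 8)
print(dot_acc_sat_ss(a, b, 10))  # 1*5 + 2*6 + 3*7 + 4*8 + 10
```

A runtime without the extension has to express the same arithmetic with ordinary multiplies and adds, which is the comparison the two INT8 figures above reflect.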
full results:
2025.1:
|----------------.------------------------------------------------------------|
| Device ID | 1 |
| Device Name | AMD Ryzen 9 7950X 16-Core Processor |
| Device Vendor | Intel(R) Corporation |
| Device Driver | 2025.19.3.0.17_230222 (Windows) |
| OpenCL Version | OpenCL C 3.0 |
| Compute Units | 32 at 0 MHz (16 cores, 0.000 TFLOPs/s) |
| Memory, Cache | 98026 MB RAM, 1024 KB global / 256 KB local |
| Buffer Limits | 98026 MB global, 128 KB constant |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled. |
| FP64 compute 1.100 TFLOPs/s (1/64) |
| FP32 compute 1.309 TFLOPs/s (1/64) |
| FP16 compute 0.244 TFLOPs/s (1/64) |
| INT64 compute 0.538 TIOPs/s (1/64) |
| INT32 compute 1.270 TIOPs/s (1/64) |
| INT16 compute 2.589 TIOPs/s (1/64) |
| INT8 compute 0.079 TIOPs/s (1/64) |
| Memory Bandwidth ( coalesced read ) 50.96 GB/s |
| Memory Bandwidth ( coalesced write) 27.70 GB/s |
| Memory Bandwidth (misaligned read ) 60.71 GB/s |
| Memory Bandwidth (misaligned write) 30.80 GB/s |
|-----------------------------------------------------------------------------|
2024.x:
|----------------.------------------------------------------------------------|
| Device ID | 1 |
| Device Name | AMD Ryzen 9 7950X 16-Core Processor |
| Device Vendor | Intel(R) Corporation |
| Device Driver | 2024.17.3.0.08_160000 (Windows) |
| OpenCL Version | OpenCL C 3.0 |
| Compute Units | 32 at 0 MHz (16 cores, 0.000 TFLOPs/s) |
| Memory, Cache | 98026 MB RAM, 1024 KB global / 32 KB local |
| Buffer Limits | 98026 MB global, 128 KB constant |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled. |
| FP64 compute 0.979 TFLOPs/s (1/64) |
| FP32 compute 1.175 TFLOPs/s (1/64) |
| FP16 compute not supported |
| INT64 compute 0.303 TIOPs/s (1/64) |
| INT32 compute 1.225 TIOPs/s (1/64) |
| INT16 compute 2.323 TIOPs/s (1/64) |
| INT8 compute 0.588 TIOPs/s (1/64) |
| Memory Bandwidth ( coalesced read ) 49.40 GB/s |
| Memory Bandwidth ( coalesced write) 27.05 GB/s |
| Memory Bandwidth (misaligned read ) 59.32 GB/s |
| Memory Bandwidth (misaligned write) 30.57 GB/s |
|-----------------------------------------------------------------------------|
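To make the two tables easier to compare, the compute rows can be reduced to a per-metric ratio. The short Python sketch below uses only the figures copied from the tables above; the variable and function names are illustrative, and FP16 is left out because the 2024 runtime reports it as not supported.

```python
# Compute throughput copied from the two result tables (TFLOPs/s or TIOPs/s).
results_2025_1 = {
    "FP64": 1.100, "FP32": 1.309, "INT64": 0.538,
    "INT32": 1.270, "INT16": 2.589, "INT8": 0.079,
}
results_2024 = {
    "FP64": 0.979, "FP32": 1.175, "INT64": 0.303,
    "INT32": 1.225, "INT16": 2.323, "INT8": 0.588,
}

def relative_change(new, old):
    """Return new/old for every metric present in both result sets."""
    return {name: new[name] / old[name] for name in new if name in old}

ratios = relative_change(results_2025_1, results_2024)
# Metrics where the newer runtime is slower than the older one.
regressions = [name for name, ratio in ratios.items() if ratio < 1.0]

for name, ratio in ratios.items():
    print(f"{name:6s} {ratio:5.2f}x of the 2024 result")
print("regressions:", regressions)
```

With these figures, INT8 is the only compute row that regresses: it lands at roughly 0.13x of the 2024 value (about 7.4x slower), while every other compared row is at or above 1.0x.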