[SYCL] Optimize mul_mat for Q4_0 on Intel GPU (ggml-org#12035)
* opt performance by reorder for Intel GPU
* detect hw type and save opt feature, and print opt feature
* correct name
* optimize the graph once when computing the graph, record the opt status in tensor->extra, make CI pass
* add env variable GGML_SYCL_DISABLE_OPT for debug
* use syclex::architecture to replace the custom hw define, update the guide for GGML_SYCL_DISABLE_OPT
* add performance data
* mv getrows functions to separated files
* fix global variables
---------
Co-authored-by: arthw <[email protected]>
docs/backend/SYCL.md
Lines changed: 14 additions & 2 deletions
@@ -43,6 +43,16 @@ For CI and performance test summary, please refer to [llama.cpp CI for SYCL Back
 
 ## News
 
+- 2025.2
+  - Optimize MUL_MAT Q4_0 on Intel GPU for all dGPUs and built-in GPUs since MTL. Increase the performance of LLM (llama-2-7b.Q4_0.gguf) 21%-87% on Intel GPUs (MTL, ARL-H, Arc, Flex, PVC).
+  |GPU|Base tokens/s|Increased tokens/s|Percent|
+  |-|-|-|-|
+  |PVC 1550|39|73|+87%|
+  |Flex 170|39|50|+28%|
+  |Arc770|42|55|+30%|
+  |MTL|13|16|+23%|
+  |ARL-H|14|17|+21%|
+
 - 2024.11
   - Use syclcompat to improve the performance on some platforms. This requires to use oneAPI 2025.0 or newer.
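The Percent column in the table added above can be recomputed from the two tokens/s columns. The sketch below assumes the column is the relative gain, (optimized - base) / base, truncated to a whole percent; the commit does not say how the values were rounded. `percent_gain`, `Row`, `kRows`, and `print_gains` are illustrative names, not part of llama.cpp.

```cpp
#include <cstdio>

// Relative throughput gain in whole percent, truncated toward zero.
// Assumption: truncation (not rounding) is used; the source does not say.
int percent_gain(int base_tps, int optimized_tps) {
    return (optimized_tps - base_tps) * 100 / base_tps;
}

// One row of the performance table: GPU name, base and optimized tokens/s.
struct Row {
    const char* gpu;
    int base_tps;
    int optimized_tps;
};

// Values copied from the table in the diff.
const Row kRows[] = {
    {"PVC 1550", 39, 73},
    {"Flex 170", 39, 50},
    {"Arc770",   42, 55},
    {"MTL",      13, 16},
    {"ARL-H",    14, 17},
};

// Print the recomputed gain for every row.
void print_gains() {
    for (const Row& row : kRows) {
        std::printf("%-8s +%d%%\n", row.gpu,
                    percent_gain(row.base_tps, row.optimized_tps));
    }
}
```

Calling `print_gains()` lists the recomputed gain per GPU. Under the truncation assumption the results agree with the table, including Arc770, whose unrounded gain is just under 31%.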
 | GGML_SYCL_DEBUG | 0 (default) or 1 | Enable log function by macro: GGML_SYCL_DEBUG |
+| GGML_SYCL_DISABLE_OPT | 0 (default) or 1 | Disable optimize features based on Intel GPU type, to compare the performance increase |
 | ZES_ENABLE_SYSMAN | 0 (default) or 1 | Support to get free memory of GPU by sycl::aspect::ext_intel_free_memory.<br>Recommended to use when --split-mode = layer |
 | GGML_SYCL_VISIBLE_DEVICES|id1,id2,...|It's like `CUDA_VISIBLE_DEVICES`, define the SYCL device ID list to visible. Like "0", "0,2", "2,1" |
 | ONEAPI_DEVICE_SELECTOR|Refer to [oneapi-device-selector](https://intel.github.io/llvm-docs/EnvironmentVariables.html#oneapi-device-selector)|be used to limit the choice of devices available when the SYCL-using application is run|
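The `GGML_SYCL_DISABLE_OPT` row above describes a 0/1 switch that defaults to 0. A minimal sketch of how such a switch might be read at startup follows. It is not the code from this commit: `env_flag_enabled` and `sycl_opt_disabled` are hypothetical helpers, and treating any non-zero integer as "enabled" is an assumption.

```cpp
#include <cstdlib>

// Hypothetical helper: true when the environment variable is set to a
// non-zero integer. An unset variable counts as 0, matching the
// "0 (default)" convention in the table.
bool env_flag_enabled(const char* name) {
    const char* value = std::getenv(name);
    if (value == nullptr) {
        return false;  // variable not set: use the default (0)
    }
    return std::atoi(value) != 0;
}

// Hypothetical wrapper for the variable documented in the table.
bool sycl_opt_disabled() {
    return env_flag_enabled("GGML_SYCL_DISABLE_OPT");
}
```

With a check like this, running the same workload once with `GGML_SYCL_DISABLE_OPT=1` and once without it gives the before/after comparison the table entry describes.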
@@ -725,6 +736,7 @@ The parameters about device choose of llama.cpp works with SYCL backend rule to