Commit ff27f80

ggml: initial IBM zDNN backend (#14975)
* ggml-zdnn: initial backend impl
  ggml-zdnn: temp change z17 to arch15
  ggml-zdnn: fix build bugs
  Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: tensor->extra logging check
  ggml-zdnn: add layout name mapping, ztensor information
  ggml-zdnn: separate logging into its own line
  ggml-zdnn: add shape comparison
  ggml-zdnn: add ggml_tensor shape log
  ggml-zdnn: fix incorrect shape logging
* ggml-zdnn: add output buffer check
* ggml-zdnn: run compute and store into tensor->extra
* ggml-zdnn: add set_tensor
* ggml-zdnn: add more loggers
* ggml-zdnn: update set_tensor logging to check only for matmul
* ggml-zdnn: last working matmul version
* ggml-zdnn: add comments to prevent accidentally deleting lines
* ggml-zdnn: support op out_prod
* ggml-zdnn: update op out_prod to use tensor->extra
* ggml-zdnn: rewrite the backend implementation
* ggml-zdnn: bugfix new impl
* ggml-zdnn: fix compiler warnings and bugfixes
* ggml-zdnn: test ztensor finding in init_tensor
* ggml-zdnn: implement at least 1 op to test
* ggml-zdnn: assign tensor->extra to buffer
* ggml-zdnn: add check for view tensors to prevent init_tensor
* ggml-zdnn: rework init_tensor to create new buffers
* ggml-zdnn: switch to std vector instead of array
* ggml-zdnn: switch buffers back and set to arbitrary number
* ggml-zdnn: impl init_tensor
* ggml-zdnn: update supports_op matmul matrix
* ggml-zdnn: fix incorrect ztensor shape, reduce memory padding
* ggml-zdnn: code clean up
* ggml-zdnn: impl matmul
* ggml-zdnn: fix compiler error missing type
* ggml-zdnn: fix missing data transform call
* ggml-zdnn: add bias init_tensor
* ggml-zdnn: tighten memory usage, change string allocation
* ggml-zdnn: add bias ztensor and data free
* ggml-zdnn: add bias data transform
* ggml-zdnn: add more debug info for extra buffer transform
* ggml-zdnn: add logger to check if mat mul ops go through set_tensor
* ggml-zdnn: activate bias transform in matmul
* ggml-zdnn: move weights transform into mulmat
* ggml-zdnn: add more safeguards in matmul
* ggml-zdnn: fix sequencing of transforms
* ggml-zdnn: bugfix transform ztensor vs origtensor
* ggml-zdnn: figure out why sigtrap is happening
* ggml-zdnn: fix sigsegv
* ggml-zdnn: move everything back to local declaration
* ggml-zdnn: move bias data to local also
* ggml-zdnn: bring back working matmul
* ggml-zdnn: rewrite into mre
* ggml-zdnn: fix missing vector import
* ggml-zdnn: fix missing vector import in header
* ggml-zdnn: attempt to fix sigsegv
* ggml-zdnn: fix missing load tensor
* ggml-zdnn: fix invalid ztensor buffer release
* ggml-zdnn: add logging to debug free buffer
* ggml-zdnn: remove free_buffer debug info
* ggml-zdnn: add parmblkformat detections
* ggml-zdnn: add nnpa installed detection
* ggml-zdnn: add zdnn_init call for static libs
* ggml-zdnn: add init_tensor
* ggml-zdnn: attempt at fixing invalid buffer
* ggml-zdnn: switch to using deque to fix pointer deref problem
* ggml-zdnn: add weights logging to check
* ggml-zdnn: attempt to use unique ptr
* ggml-zdnn: add tensor to pre_tfm_desc logging
* ggml-zdnn: add inputs logging
* ggml-zdnn: disable op_none initialisation for testing
* ggml-zdnn: fix missing return from init_tensor
* ggml-zdnn: load ztensors in cgraph exec
* ggml-zdnn: work on moving output ztensor as well
* ggml-zdnn: disable logging and breakpoints for full test
* ggml-zdnn: attempt at manually changing the layout
* ggml-zdnn: attempt at using default nwhc format instead
* ggml-zdnn: disable global load ztensor for now
* ggml-zdnn: fix erroneous output load tensor
* ggml-zdnn: add guards to prevent loading ztensor if transformed
* ggml-zdnn: code cleanup
* ggml-zdnn: bring load ztensor back to init routine
* ggml-zdnn: code clean up
* ggml-zdnn: fix ztensor deallocation abort, stabilise ggml <-> zdnn api
* ggml-zdnn: clean up matmul selection
* ggml-zdnn: clean up project structure
* ggml-zdnn: update documentation, prepare for upstream
* chore: add codeowners
* ggml-zdnn: disable batched matmul
* ggml-zdnn: attempt at fixing tensor views during matmul
* ggml-zdnn: deny all view tensors directly
* ggml-zdnn: fix pr comments
* docs: update ops docs for zdnn
* ggml-zdnn: redo test-backend-ops for ops.md
* ggml-zdnn: fix typo in build-s390x.md
* codeowners: remove taronaeo for now
* Revert "codeowners: remove taronaeo for now"
  This reverts commit 411ea4e.
* ggml-zdnn: remove unused ggml_zdnn macro

---------

Signed-off-by: Aaron Teo <[email protected]>
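The log above repeatedly references `pre_tfm_desc` setup, ztensor data transforms, `zdnn_init`, and matmul execution. As context, here is a minimal standalone sketch of that zDNN call sequence (descriptor init, "stickify" transform, fused matmul+bias, transform back). This is not the backend's actual code: it assumes the public libzdnn C API and requires an IBM z16/z17 (Telum) system with libzdnn installed, so treat it as illustrative only.

```c
// Hypothetical sketch of the zDNN matmul flow described in the commit log.
// Assumes libzdnn is installed; not the actual ggml-zdnn backend code.
#include <stdio.h>
#include <zdnn.h>

int main(void) {
    zdnn_init();  // required when linking libzdnn statically (see commit log)

    // C = A (2x3) * B (3x2) + bias (2)
    float a[2][3]  = {{1, 2, 3}, {4, 5, 6}};
    float b[3][2]  = {{1, 0}, {0, 1}, {1, 1}};
    float bias[2]  = {0, 0};
    float c[2][2];

    zdnn_tensor_desc pre_a, tfm_a, pre_b, tfm_b, pre_bias, tfm_bias, pre_c, tfm_c;
    zdnn_ztensor za, zb, zbias, zc;

    // pre-transformed (user layout) descriptors, then zAIU-internal ones
    zdnn_init_pre_transformed_desc(ZDNN_2D, FP32, &pre_a, 2, 3);
    zdnn_init_pre_transformed_desc(ZDNN_2D, FP32, &pre_b, 3, 2);
    zdnn_init_pre_transformed_desc(ZDNN_1D, FP32, &pre_bias, 2);
    zdnn_init_pre_transformed_desc(ZDNN_2D, FP32, &pre_c, 2, 2);
    zdnn_generate_transformed_desc(&pre_a, &tfm_a);
    zdnn_generate_transformed_desc(&pre_b, &tfm_b);
    zdnn_generate_transformed_desc(&pre_bias, &tfm_bias);
    zdnn_generate_transformed_desc(&pre_c, &tfm_c);

    zdnn_init_ztensor_with_malloc(&pre_a, &tfm_a, &za);
    zdnn_init_ztensor_with_malloc(&pre_b, &tfm_b, &zb);
    zdnn_init_ztensor_with_malloc(&pre_bias, &tfm_bias, &zbias);
    zdnn_init_ztensor_with_malloc(&pre_c, &tfm_c, &zc);

    // "stickify": copy user data into the accelerator's tiled layout
    zdnn_transform_ztensor(&za, a);
    zdnn_transform_ztensor(&zb, b);
    zdnn_transform_ztensor(&zbias, bias);

    // fused matmul + bias addition on the zAIU
    zdnn_status st = zdnn_matmul_op(&za, &zb, &zbias, MATMUL_OP_ADDITION, &zc);
    if (st != ZDNN_OK) {
        fprintf(stderr, "zdnn_matmul_op failed: %d\n", st);
        return 1;
    }

    // transform the result back into plain row-major floats
    zdnn_transform_origtensor(&zc, c);
    printf("%.0f %.0f\n%.0f %.0f\n", c[0][0], c[0][1], c[1][0], c[1][1]);

    zdnn_free_ztensor_buffer(&za);
    zdnn_free_ztensor_buffer(&zb);
    zdnn_free_ztensor_buffer(&zbias);
    zdnn_free_ztensor_buffer(&zc);
    return 0;
}
```

The backend caches these transformed ztensors in `tensor->extra` (per the log) so the stickify cost is not paid on every graph execution.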
1 parent d3248d9 commit ff27f80

File tree

15 files changed: +9265 −102 lines

.github/ISSUE_TEMPLATE/010-bug-compilation.yml

Lines changed: 1 addition & 1 deletion

@@ -40,7 +40,7 @@ body:
     attributes:
       label: GGML backends
       description: Which GGML backends do you know to be affected?
-      options: [AMX, BLAS, CPU, CUDA, HIP, Metal, Musa, RPC, SYCL, Vulkan, OpenCL]
+      options: [AMX, BLAS, CPU, CUDA, HIP, Metal, Musa, RPC, SYCL, Vulkan, OpenCL, zDNN]
     multiple: true
   validations:
     required: true

.github/ISSUE_TEMPLATE/011-bug-results.yml

Lines changed: 1 addition & 1 deletion

@@ -42,7 +42,7 @@ body:
     attributes:
       label: GGML backends
       description: Which GGML backends do you know to be affected?
-      options: [AMX, BLAS, CPU, CUDA, HIP, Metal, Musa, RPC, SYCL, Vulkan, OpenCL]
+      options: [AMX, BLAS, CPU, CUDA, HIP, Metal, Musa, RPC, SYCL, Vulkan, OpenCL, zDNN]
     multiple: true
   validations:
     required: true

.github/labeler.yml

Lines changed: 5 additions & 0 deletions

@@ -22,6 +22,11 @@ Vulkan:
     - any-glob-to-any-file:
         - ggml/include/ggml-vulkan.h
         - ggml/src/ggml-vulkan/**
+IBM zDNN:
+  - changed-files:
+    - any-glob-to-any-file:
+        - ggml/include/ggml-zdnn.h
+        - ggml/src/ggml-zdnn/**
 documentation:
   - changed-files:
     - any-glob-to-any-file:

CODEOWNERS

Lines changed: 1 addition & 0 deletions

@@ -10,3 +10,4 @@
 /ggml/src/ggml-opt.cpp @JohannesGaessler
 /ggml/src/gguf.cpp @JohannesGaessler
 /ggml/src/ggml-vulkan/ @0cc4m
+/ggml/src/ggml-zdnn/ @taronaeo

docs/build-s390x.md

Lines changed: 29 additions & 11 deletions

@@ -76,6 +76,23 @@ cmake --build build --config Release -j $(nproc)
 cmake --build build --config Release -j $(nproc)
 ```
 
+## IBM zDNN Accelerator
+
+This provides acceleration using the IBM zAIU co-processor located in the Telum I and Telum II processors. Make sure to have the [IBM zDNN library](https://github.com/IBM/zDNN) installed.
+
+#### Compile from source from IBM
+
+You may find the official build instructions here: [Building and Installing zDNN](https://github.com/IBM/zDNN?tab=readme-ov-file#building-and-installing-zdnn)
+
+### Compilation
+
+```bash
+cmake -S . -B build \
+    -DCMAKE_BUILD_TYPE=Release \
+    -DGGML_ZDNN=ON
+cmake --build build --config Release -j$(nproc)
+```
+
 ## Getting GGUF Models
 
 All models need to be converted to Big-Endian. You can achieve this in three cases:
@@ -145,15 +162,15 @@ All models need to be converted to Big-Endian. You can achieve this in three cas
 
 ### 1. SIMD Acceleration
 
-Only available in IBM z15 or later system with the `-DGGML_VXE=ON` (turned on by default) compile flag. No hardware acceleration is possible with llama.cpp with older systems, such as IBM z14/arch12. In such systems, the APIs can still run but will use a scalar implementation.
+Only available in IBM z15/LinuxONE 3 or later system with the `-DGGML_VXE=ON` (turned on by default) compile flag. No hardware acceleration is possible with llama.cpp with older systems, such as IBM z14/arch12. In such systems, the APIs can still run but will use a scalar implementation.
 
 ### 2. NNPA Vector Intrinsics Acceleration
 
-Only available in IBM z16 or later system with the `-DGGML_NNPA=ON` (turned off by default) compile flag. No hardware acceleration is possible with llama.cpp with older systems, such as IBM z15/arch13. In such systems, the APIs can still run but will use a scalar implementation.
+Only available in IBM z16/LinuxONE 4 or later system with the `-DGGML_NNPA=ON` (turned off by default) compile flag. No hardware acceleration is possible with llama.cpp with older systems, such as IBM z15/arch13. In such systems, the APIs can still run but will use a scalar implementation.
 
-### 3. zDNN Accelerator
+### 3. zDNN Accelerator (WIP)
 
-_Only available in IBM z16 / LinuxONE 4 or later system. No support currently available._
+Only available in IBM z17/LinuxONE 5 or later system with the `-DGGML_ZDNN=ON` compile flag. No hardware acceleration is possible with llama.cpp with older systems, such as IBM z15/arch13. In such systems, the APIs will default back to CPU routines.
 
 ### 4. Spyre Accelerator
 
@@ -229,11 +246,12 @@ IBM VXE/VXE2 SIMD acceleration depends on the BLAS implementation. It is strongl
 
 ## Appendix A: Hardware Support Matrix
 
-|         | Support | Minimum Compiler Version |
-| ------- | ------- | ------------------------ |
-| IBM z15 | ✅      |                          |
-| IBM z16 | ✅      |                          |
-| IBM z17 | ✅      | GCC 15.1.0               |
+|          | Support | Minimum Compiler Version |
+| -------- | ------- | ------------------------ |
+| IBM z15  | ✅      |                          |
+| IBM z16  | ✅      |                          |
+| IBM z17  | ✅      | GCC 15.1.0               |
+| IBM zDNN | ✅      |                          |
 
 - ✅ - supported and verified to run as intended
 - 🚫 - unsupported, we are unlikely able to provide support
@@ -242,7 +260,7 @@ IBM VXE/VXE2 SIMD acceleration depends on the BLAS implementation. It is strongl
 
 |            | VX/VXE/VXE2 | NNPA | zDNN | Spyre |
 | ---------- | ----------- | ---- | ---- | ----- |
-| FP32       | ✅          | ✅   | ❓   | ❓    |
+| FP32       | ✅          | ✅   | ✅   | ❓    |
 | FP16       | ✅          | ✅   | ❓   | ❓    |
 | BF16       | 🚫          | 🚫   | ❓   | ❓    |
 | Q4_0       | ✅          | ✅   | ❓   | ❓    |
@@ -273,4 +291,4 @@ IBM VXE/VXE2 SIMD acceleration depends on the BLAS implementation. It is strongl
 - 🚫 - acceleration unavailable, will still run using scalar implementation
 - ❓ - acceleration unknown, please contribute if you can test it yourself
 
-Last Updated by **Aaron Teo ([email protected])** on July 25, 2025.
+Last Updated by **Aaron Teo ([email protected])** on July 31, 2025.
