
Commit cf8cdcd

ggml-zdnn: update documentation, prepare for upstream

Signed-off-by: Aaron Teo <[email protected]>

1 parent 92a17ed

4 files changed: +71 −18 lines

.github/ISSUE_TEMPLATE/010-bug-compilation.yml

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@ body:
 attributes:
 label: GGML backends
 description: Which GGML backends do you know to be affected?
-options: [AMX, BLAS, CPU, CUDA, HIP, Metal, Musa, RPC, SYCL, Vulkan, OpenCL]
+options: [AMX, BLAS, CPU, CUDA, HIP, Metal, Musa, RPC, SYCL, Vulkan, OpenCL, zDNN]
 multiple: true
 validations:
 required: true

.github/ISSUE_TEMPLATE/011-bug-results.yml

Lines changed: 1 addition & 1 deletion
@@ -42,7 +42,7 @@ body:
 attributes:
 label: GGML backends
 description: Which GGML backends do you know to be affected?
-options: [AMX, BLAS, CPU, CUDA, HIP, Metal, Musa, RPC, SYCL, Vulkan, OpenCL]
+options: [AMX, BLAS, CPU, CUDA, HIP, Metal, Musa, RPC, SYCL, Vulkan, OpenCL, zDNN]
 multiple: true
 validations:
 required: true

.github/labeler.yml

Lines changed: 5 additions & 0 deletions
@@ -22,6 +22,11 @@ Vulkan:
 - any-glob-to-any-file:
 - ggml/include/ggml-vulkan.h
 - ggml/src/ggml-vulkan/**
+IBM zDNN:
+- changed-files:
+- any-glob-to-any-file:
+- ggml/include/ggml-zdnn.h
+- ggml/src/ggml-zdnn/**
 documentation:
 - changed-files:
 - any-glob-to-any-file:
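
To illustrate what the new `IBM zDNN` labeler rule does, here is a minimal Python sketch of glob-based label matching. This is only an approximation for illustration: the real `actions/labeler` workflow uses its own glob engine, and Python's `fnmatch` lets `*` cross `/` (so it over-approximates strict `**` semantics). The label names and globs are taken from the diff above; everything else is hypothetical.

```python
from fnmatch import fnmatch

# Hypothetical sketch, NOT GitHub's actual labeler implementation.
# Label -> globs, mirroring the .github/labeler.yml entries in the diff above.
LABEL_GLOBS = {
    "IBM zDNN": ["ggml/include/ggml-zdnn.h", "ggml/src/ggml-zdnn/**"],
    "Vulkan": ["ggml/include/ggml-vulkan.h", "ggml/src/ggml-vulkan/**"],
}

def labels_for(changed_files):
    """Return the labels whose glob list matches any changed file."""
    return sorted(
        label
        for label, globs in LABEL_GLOBS.items()
        if any(fnmatch(f, g) for f in changed_files for g in globs)
    )

print(labels_for(["ggml/src/ggml-zdnn/mmf.cpp"]))  # ['IBM zDNN']
```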

docs/build-s390x.md

Lines changed: 64 additions & 16 deletions
@@ -42,14 +42,14 @@ cmake --build build --config Release -j $(nproc)
 cmake --build build --config Release -j $(nproc)
 ```
 
-- By default, NNPA is enabled when available. To disable it (not recommended):
+- By default, NNPA is disabled. To enable it:
 
 ```bash
 cmake -S . -B build \
     -DCMAKE_BUILD_TYPE=Release \
     -DGGML_BLAS=ON \
     -DGGML_BLAS_VENDOR=OpenBLAS \
-    -DGGML_NNPA=OFF
+    -DGGML_NNPA=ON
 
 cmake --build build --config Release -j $(nproc)
 ```
@@ -76,6 +76,23 @@ cmake --build build --config Release -j $(nproc)
 cmake --build build --config Release -j $(nproc)
 ```
 
+## IBM zDNN Accelerator
+
+This provides acceleration using the IBM zAIU co-processor located in the Telum I and Telum II processors. Make sure to have the [IBM zDNN library](https://github.com/IBM/zDNN) installed.
+
+### Compile zDNN from source
+
+You can find the official build instructions here: [Building and Installing zDNN](https://github.com/IBM/zDNN?tab=readme-ov-file#building-and-installing-zdnn)
+
+### Compilation
+
+```bash
+cmake -S . -B build \
+    -DCMAKE_BUILD_TYPE=Release \
+    -DGGML_ZDNN=ON
+cmake --build build --config Release -j$(nproc)
+```
+
 ## Getting GGUF Models
 
 All models need to be converted to Big-Endian. You can achieve this in three cases:
@@ -84,16 +101,24 @@ All models need to be converted to Big-Endian. You can achieve this in three cas
 
 ![File Type - gguf](https://img.shields.io/badge/File_Type-gguf-fff)
 
-You can find popular models pre-converted and verified at [s390x Ready Models](https://huggingface.co/collections/taronaeo/s390x-ready-models-672765393af438d0ccb72a08).
+You can find popular models pre-converted and verified at [s390x Verified Models](https://huggingface.co/collections/taronaeo/s390x-verified-models-672765393af438d0ccb72a08) or [s390x Runnable Models](https://huggingface.co/collections/taronaeo/s390x-runnable-models-686e951824198df12416017e).
 
-These models have already been converted from `safetensors` to `GGUF Big-Endian` and their respective tokenizers verified to run correctly on IBM z15 and later system.
+These models have already been converted from `safetensors` to `GGUF` Big-Endian and their respective tokenizers verified to run correctly on IBM z15 and later systems.
 
 2. **Convert safetensors model to GGUF Big-Endian directly (recommended)**
 
 ![File Type - safetensors](https://img.shields.io/badge/File_Type-safetensors-da1e28)
 
 The model you are trying to convert must be in `safetensors` file format (for example [IBM Granite 3.3 2B](https://huggingface.co/ibm-granite/granite-3.3-2b-instruct)). Make sure you have downloaded the model repository for this case.
 
+Ensure that you have installed the required packages in advance:
+
+```bash
+pip3 install -r requirements.txt
+```
+
+Convert the `safetensors` model to `GGUF`:
+
 ```bash
 python3 convert_hf_to_gguf.py \
     --outfile model-name-be.f16.gguf \
@@ -116,7 +141,7 @@ All models need to be converted to Big-Endian. You can achieve this in three cas
 
 ![File Type - gguf](https://img.shields.io/badge/File_Type-gguf-fff)
 
-The model you are trying to convert must be in `gguf` file format (for example [IBM Granite 3.3 2B](https://huggingface.co/ibm-granite/granite-3.3-2b-instruct-GGUF)). Make sure you have downloaded the model file for this case.
+The model you are trying to convert must be in `gguf` file format (for example [IBM Granite 3.3 2B GGUF](https://huggingface.co/ibm-granite/granite-3.3-2b-instruct-GGUF)). Make sure you have downloaded the model file for this case.
 
 ```bash
 python3 gguf-py/gguf/scripts/gguf_convert_endian.py model-name.f16.gguf BIG
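
Since the conversion script rewrites every tensor from little-endian to big-endian, a quick way to see what that actually means is to byte-swap a value by hand. A minimal Python sketch of the core idea (illustrative only; the real `gguf_convert_endian.py` also handles all GGUF dtypes, quantized blocks, and metadata):

```python
import struct

# A 4-byte scalar's endianness conversion is exactly a byte reversal.
value = 3.14159
le = struct.pack("<f", value)  # little-endian encoding
be = struct.pack(">f", value)  # big-endian encoding
assert le == be[::-1]

# Converting a whole FP32 tensor buffer swaps each 4-byte group.
def swap_f32_buffer(buf: bytes) -> bytes:
    """Reverse every 4-byte group, turning LE FP32 data into BE (and back)."""
    return b"".join(buf[i:i + 4][::-1] for i in range(0, len(buf), 4))

le_buf = struct.pack("<3f", 1.0, 2.0, 3.0)
be_buf = struct.pack(">3f", 1.0, 2.0, 3.0)
assert swap_f32_buffer(le_buf) == be_buf
```

The same per-element swap applies to FP16 (2-byte groups); quantized types need per-field handling, which is why the script exists.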
@@ -137,19 +162,19 @@ All models need to be converted to Big-Endian. You can achieve this in three cas
 
 ### 1. SIMD Acceleration
 
-Only available in IBM z15 or later system with the `-DGGML_VXE=ON` (turned on by default) compile flag. No hardware acceleration is possible with llama.cpp with older systems, such as IBM z14/arch12. In such systems, the APIs can still run but will use a scalar implementation.
+Only available on IBM z15/LinuxONE 3 or later systems with the `-DGGML_VXE=ON` (turned on by default) compile flag. No hardware acceleration is possible with llama.cpp on older systems, such as IBM z14/arch12. On such systems, the APIs can still run but will use a scalar implementation.
 
 ### 2. NNPA Vector Intrinsics Acceleration
 
-Only available in IBM z16 or later system with the `-DGGML_NNPA=ON` (turned on when available) compile flag. No hardware acceleration is possible with llama.cpp with older systems, such as IBM z15/arch13. In such systems, the APIs can still run but will use a scalar implementation.
+Only available on IBM z16/LinuxONE 4 or later systems with the `-DGGML_NNPA=ON` (turned off by default) compile flag. No hardware acceleration is possible with llama.cpp on older systems, such as IBM z15/arch13. On such systems, the APIs can still run but will use a scalar implementation.
 
-### 3. zDNN Accelerator
+### 3. zDNN Accelerator (WIP)
 
-_Only available in IBM z16 or later system. No direction at the moment._
+Only available on IBM z17/LinuxONE 5 or later systems with the `-DGGML_ZDNN=ON` compile flag. No hardware acceleration is possible with llama.cpp on older systems, such as IBM z15/arch13. On such systems, the APIs will fall back to CPU routines.
 
 ### 4. Spyre Accelerator
 
-_No direction at the moment._
+_Only available with IBM z17/LinuxONE 5 or later systems. No support currently available._
 
 ## Performance Tuning

@@ -189,6 +214,26 @@ IBM VXE/VXE2 SIMD acceleration depends on the BLAS implementation. It is strongl
 
 Answer: Please ensure that your GCC compiler is of minimum GCC 15.1.0 version, and have `binutils` updated to the latest version. If this does not fix the problem, kindly open an issue.
 
+4. Failing to install the `sentencepiece` package using GCC 15+
+
+Answer: The `sentencepiece` team is aware of this, as seen in [this issue](https://github.com/google/sentencepiece/issues/1108).
+
+As a temporary workaround, please run the installation command with the following environment variable.
+
+```bash
+export CXXFLAGS="-include cstdint"
+```
+
+For example,
+
+```bash
+CXXFLAGS="-include cstdint" pip3 install -r requirements.txt
+```
+
+5. `-DGGML_NNPA=ON` generates gibberish output
+
+Answer: We are aware of this as detailed in [this issue](https://github.com/ggml-org/llama.cpp/issues/14877). Please either try reducing the number of threads, or disable the compile option using `-DGGML_NNPA=OFF`.
+
 ## Getting Help on IBM Z & LinuxONE
 
 1. **Bugs, Feature Requests**
@@ -201,11 +246,12 @@ IBM VXE/VXE2 SIMD acceleration depends on the BLAS implementation. It is strongl
 
 ## Appendix A: Hardware Support Matrix
 
-|         | Support | Minimum Compiler Version |
-| ------- | ------- | ------------------------ |
-| IBM z15 || |
-| IBM z16 || |
-| IBM z17 || GCC 15.1.0 |
+|          | Support | Minimum Compiler Version |
+| -------- | ------- | ------------------------ |
+| IBM z15  || |
+| IBM z16  || |
+| IBM z17  || GCC 15.1.0 |
+| IBM zAIU || |
 
 - ✅ - supported and verified to run as intended
 - 🚫 - unsupported, we are unlikely able to provide support
@@ -214,7 +260,7 @@ IBM VXE/VXE2 SIMD acceleration depends on the BLAS implementation. It is strongl
 
 | | VX/VXE/VXE2 | NNPA | zDNN | Spyre |
 | ---------- | ----------- | ---- | ---- | ----- |
-| FP32 ||| ||
+| FP32 ||| ||
 | FP16 |||||
 | BF16 | 🚫 | 🚫 |||
 | Q4_0 |||||
@@ -244,3 +290,5 @@ IBM VXE/VXE2 SIMD acceleration depends on the BLAS implementation. It is strongl
 - ✅ - acceleration available
 - 🚫 - acceleration unavailable, will still run using scalar implementation
 - ❓ - acceleration unknown, please contribute if you can test it yourself
+
+Last Updated by **Aaron Teo ([email protected])** on July 31, 2025.
