This provides acceleration using the IBM zAIU co-processor located in the Telum I and Telum II processors. Make sure to have the [IBM zDNN library](https://github.com/IBM/zDNN) installed.
#### Compiling zDNN from source
You may find the official build instructions here: [Building and Installing zDNN](https://github.com/IBM/zDNN?tab=readme-ov-file#building-and-installing-zdnn)
### Compilation
```bash
cmake -S . -B build \
-DCMAKE_BUILD_TYPE=Release \
-DGGML_ZDNN=ON
cmake --build build --config Release -j$(nproc)
```
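After the build completes, a quick smoke test can confirm that the resulting binary links and runs. The `llama-cli` binary and its `--version` flag come from upstream llama.cpp tooling and are assumed here rather than shown in this section:

```shell
# The build above places binaries under build/bin.
# Print version/build info as a minimal sanity check
# (run from the repository root; assumes the build succeeded):
./build/bin/llama-cli --version
```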
## Getting GGUF Models
All models need to be converted to Big-Endian. You can achieve this in one of three ways:
### 1. SIMD Acceleration
Only available on IBM z15/LinuxONE 3 or later with the `-DGGML_VXE=ON` compile flag (enabled by default). No hardware acceleration is possible with llama.cpp on older systems such as IBM z14/arch12; on those, the APIs still run but fall back to a scalar implementation.
### 2. NNPA Vector Intrinsics Acceleration
Only available on IBM z16/LinuxONE 4 or later with the `-DGGML_NNPA=ON` compile flag (disabled by default). No hardware acceleration is possible with llama.cpp on older systems such as IBM z15/arch13; on those, the APIs still run but fall back to a scalar implementation.
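As a sketch, the NNPA path can be enabled with the same CMake flow shown in the Compilation section, assuming a z16 or later machine:

```shell
# Enable NNPA vector intrinsics (off by default) in a Release build:
cmake -S . -B build \
    -DCMAKE_BUILD_TYPE=Release \
    -DGGML_NNPA=ON
cmake --build build --config Release -j$(nproc)
```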
### 3. zDNN Accelerator (WIP)
Only available on IBM z17/LinuxONE 5 or later with the `-DGGML_ZDNN=ON` compile flag. No hardware acceleration is possible with llama.cpp on older systems such as IBM z15/arch13; on those, the APIs fall back to CPU routines.
### 4. Spyre Accelerator
## Appendix A: Hardware Support Matrix
|| Support | Minimum Compiler Version |
| -------- | ------- | ------------------------ |
| IBM z15 | ✅ ||
| IBM z16 | ✅ ||
| IBM z17 | ✅ | GCC 15.1.0 |
| IBM zDNN | ✅ ||
- ✅ - supported and verified to run as intended
- 🚫 - unsupported, we are unlikely able to provide support