
Commit 1f56190

docs: update s390x documentation + add faq
Signed-off-by: Aaron Teo <[email protected]>
1 parent 716301d

1 file changed: +70 -2 lines

docs/build-s390x.md

@@ -16,7 +16,7 @@ cd llama.cpp

## CPU Build with BLAS

Building llama.cpp with BLAS support is highly recommended, as it has been shown to provide performance improvements. Make sure you have OpenBLAS installed in your environment.

```bash
cmake -S . -B build \
    # ... (remaining flags truncated in this hunk; see the sketch below)
```
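The remainder of the `cmake` invocation is cut off by the hunk boundary. As a sketch of what a full OpenBLAS-backed configure-and-build typically looks like (the `GGML_BLAS` and `GGML_BLAS_VENDOR` flags are assumed from llama.cpp's usual BLAS options; verify against the complete file):

```bash
# Sketch only: configure with OpenBLAS, then build in Release mode.
# GGML_BLAS / GGML_BLAS_VENDOR are assumed flag names.
cmake -S . -B build \
    -DCMAKE_BUILD_TYPE=Release \
    -DGGML_BLAS=ON \
    -DGGML_BLAS_VENDOR=OpenBLAS

# Use all available cores for the build.
cmake --build build --config Release -j "$(nproc)"
```
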
@@ -82,12 +82,18 @@ All models need to be converted to Big-Endian. You can achieve this in three cases:

1. **Use pre-converted models verified for use on IBM Z & LinuxONE (easiest)**

   ![File Type - gguf](https://img.shields.io/badge/File_Type-gguf-fff)

   You can find popular models pre-converted and verified at [s390x Ready Models](https://huggingface.co/collections/taronaeo/s390x-ready-models-672765393af438d0ccb72a08).

   These models have already been converted from `safetensors` to `GGUF Big-Endian`, and their respective tokenizers have been verified to run correctly on IBM z15 and later systems.

2. **Convert safetensors model to GGUF Big-Endian directly (recommended)**

   ![File Type - safetensors](https://img.shields.io/badge/File_Type-safetensors-da1e28)

   The model you are trying to convert must be in `safetensors` file format (for example [IBM Granite 3.3 2B](https://huggingface.co/ibm-granite/granite-3.3-2b-instruct)). For this case, make sure you have downloaded the full model repository.

   ```bash
   python3 convert_hf_to_gguf.py \
       --outfile model-name-be.f16.gguf \
       # ... (remaining arguments truncated in this hunk; see the sketch below)
   ```
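
   The rest of the conversion command falls outside this hunk. Assuming the script's standard `--outtype` and `--bigendian` options (verify against the complete file), a full invocation would plausibly look like:

   ```bash
   # Sketch only: model-directory/ is a placeholder for the downloaded
   # safetensors repository; --bigendian writes a Big-Endian GGUF.
   python3 convert_hf_to_gguf.py \
       --outfile model-name-be.f16.gguf \
       --outtype f16 \
       --bigendian \
       model-directory/
   ```
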
@@ -108,6 +114,10 @@ All models need to be converted to Big-Endian. You can achieve this in three cases:

3. **Convert existing GGUF Little-Endian model to Big-Endian**

   ![File Type - gguf](https://img.shields.io/badge/File_Type-gguf-fff)

   The model you are trying to convert must be in `gguf` file format (for example [IBM Granite 3.3 2B](https://huggingface.co/ibm-granite/granite-3.3-2b-instruct-GGUF)). For this case, make sure you have downloaded the model file itself.

   ```bash
   python3 gguf-py/gguf/scripts/gguf_convert_endian.py model-name.f16.gguf BIG
   ```
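
   As a usage sketch (repository and file names are illustrative, and the in-place behaviour of the conversion script is an assumption worth verifying), the whole flow could look like:

   ```bash
   # Hypothetical end-to-end flow: download a Little-Endian GGUF, byte-swap
   # it, then rename it with the conventional -be suffix. The script is
   # assumed to modify the file in place.
   huggingface-cli download ibm-granite/granite-3.3-2b-instruct-GGUF \
       granite-3.3-2b-instruct-f16.gguf --local-dir .

   python3 gguf-py/gguf/scripts/gguf_convert_endian.py \
       granite-3.3-2b-instruct-f16.gguf BIG

   mv granite-3.3-2b-instruct-f16.gguf granite-3.3-2b-instruct-be.f16.gguf
   ```
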
@@ -163,6 +173,18 @@ It is strongly recommended to disable SMT via the kernel boot parameters as it n

IBM VXE/VXE2 SIMD acceleration depends on the BLAS implementation. It is strongly recommended to use BLAS.

## Frequently Asked Questions (FAQ)

1. I'm getting the following error message while trying to load a model: `gguf_init_from_file_impl: failed to load model: this GGUF file version 50331648 is extremely large, is there a mismatch between the host and model endianness?`

   Answer: Please ensure that the model you have downloaded/converted is GGUFv3 Big-Endian. (50331648 is 0x03000000, i.e., GGUF version 3 read with the wrong byte order.) These models are usually denoted with the `-be` suffix, i.e., `granite-3.3-2b-instruct-be.F16.gguf`.

   You may refer to the [Getting GGUF Models](#getting-gguf-models) section to manually convert a `safetensors` model to `GGUF` Big-Endian. A quick endianness check is sketched after this list.

2. I'm getting extremely poor performance when running inference on a model.

   Answer: Please refer to [Appendix B: SIMD Support Matrix](#appendix-b-simd-support-matrix) to check whether your model's quantization is supported by SIMD acceleration.
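
A minimal sketch for the first question, assuming the standard GGUF header layout (a 4-byte `GGUF` magic followed by a 4-byte version field): a Big-Endian file whose version field is read as little-endian reports exactly 50331648 instead of 3.

```bash
# Print the GGUF version field in both byte orders; a Big-Endian model read
# on the wrong assumption shows 50331648 (0x03000000) rather than 3.
python3 - model-name-be.f16.gguf <<'EOF'
import struct
import sys

with open(sys.argv[1], "rb") as f:
    magic = f.read(4)             # GGUF magic bytes
    raw = f.read(4)               # 4-byte version field

(le,) = struct.unpack("<I", raw)  # little-endian interpretation
(be,) = struct.unpack(">I", raw)  # big-endian interpretation
print(f"magic={magic!r} version(LE)={le} version(BE)={be}")
EOF
```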
166188
## Getting Help on IBM Z & LinuxONE

1. **Bugs, Feature Requests**

@@ -172,3 +194,49 @@ IBM VXE/VXE2 SIMD acceleration depends on the BLAS implementation. It is strongly recommended to use BLAS.

2. **Other Questions**

   Please reach out directly to [[email protected]](mailto:[email protected]).

## Appendix A: Hardware Support Matrix

|         | Support | Minimum Compiler Version |
| ------- | ------- | ------------------------ |
| IBM z15 | ✅      |                          |
| IBM z16 | ✅      |                          |
| IBM z17 | ✅      | GCC 15.1.0               |

- ✅ - supported and verified to run as intended
- 🚫 - unsupported; we are unlikely to be able to provide support

## Appendix B: SIMD Support Matrix

|            | VX/VXE/VXE2 | NNPA | zDNN | Spyre |
| ---------- | ----------- | ---- | ---- | ----- |
| FP32       | ✅          | ✅   | ❓   | ❓    |
| FP16       | ✅          | ✅   | ❓   | ❓    |
| BF16       | 🚫          | 🚫   | ❓   | ❓    |
| Q4_0       | ✅          | ✅   | ❓   | ❓    |
| Q4_1       | ✅          | ✅   | ❓   | ❓    |
| Q5_0       | 🚫          | 🚫   | ❓   | ❓    |
| Q5_1       | 🚫          | 🚫   | ❓   | ❓    |
| Q8_0       | ✅          | ✅   | ❓   | ❓    |
| Q2_K       | 🚫          | 🚫   | ❓   | ❓    |
| Q3_K       | ✅          | ✅   | ❓   | ❓    |
| Q4_K       | ✅          | ✅   | ❓   | ❓    |
| Q5_K       | ✅          | ✅   | ❓   | ❓    |
| Q6_K       | ✅          | ✅   | ❓   | ❓    |
| TQ1_0      | 🚫          | 🚫   | ❓   | ❓    |
| TQ2_0      | 🚫          | 🚫   | ❓   | ❓    |
| IQ2_XXS    | 🚫          | 🚫   | ❓   | ❓    |
| IQ2_XS     | 🚫          | 🚫   | ❓   | ❓    |
| IQ2_S      | 🚫          | 🚫   | ❓   | ❓    |
| IQ3_XXS    | 🚫          | 🚫   | ❓   | ❓    |
| IQ3_S      | 🚫          | 🚫   | ❓   | ❓    |
| IQ1_S      | 🚫          | 🚫   | ❓   | ❓    |
| IQ1_M      | 🚫          | 🚫   | ❓   | ❓    |
| IQ4_NL     | ✅          | ✅   | ❓   | ❓    |
| IQ4_XS     | ✅          | ✅   | ❓   | ❓    |
| FP32->FP16 | 🚫          | ✅   | ❓   | ❓    |
| FP16->FP32 | 🚫          | ✅   | ❓   | ❓    |

- ✅ - acceleration available
- 🚫 - acceleration unavailable; will still run using the scalar implementation
- ❓ - acceleration unknown; please contribute if you can test it yourself
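
To see which of these facilities your machine exposes, one rough check is the CPU feature flags (the exact flag names, e.g. `vx`, `vxe`, `nnpa`, are assumptions about the s390x kernel's reporting):

```bash
# Print the CPU feature line; on s390x it typically lists entries such as
# vx, vxd, vxe and nnpa when the corresponding facility is available.
grep -m1 features /proc/cpuinfo
```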
