
Commit d3903f5 (1 parent: 372f662)

docs: add s390x model conversion steps

Signed-off-by: Aaron Teo <[email protected]>

File tree: 1 file changed (+64 -3 lines)


docs/build-s390x.md

Lines changed: 64 additions & 3 deletions
@@ -1,3 +1,6 @@
> [!IMPORTANT]
> This build documentation is specific only to IBM Z & LinuxONE mainframes (s390x). You can find build documentation for other architectures in [build.md](docs/build.md).

# Build llama.cpp locally (for s390x)

The main product of this project is the `llama` library. Its C-style interface can be found in [include/llama.h](include/llama.h).
@@ -26,12 +29,24 @@ cmake --build build --config Release -j $(nproc)

**Notes**:
- For faster repeated compilation, install [ccache](https://ccache.dev/)
- By default, VXE/VXE2 is enabled. To disable it (not recommended):

    ```bash
    cmake -S . -B build \
        -DCMAKE_BUILD_TYPE=Release \
        -DGGML_BLAS=ON \
        -DGGML_BLAS_VENDOR=OpenBLAS \
        -DGGML_VXE=OFF

    cmake --build build --config Release -j $(nproc)
    ```

- For debug builds:

    ```bash
    cmake -S . -B build \
        -DCMAKE_BUILD_TYPE=Debug \
        -DGGML_BLAS=ON \
        -DGGML_BLAS_VENDOR=OpenBLAS

    cmake --build build --config Debug -j $(nproc)
    ```
@@ -49,4 +64,50 @@ cmake --build build --config Release -j $(nproc)
```bash
cmake --build build --config Release -j $(nproc)
```
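Before reaching for the `-DGGML_VXE=OFF` switch, it can help to confirm whether the CPU even advertises the vector facilities. The check below is a sketch, not from the original docs; it assumes Linux on s390x lists flags such as `vx`, `vxd`, `vxe`, and `vxe2` in the `features` line of `/proc/cpuinfo` (exact format may vary by kernel version):

```shell
# Sketch: check for the vector-enhancement facility flag on s390x.
# Assumption: Linux on s390x reports "vx"/"vxe"/"vxe2" in the
# "features" line of /proc/cpuinfo.
if grep -qw vxe /proc/cpuinfo; then
    echo "VXE available"
else
    echo "VXE not reported"
fi
```

On non-s390x machines the `features` line differs, so this check is only meaningful on IBM Z & LinuxONE.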

## Getting GGUF Models

In order to run GGUF models, the model needs to be converted to Big-Endian, since s390x is a big-endian architecture. You can achieve this in one of three ways:

1. Use pre-converted models verified for use on IBM Z & LinuxONE (easiest)

   You can find popular models pre-converted and verified at [s390x Ready Models](hf.co/collections/taronaeo/s390x-ready-models-672765393af438d0ccb72a08).

   These models and their respective tokenizers are verified to run correctly on IBM Z & LinuxONE.

2. Convert a safetensors model to GGUF Big-Endian directly (recommended)

   ```bash
   python3 convert_hf_to_gguf.py \
       --outfile model-name-be.f16.gguf \
       --outtype f16 \
       --bigendian \
       model-directory/
   ```

   For example:

   ```bash
   python3 convert_hf_to_gguf.py \
       --outfile granite-3.3-2b-instruct-be.f16.gguf \
       --outtype f16 \
       --bigendian \
       granite-3.3-2b-instruct/
   ```

3. Convert an existing GGUF Little-Endian model to Big-Endian

   ```bash
   python3 gguf-py/gguf/scripts/gguf_convert_endian.py model-name.f16.gguf BIG
   ```

   For example:

   ```bash
   python3 gguf-py/gguf/scripts/gguf_convert_endian.py granite-3.3-2b-instruct-le.f16.gguf BIG
   mv granite-3.3-2b-instruct-le.f16.gguf granite-3.3-2b-instruct-be.f16.gguf
   ```
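For option 1 above, the pre-converted files can be fetched like any other Hugging Face artifact. The sketch below is one possible route and is not from the original docs; it uses the `huggingface-cli` tool from the `huggingface_hub` package, and both the repository id and the file name are illustrative placeholders:

```shell
# Hypothetical sketch: download a pre-converted Big-Endian GGUF file.
# Requires: pip install huggingface_hub
MODEL_REPO="some-user/some-model-be-GGUF"   # illustrative placeholder
MODEL_FILE="some-model-be.f16.gguf"         # illustrative placeholder

# Guarded so the sketch is a no-op when the CLI is not installed.
if command -v huggingface-cli >/dev/null 2>&1; then
    huggingface-cli download "$MODEL_REPO" "$MODEL_FILE" --local-dir .
else
    echo "huggingface-cli not installed; run: pip install huggingface_hub"
fi
```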

**Notes:**
- The GGUF endian conversion script may not support all data types at the moment and may fail for some models/quantizations. When that happens, please try manually converting the safetensors model to GGUF Big-Endian via Step 2.
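When it is unclear which byte order an existing `.gguf` file already uses, the header can be probed before deciding whether Step 3 is needed. This is an illustrative sketch, not part of gguf-py; it assumes the magic bytes `GGUF` are followed by a `uint32` version field stored in the file's native byte order, and that real version numbers are small:

```shell
# Sketch: guess a GGUF file's byte order from its version field.
# Assumption: bytes 0-3 are the magic "GGUF" and bytes 4-7 are a
# uint32 version in native order; versions are small (e.g. 3), so
# the byte at offset 4 is nonzero only for little-endian files.
gguf_byte_order() {
    b=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' ')
    if [ "$b" != "0" ]; then echo little; else echo big; fi
}

# Demo with synthetic 8-byte headers (version 3):
printf 'GGUF\003\000\000\000' > le-sample.gguf
printf 'GGUF\000\000\000\003' > be-sample.gguf
gguf_byte_order le-sample.gguf   # prints: little
gguf_byte_order be-sample.gguf   # prints: big
```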
