- By default, NNPA is enabled when available. To disable it (not recommended):

```bash
cmake -S . -B build \
    -DCMAKE_BUILD_TYPE=Release \
    -DGGML_BLAS=ON \
    -DGGML_BLAS_VENDOR=OpenBLAS \
    -DGGML_NNPA=OFF

cmake --build build --config Release -j $(nproc)
```
- For debug builds:

```bash
cmake -S . -B build \
    -DCMAKE_BUILD_TYPE=Debug \
    -DGGML_BLAS=ON \
    -DGGML_BLAS_VENDOR=OpenBLAS

cmake --build build --config Debug -j $(nproc)
```
- For static builds, add `-DBUILD_SHARED_LIBS=OFF`:

```bash
cmake -S . -B build \
    -DCMAKE_BUILD_TYPE=Release \
    -DGGML_BLAS=ON \
    -DGGML_BLAS_VENDOR=OpenBLAS \
    -DBUILD_SHARED_LIBS=OFF

cmake --build build --config Release -j $(nproc)
```
All models need to be converted to Big-Endian. You can achieve this in three cases:
1. **Use pre-converted models verified for use on IBM Z & LinuxONE (easiest)**
You can find popular models pre-converted and verified at [s390x Ready Models](https://huggingface.co/collections/taronaeo/s390x-ready-models-672765393af438d0ccb72a08).
These models and their respective tokenizers are verified to run correctly on IBM Z & LinuxONE.
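Because model files on the hub come in both little- and big-endian flavours, it helps to confirm which one your host actually needs before downloading. A minimal sketch using only the standard library (illustrative, not part of llama.cpp):

```python
import sys

# IBM Z (s390x) is big-endian; x86_64 and most other build hosts are
# little-endian. A GGUF file must match the byte order of the host
# that loads it, which is why Big-Endian conversion is needed here.
needs_big_endian = sys.byteorder == "big"
print(f"host byte order: {sys.byteorder} -> "
      f"{'Big-Endian' if needs_big_endian else 'Little-Endian'} GGUF needed")
```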
For example,
```bash
python3 gguf-py/gguf/scripts/gguf_convert_endian.py granite-3.3-2b-instruct-le.f16.gguf BIG
```

- The GGUF endian conversion script may not support all data types at the moment and may fail for some models/quantizations. When that happens, please try manually converting the safetensors model to GGUF Big-Endian via Step 2.
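Conceptually, the endian conversion byte-swaps every multi-byte field and tensor element in the file. A minimal sketch of that idea for a single float32 value (illustrative only, not the script's actual code):

```python
import struct

# Pack 3.14 as a little-endian float32, the way a GGUF file written
# on an x86_64 host would store it.
le_bytes = struct.pack("<f", 3.14)

# Reversing the 4 bytes converts the value to big-endian layout.
be_bytes = le_bytes[::-1]

# A big-endian reader (e.g. on s390x) now recovers the same value.
value = struct.unpack(">f", be_bytes)[0]
print(value)
```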
## IBM Accelerators
### 1. SIMD Acceleration
Only available on IBM z15 or later systems with the `-DGGML_VXE=ON` (turned on by default) compile flag. No hardware acceleration is possible with llama.cpp on older systems, such as IBM z14/arch12. On such systems, the APIs can still run but will use a scalar implementation.
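Whether the VXE path can apply at all can be coarsely checked from the reported machine architecture; the CPU generation (z15 or later) is not portably visible from Python, so this is only a sketch under that caveat:

```python
import platform

# llama.cpp's VXE path only exists in s390x builds; on any other
# architecture this check short-circuits to False.
machine = platform.machine()
is_s390x = machine == "s390x"
print(f"machine={machine}, s390x={is_s390x}")
```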
### 2. NNPA Vector Intrinsics Acceleration
Only available on IBM z16 or later systems with the `-DGGML_NNPA=ON` (turned on when available) compile flag. No hardware acceleration is possible with llama.cpp on older systems, such as IBM z15/arch13. On such systems, the APIs can still run but will use a scalar implementation.
### 3. zDNN Accelerator
_Only available on IBM z16 or later systems. No direction at the moment._
### 4. Spyre Accelerator
_No direction at the moment._
## Performance Tuning
IBM VXE/VXE2 SIMD acceleration depends on the BLAS implementation.