All models need to be converted to Big-Endian. You can achieve this in three ways:
1. **Use pre-converted models verified for use on IBM Z & LinuxONE (easiest)**
You can find popular models pre-converted and verified at [s390x Ready Models](https://hf.co/collections/taronaeo/s390x-ready-models-672765393af438d0ccb72a08).
These models and their respective tokenizers are verified to run correctly on IBM Z & LinuxONE.
2. **Convert safetensors model to GGUF Big-Endian directly (recommended)**
```bash
python3 convert_hf_to_gguf.py \
    --outfile granite-3.3-2b-instruct-be.f16.gguf \
    --outtype f16 \
    --bigendian \
    granite-3.3-2b-instruct/
```

3. **Convert existing GGUF Little-Endian model to Big-Endian**

```bash
python3 gguf-py/gguf/scripts/gguf_convert_endian.py model-name.f16.gguf BIG
```

**Notes:**
- The GGUF endian conversion script may not support all data types at the moment and may fail for some models/quantizations. When that happens, please try manually converting the safetensors model to GGUF Big-Endian via Step 2.
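
For plain float tensors, the endian conversion above amounts to a byte swap: the numeric values are unchanged, only their storage order differs. A minimal sketch of the idea (illustrative only, not the conversion script's actual code):

```shell
# Byte-swap demo: the same float32 value stored in both byte orders.
# Only the on-disk bytes change; the number itself is identical.
python3 - <<'EOF'
import struct

le = struct.pack("<f", 1.0)  # little-endian float32, as in a standard GGUF
be = struct.pack(">f", 1.0)  # byte-swapped for Big-Endian (s390x)

assert le != be                                            # bytes differ
assert struct.unpack(">f", be) == struct.unpack("<f", le)  # value unchanged
print("byte swap preserves values")
EOF
```

Quantized GGUF types pack fields of several different widths into one block, so a blind per-element byte swap is not enough for them; that is the kind of case where the conversion script can fail.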
## IBM zDNN Accelerator
*No direction at the moment.*
## IBM Spyre Accelerator
*No direction at the moment.*
## Performance Optimization
### Virtualization Setup
We strongly recommend using only LPAR (Type-1) virtualization for the best performance.
Note: Type-2 virtualization is not supported at the moment. While you can get it running, performance will not be the best.
### IFL (Core) Count
We recommend a minimum of 8 shared IFLs assigned to the LPAR. Increasing the IFL count past 8 shared IFLs improves only Prompt Processing performance, not Token Generation.
Note: IFL count does not equate to vCPU count.
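
The distinction can be seen from inside the guest with a generic Linux command (nothing s390x-specific is assumed here):

```shell
# Number of online vCPUs (hardware threads) visible to Linux.
# With SMT enabled this can exceed the IFL count assigned to the LPAR,
# so do not read it as an IFL count.
nproc
```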
### SMT vs NOSMT (Simultaneous Multithreading)
We strongly recommend disabling SMT via the kernel boot parameters as it negatively affects performance. Please refer to your Linux distribution's guide on disabling SMT via kernel boot parameters.
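
To check whether SMT is currently active, you can query the generic Linux sysfs interface (available on most modern kernels; the fallback covers kernels built without SMT control):

```shell
# Prints 1 when SMT is active, 0 when disabled, "unknown" if the
# kernel does not expose the generic SMT control interface.
cat /sys/devices/system/cpu/smt/active 2>/dev/null || echo unknown
```

Disabling SMT persistently is typically done by adding `nosmt` to the kernel command line and rebooting; see your distribution's documentation for where that is configured.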
### BLAS vs NOBLAS
We strongly recommend using BLAS for llama.cpp, as there are no custom s390x kernels in llama.cpp at the moment.
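
As a sketch, a BLAS-enabled build can be configured through ggml's BLAS backend options. OpenBLAS is shown as an assumed, already-installed vendor; substitute the BLAS implementation available on your system:

```shell
# Configure and build llama.cpp with the ggml BLAS backend enabled.
# GGML_BLAS_VENDOR=OpenBLAS assumes the OpenBLAS development package is installed.
cmake -S . -B build -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
cmake --build build --config Release -j
```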