Commit 2ee2e0b

docs: s390x add accelerator and perf optimizations

Signed-off-by: Aaron Teo <[email protected]>
1 parent: f14c829

File tree: 1 file changed (+31, -3 lines)

docs/build-s390x.md: 31 additions & 3 deletions
@@ -68,13 +68,13 @@ cmake --build build --config Release -j $(nproc)
All models need to be converted to Big-Endian. You can achieve this in three ways:

1. **Use pre-converted models verified for use on IBM Z & LinuxONE (easiest)**

You can find popular models pre-converted and verified at [s390x Ready Models](https://hf.co/collections/taronaeo/s390x-ready-models-672765393af438d0ccb72a08).

These models and their respective tokenizers are verified to run correctly on IBM Z & LinuxONE.

2. **Convert safetensors model to GGUF Big-Endian directly (recommended)**

```bash
python3 convert_hf_to_gguf.py \
@@ -94,7 +94,7 @@ All models need to be converted to Big-Endian. You can achieve this in three cas
    granite-3.3-2b-instruct/
```

3. **Convert existing GGUF Little-Endian model to Big-Endian**

```bash
python3 gguf-py/gguf/scripts/gguf_convert_endian.py model-name.f16.gguf BIG
```
@@ -109,5 +109,33 @@ All models need to be converted to Big-Endian. You can achieve this in three cas
**Notes:**

- The GGUF endian conversion script may not support all data types at the moment and may fail for some models/quantizations. When that happens, please try manually converting the safetensors model to GGUF Big-Endian via Step 2.
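To sanity-check a converted model, a short generation run is usually enough. The invocation below is illustrative: the binary path depends on your build directory, and the model filename (`model-name.f16.gguf`, reused from the conversion example above) on your conversion output.

```bash
# Smoke test: load the Big-Endian model and generate a few tokens;
# adjust the binary path and model filename to match your setup.
./build/bin/llama-cli -m model-name.f16.gguf -p "Hello" -n 16
```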

## IBM zDNN Accelerator

*No direction at the moment.*

## IBM Spyre Accelerator

*No direction at the moment.*
## Performance Optimization

### Virtualization Setup

We strongly recommend using only LPAR (Type-1) virtualization to get the most performance.

Note: Type-2 virtualization is not supported at the moment. While you can get it running, performance will not be the best.
### IFL (Core) Count

We recommend a minimum of 8 shared IFLs assigned to the LPAR. Increasing the IFL count beyond 8 shared IFLs improves only Prompt Processing performance, not Token Generation.

Note: IFL count does not equate to vCPU count.
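As a quick sketch, you can compare the vCPU topology Linux actually sees against the IFLs assigned to the LPAR using the standard `lscpu` tool; with SMT enabled, one IFL can surface as more than one vCPU, which is why the two counts need not match.

```bash
# Show the vCPU topology Linux reports: "CPU(s)" is the vCPU count and
# "Thread(s) per core" indicates whether SMT is in effect. Compare these
# figures against the IFLs assigned to the LPAR.
lscpu | grep -E 'CPU\(s\)|Thread\(s\) per core' || true
```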
### SMT vs NOSMT (Simultaneous Multithreading)

We strongly recommend disabling SMT via the kernel boot parameters, as it negatively affects performance. Please refer to your Linux distribution's guide on disabling SMT via kernel boot parameters.
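As a sketch, you can check the current SMT state through the kernel's generic sysfs interface before and after changing the boot parameters. The `nosmt` kernel parameter is the usual mechanism; the sysfs path below may be absent on kernels built without SMT support.

```bash
# Report the current SMT state via the kernel's sysfs control file;
# typical values are "on", "off", "forceoff" or "notsupported".
if [ -f /sys/devices/system/cpu/smt/control ]; then
    cat /sys/devices/system/cpu/smt/control
else
    echo "SMT control interface not available"
fi

# To disable SMT persistently, add "nosmt" to the kernel command line
# (on s390x this typically means editing the zipl configuration and
# re-running zipl before rebooting).
```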
### BLAS vs NOBLAS

We strongly recommend using BLAS for llama.cpp, as there are no custom s390x kernels for llama.cpp at the moment.
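As an illustrative sketch of a BLAS-enabled build, the configuration below uses llama.cpp's `GGML_BLAS` and `GGML_BLAS_VENDOR` CMake options with OpenBLAS as the backend; it assumes the OpenBLAS development headers are installed, and the flag names should be verified against the build documentation for your checkout.

```bash
# Configure llama.cpp with BLAS support via OpenBLAS, then build;
# run from the root of the llama.cpp source tree.
cmake -S . -B build \
    -DCMAKE_BUILD_TYPE=Release \
    -DGGML_BLAS=ON \
    -DGGML_BLAS_VENDOR=OpenBLAS
cmake --build build --config Release -j $(nproc)
```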
