Commit 213ded6

docs: reword for s390x
Signed-off-by: Aaron Teo <[email protected]>
1 parent a275329 commit 213ded6

1 file changed, +11 -7 lines changed

docs/build-s390x.md

Lines changed: 11 additions & 7 deletions
@@ -111,35 +111,39 @@ All models need to be converted to Big-Endian. You can achieve this in three cas
 
 ## IBM Accelerators
 
-### 1. zDNN Accelerator
+### 1. SIMD Acceleration
 
-*Only available in IBM z16 and onwards. No direction at the moment.*
+Only available in IBM z15 or later system with the `-DGGML_VXE=ON` (turned on by default). No hardware acceleration is possible with llama.cpp with older systems, such as IBM z14 or EC13. In such systems, the APIs can still run but will use a scalar implementation.
 
-### 2. Spyre Accelerator
+### 2. zDNN Accelerator
+
+*Only available in IBM z16 or later system. No direction at the moment.*
+
+### 3. Spyre Accelerator
 
 *No direction at the moment.*
 
 ## Performance Tuning
 
 ### 1. Virtualization Setup
 
-We strongly recommend using only LPAR (Type-1) virtualization to get the most performance.
+It is strongly recommended to use only LPAR (Type-1) virtualization to get the most performance.
 
 Note: Type-2 virtualization is not supported at the moment, while you can get it running, the performance will not be the best.
 
 ### 2. IFL (Core) Count
 
-We recommend a minimum of 8 shared IFLs assigned to the LPAR. Increasing the IFL count past 8 shared IFLs will only improve Prompt Processing performance but not Token Generation.
+It is recommended to allocate a minimum of 8 shared IFLs assigned to the LPAR. Increasing the IFL count past 8 shared IFLs will only improve Prompt Processing performance but not Token Generation.
 
 Note: IFL count does not equate to vCPU count.
 
 ### 3. SMT vs NOSMT (Simultaneous Multithreading)
 
-We strongly recommend disabling SMT via the kernel boot parameters as it negatively affects performance. Please refer to your Linux distribution's guide on disabling SMT via kernel boot parameters.
+It is strongly recommended to disable SMT via the kernel boot parameters as it negatively affects performance. Please refer to your Linux distribution's guide on disabling SMT via kernel boot parameters.
 
 ### 4. BLAS vs NOBLAS
 
-We strongly recommend using BLAS for llama.cpp as there are no custom kernels for s390x for llama.cpp at the moment.
+IBM VXE/VXE2 SIMD acceleration depends on the BLAS implementation. It is strongly recommended to use BLAS.
 
 ## Getting Help on IBM Z & LinuxONE
 
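Since the reworded section ties SIMD acceleration to the hardware generation and to `-DGGML_VXE=ON`, a reader may want a quick way to see which vector facilities their Linux guest reports before configuring a build. The following is a minimal sketch, not part of this commit: `vector_facilities` is a hypothetical helper name, and it assumes the s390x kernel lists facilities such as `vx`, `vxe` and `vxe2` on the `features` line of `/proc/cpuinfo`. The diff does not state which of these facilities corresponds to the z15 requirement, so the script only reports what it finds. The `GGML_BLAS` options in the comment are assumed from llama.cpp's general build options rather than from this diff.

```shell
#!/bin/sh
# Hypothetical helper (not part of llama.cpp): print which of the
# vector facilities vx, vxe and vxe2 appear in a space-separated
# feature list such as the "features" line of /proc/cpuinfo on s390x.
vector_facilities() {
    found=""
    for want in vx vxe vxe2; do
        # $1 is deliberately unquoted so the list splits on whitespace.
        for have in $1; do
            if [ "$have" = "$want" ]; then
                found="${found:+$found }$want"
            fi
        done
    done
    echo "$found"
}

# Read the first "features" line. On non-s390x machines this line is
# absent, so the result is simply empty.
features=$(grep -m1 '^features' /proc/cpuinfo 2>/dev/null | cut -d: -f2)
echo "vector facilities: $(vector_facilities "$features")"

# A configure line consistent with the documented recommendations might
# look like this (BLAS option names are an assumption, see above):
#   cmake -B build -DGGML_VXE=ON -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
```

If the list comes back empty, the documentation above suggests llama.cpp will still run but fall back to a scalar implementation.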
