## IBM Accelerators
### 1. SIMD Acceleration
Only available on IBM z15 or later systems with the `-DGGML_VXE=ON` compile flag (turned on by default). No hardware acceleration is possible with llama.cpp on older systems, such as IBM z14 or EC13. On such systems, the APIs can still run, but they will fall back to a scalar implementation.
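For reference, a build that sets the flag explicitly might look like the sketch below. The flag is on by default, so this is only needed if it was previously turned off; the generic CMake invocation is an assumption, not a prescribed command.

```bash
# Sketch: build llama.cpp with VXE SIMD acceleration explicitly enabled.
cmake -S . -B build \
    -DCMAKE_BUILD_TYPE=Release \
    -DGGML_VXE=ON
cmake --build build --config Release -j "$(nproc)"
```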
### 2. zDNN Accelerator
*Only available on IBM z16 or later systems. No direction at the moment.*
### 3. Spyre Accelerator
*No direction at the moment.*
## Performance Tuning
### 1. Virtualization Setup
It is strongly recommended to use only LPAR (Type-1) virtualization to get the best performance.
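One quick way to confirm which virtualization layer a guest is running under is to inspect `/proc/sysinfo`. This is a sketch assuming an s390x Linux guest; the exact field names can vary by machine generation.

```bash
# LPAR-only guests expose "LPAR ..." fields; a guest running under
# z/VM or KVM additionally reports "VM00 ..." entries.
grep -iE '^(LPAR|VM00)' /proc/sysinfo
```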
Note: Type-2 virtualization is not supported at the moment. While you can get it running, the performance will not be the best.
### 2. IFL (Core) Count
It is recommended to allocate a minimum of 8 shared IFLs to the LPAR. Increasing the IFL count past 8 shared IFLs will only improve Prompt Processing performance, not Token Generation.
Note: IFL count does not equate to vCPU count.
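To illustrate the distinction: with SMT enabled, each IFL presents two hardware threads, so the vCPU count Linux reports can be double the IFL count. The sketch below checks the visible topology and matches llama.cpp's thread count to it; the model path and prompt are placeholders.

```bash
# "CPU(s)" is the vCPU count Linux sees; "Thread(s) per core" reveals
# whether SMT is active.
lscpu | grep -E '^(CPU\(s\)|Thread\(s\) per core)'

# Match llama.cpp's thread count (-t) to the online CPU count.
./build/bin/llama-cli -m /path/to/model.gguf -t "$(nproc)" -p "Hello"
```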
### 3. SMT vs NOSMT (Simultaneous Multithreading)
It is strongly recommended to disable SMT via the kernel boot parameters, as it negatively affects performance. Please refer to your Linux distribution's guide for the exact steps.
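As a sketch of one common approach, the generic `nosmt` kernel parameter disables SMT. This assumes a distribution that boots via the `zipl` bootloader; file paths and tooling differ between distributions.

```bash
# Append "nosmt" to the kernel parameters line in /etc/zipl.conf, e.g.:
#   parameters="root=... nosmt"
# then rewrite the boot record and reboot:
sudo zipl
sudo reboot

# After rebooting, verify that SMT is off:
cat /sys/devices/system/cpu/smt/control
```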
### 4. BLAS vs NOBLAS
IBM VXE/VXE2 SIMD acceleration depends on the BLAS implementation, so it is strongly recommended to build llama.cpp with BLAS enabled.
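A typical BLAS-enabled build might look like the sketch below. The flag names follow llama.cpp's CMake options, but the choice of OpenBLAS as the vendor is an assumption; substitute whichever BLAS implementation is installed on the system.

```bash
# Sketch: build llama.cpp against a system BLAS (OpenBLAS assumed here).
cmake -S . -B build \
    -DCMAKE_BUILD_TYPE=Release \
    -DGGML_BLAS=ON \
    -DGGML_BLAS_VENDOR=OpenBLAS
cmake --build build --config Release -j "$(nproc)"
```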