Commit 0cc2017

ggml-cpu: enable GGML_NNPA by default

Signed-off-by: Aaron Teo <[email protected]>

1 parent 14c870d · commit 0cc2017
File tree

2 files changed: +5 -9 lines


docs/build-s390x.md (4 additions & 8 deletions)
````diff
@@ -42,14 +42,14 @@ cmake --build build --config Release -j $(nproc)
 cmake --build build --config Release -j $(nproc)
 ```
 
-- By default, NNPA is disabled by default. To enable it:
+- By default, NNPA is enabled when available. To disable it (not recommended):
 
 ```bash
 cmake -S . -B build \
     -DCMAKE_BUILD_TYPE=Release \
     -DGGML_BLAS=ON \
     -DGGML_BLAS_VENDOR=OpenBLAS \
-    -DGGML_NNPA=ON
+    -DGGML_NNPA=OFF
 
 cmake --build build --config Release -j $(nproc)
 ```
````
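Since the docs now present NNPA as on by default, it can be worth confirming what a configured tree actually picked up. A minimal sketch of such a check, assuming a build directory named `build` (the expected cache value is an assumption based on the new default):

```bash
# Configure with defaults; after this commit, GGML_NNPA should default to ON.
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release

# Inspect the CMake cache to confirm which value was recorded.
grep GGML_NNPA build/CMakeCache.txt
# Expected (assumption, per the new default): GGML_NNPA:BOOL=ON
```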
````diff
@@ -166,7 +166,7 @@ Only available in IBM z15/LinuxONE 3 or later system with the `-DGGML_VXE=ON` (t
 
 ### 2. NNPA Vector Intrinsics Acceleration
 
-Only available in IBM z16/LinuxONE 4 or later system with the `-DGGML_NNPA=ON` (turned off by default) compile flag. No hardware acceleration is possible with llama.cpp with older systems, such as IBM z15/arch13. In such systems, the APIs can still run but will use a scalar implementation.
+Only available in IBM z16/LinuxONE 4 or later system with the `-DGGML_NNPA=ON` (turned on when available) compile flag. No hardware acceleration is possible with llama.cpp with older systems, such as IBM z15/arch13. In such systems, the APIs can still run but will use a scalar implementation.
 
 ### 3. zDNN Accelerator (WIP)
 
````

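Because the flag is now on by default but only helps on z16/arch14 or newer, it may be useful to check whether the host actually exposes NNPA before expecting acceleration. A minimal sketch, assuming a Linux on IBM Z system whose kernel reports the facility in `/proc/cpuinfo` (the exact feature-flag name and reporting format are assumptions):

```bash
# On Linux on IBM Z, NNPA capability is commonly reported as an "nnpa" flag on
# the features line (assumption: kernel new enough to expose HWCAP_S390_NNPA).
if grep -qw nnpa /proc/cpuinfo; then
    echo "NNPA available: hardware acceleration possible"
else
    echo "NNPA not reported: ggml falls back to the scalar implementation"
fi
```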
````diff
@@ -230,10 +230,6 @@ IBM VXE/VXE2 SIMD acceleration depends on the BLAS implementation. It is strongl
 CXXFLAGS="-include cstdint" pip3 install -r requirements.txt
 ```
 
-5. `-DGGML_NNPA=ON` generates gibberish output
-
-   Answer: We are aware of this as detailed in [this issue](https://github.com/ggml-org/llama.cpp/issues/14877). Please either try reducing the number of threads, or disable the compile option using `-DGGML_NNPA=OFF`.
-
 ## Getting Help on IBM Z & LinuxONE
 
 1. **Bugs, Feature Requests**
````
````diff
@@ -292,4 +288,4 @@ IBM VXE/VXE2 SIMD acceleration depends on the BLAS implementation. It is strongl
 - 🚫 - acceleration unavailable, will still run using scalar implementation
 - ❓ - acceleration unknown, please contribute if you can test it yourself
 
-Last Updated by **Aaron Teo ([email protected])** on Aug 22, 2025.
+Last Updated by **Aaron Teo ([email protected])** on Sep 2, 2025.
````

ggml/CMakeLists.txt (1 addition & 1 deletion)
````diff
@@ -132,7 +132,7 @@ option(GGML_RVV "ggml: enable rvv" ON)
 option(GGML_RV_ZFH "ggml: enable riscv zfh" OFF)
 option(GGML_XTHEADVECTOR "ggml: enable xtheadvector" OFF)
 option(GGML_VXE "ggml: enable vxe" ON)
-option(GGML_NNPA "ggml: enable nnpa" OFF) # temp disabled by default, see: https://github.com/ggml-org/llama.cpp/issues/14877
+option(GGML_NNPA "ggml: enable nnpa" ON)
 
 option(GGML_CPU_ALL_VARIANTS "ggml: build all variants of the CPU backend (requires GGML_BACKEND_DL)" OFF)
 set(GGML_CPU_ARM_ARCH "" CACHE STRING "ggml: CPU architecture for ARM")
````
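One subtlety of changing an `option()` default: CMake only applies it when the variable is not already in the cache, so a build directory configured before this commit keeps `OFF` until reconfigured from a clean tree, and users still affected by the accuracy issue can opt out explicitly. A minimal sketch, assuming a fresh checkout (the directory name `build-no-nnpa` is illustrative):

```bash
# Opt out of the new ON default explicitly (e.g. if issue 14877 resurfaces).
cmake -S . -B build-no-nnpa \
    -DCMAKE_BUILD_TYPE=Release \
    -DGGML_NNPA=OFF

cmake --build build-no-nnpa --config Release -j $(nproc)

# A stale cache keeps the old OFF default; reconfigure from scratch to pick up ON.
rm -rf build && cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
```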
