ggml-cpu: disable GGML_NNPA by default due to instability #14879
Status: Closed
Signed-off-by: Aaron Teo <[email protected]>
The tid is decomposed into "ow + ky*OW + kx*OW*KH". Change "ksize" to match.
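As a hedged sketch (the helper name `decode_tid` is illustrative, not from the PR), the layout described above can be inverted like this:

```cpp
#include <cassert>

// Illustrative decode of a flattened thread id, assuming the layout
// tid = ow + ky*OW + kx*OW*KH from the commit message above.
struct TidParts { int ow, ky, kx; };

static TidParts decode_tid(int tid, int OW, int KH) {
    TidParts p;
    p.ow = tid % OW;          // output-width index
    p.ky = (tid / OW) % KH;   // kernel-y index
    p.kx = tid / (OW * KH);   // kernel-x index
    return p;
}
```

Under this layout the total work size (`ksize`) has to span all (ow, ky, kx) triples, which is presumably what the commit aligns.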
* kleidiai: add support for get_rows
* apply fixes based on code review
* apply more fixes based on code review

* add conv2d kernel
* fix trailing whitespace
* whitespace fix
* handle f16 input and f16 kernel, more opt
* resolve conflicts
* use enqueue_ndrange_kernel
Signed-off-by: Xiaodong Ye <[email protected]>
Signed-off-by: Molly Sophia <[email protected]>
* implement bf16 cpy ops and enable bf16 cont
* deduplicate copy functions
* deduplicate checks

* mtmd: add a way to select device for vision encoder
* simplify
* format
* warn user if manual device selection failed
* initialize backend to nullptr

…n imatrix file (ggml-org#12718)
* Add --show-statistics option
* Add --show-statistics logic
* Add tensor name parsing
* Tidy output format
* Fix typo in title
* Improve tensor influence ranking
* Add better statistics
* Change statistics' sort order
* Add Cosine Similarity
* Add header search path
* Change header search path to private
* Add weighted statistics per layer
* Update report title
* Refactor compute_statistics out of main
* Refactor compute_cossim out of load_imatrix
* Refactor compute_statistics out of load_imatrix
* Move imatrix statistics calculation into its own functions
* Add checks and validations
* Remove unnecessary include directory
* Rename labels
* Add m_stats getter and refactor compute_statistics out of load_imatrix
* Refactor variable names
* Minor cosmetic change
* Retrigger checks (empty commit)
* Rerun checks (empty commit)
* Fix unnecessary type promotion (Co-authored-by: compilade <[email protected]>)
* Reverting change to improve code readability
* Rerun checks (empty commit)
* Rerun checks (empty commit)
* Rerun checks - third time's the Charm 🤞 (empty commit)
* Minor cosmetic change
* Update README
* Fix typo
* Update README
* Rerun checks (empty commit)
* Re-implement changes on top of ggml-org#9400
* Update README.md
* Update README
* Update README.md (Co-authored-by: compilade <[email protected]>)
* Update README.md (Co-authored-by: compilade <[email protected]>)
* Update README.md
* Remove duplicate option in print_usage()
* Update README.md
* Update README.md (Co-authored-by: compilade <[email protected]>)
* Update README.md (Co-authored-by: compilade <[email protected]>)
* Remove input check
* Remove commented out code

Co-authored-by: compilade <[email protected]>
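The "Add Cosine Similarity" step above can be sketched as follows. This is an illustrative implementation, not the actual `llama-imatrix` code, and the function name is assumed:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Cosine similarity between two activation vectors, as used conceptually
// by the --show-statistics report (sketch under stated assumptions).
static float cosine_similarity(const std::vector<float> & a,
                               const std::vector<float> & b) {
    assert(a.size() == b.size());
    float dot = 0.0f, na = 0.0f, nb = 0.0f;
    for (size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        na  += a[i] * a[i];
        nb  += b[i] * b[i];
    }
    if (na == 0.0f || nb == 0.0f) {
        return 0.0f; // define similarity against a zero vector as 0
    }
    return dot / (std::sqrt(na) * std::sqrt(nb));
}
```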
Signed-off-by: Molly Sophia <[email protected]>
* weight format to nz for 310p
* remove quant weight format to nz
* clean code
* fix
* make the conditions for converting weights to NZ format consistent
* clean code

…org#14675)
* Update llama-memory-recurrent.cpp to handle saving/loading null layers in recurrent memory
* fixed styling issues and updated comments
* fix styling issue

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* CUDA: fix quantized KV cache + multiple sequences
* Update ggml/src/ggml-cuda/fattn-common.cuh

Co-authored-by: Georgi Gerganov <[email protected]>

* use language_model part only, ignore visual layers
* fix rope_dim calculation

* metal : fix fusion across different encoders
* cont : add assertion

ggml-ci

* docs: add libcurl-dev install hint for Linux distros
* Update docs/build.md

Signed-off-by: PouyaGhahramanian <[email protected]>
Co-authored-by: Xuan-Son Nguyen <[email protected]>

* CMake config: Create target only once
  Fix error on repeated find_package(ggml). For simplicity, check only for the top-level ggml::ggml.
* CMake config: Add CUDA link libs
* CMake config: Add OpenCL link libs
* CMake config: Use canonical find_dependency
  Use set and append to control link lib variables. Apply more $<LINK_ONLY...>.
* CMake config: Wire OpenMP dependency
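The "create target only once" fix above can be sketched as a guard in the package config file; this is an assumed shape, not the actual `ggml-config.cmake` contents:

```cmake
# Sketch: make repeated find_package(ggml) calls safe by creating the
# imported target only if it does not already exist.
if(NOT TARGET ggml::ggml)
    add_library(ggml::ggml SHARED IMPORTED)
    # ... set IMPORTED_LOCATION, INTERFACE_INCLUDE_DIRECTORIES, etc. ...
endif()
```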
ggml-ci
* musa: apply mublas API changes
* musa: update musa version to 4.2.0
* musa: restore MUSA graph settings in CMakeLists.txt
* musa: disable mudnnMemcpyAsync by default
* musa: switch back to non-mudnn images
* minor changes
* musa: restore rc in docker image tag

Signed-off-by: Xiaodong Ye <[email protected]>

…org#14503)
* [fix] Fix 32-bit narrowing issue in export-lora and mtmd clip
* Update export-lora.cpp
* Update clip.cpp
* Update export-lora.cpp
* format: use space to replace tab
Neither "g" nor "x" is a valid portPos specifier per the official [graphviz documentation](https://graphviz.org/docs/attr-types/portPos/):

> If a compass point is used, it must have the form "n","ne","e","se","s","sw","w","nw","c","_".

I tested locally that graphviz falls back to the default portPos specifier when an invalid one is given. As a consequence, we can remove the associated code.
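A minimal DOT sketch (assumed example, not taken from the PR) showing a valid compass-point port; with an invalid value such as "g", graphviz simply falls back to the default placement:

```dot
digraph G {
    // valid compass points: n, ne, e, se, s, sw, w, nw, c, _
    a -> b [tailport=se, headport=nw];
}
```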
fixes ggml-org#14877

Signed-off-by: Aaron Teo <[email protected]>
Oh my god. This was not expected. Will re-create PR, sorry.
Labels: Apple Metal, Ascend NPU, devops, documentation, examples, ggml, Nvidia GPU, OpenCL, python, script, server, SYCL, testing, Vulkan
Fixes #14877. Updates s390x build documentation as well.
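Assuming the standard llama.cpp CMake workflow, s390x users who still want the NNPA code path despite the instability can presumably opt back in at configure time:

```shell
# GGML_NNPA is now OFF by default after this PR; enable explicitly at your own risk
cmake -B build -DGGML_NNPA=ON
cmake --build build --config Release
```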