forked from ggml-org/llama.cpp
Sync master with upstream release b5994 #179
Merged: jan-service-account merged 9 commits into `dev` from `update-dev-from-master-2025-07-26-00-12` on Jul 26, 2025.
…org#14503)
* [fix] Fix 32-bit narrowing issue in export-lora and mtmd clip
* Update export-lora.cpp
* Update clip.cpp
* Update export-lora.cpp
* format: use space to replace tab
Neither "g" nor "x" is a valid portPos specifier per the official [graphviz documentation](https://graphviz.org/docs/attr-types/portPos/): > If a compass point is used, it must have the form "n","ne","e","se","s","sw","w","nw","c","_". I tested locally that it falls back to the default portPos specifier when an invalid one is given. As a consequence, we can remove the associated code.
…gml-org#14874) This patch updates the example in docs/development/HOWTO-add-model.md to reflect recent changes after `TextModel` and `MmprojModel` were introduced. It replaces the outdated `Model` base class with `TextModel` or `MmprojModel` and updates the registration example accordingly. Signed-off-by: Wook Song <[email protected]>
* opencl: add fused `rms_norm` + `mul`
* opencl: improve workgroup size for `rms_norm_mul`
* feat: Add s_off as a parameter in the args struct
  This may not be necessary, but it more closely mirrors the CUDA kernel.
* perf: Parallelize mamba2 SSM_SCAN metal kernel over d_state
  This is a first attempt at optimizing the metal kernel. The changes here are:
  - Launch the kernel with a thread group of size d_state
  - Use simd groups and shared memory to do the summation for the y computation
  When tested with G4 tiny preview, this shows roughly a 3x speedup on prefill and a 15% speedup on decode.
* fix: Update logic to correctly do the multi-layer parallel sum
* fix: Correctly size the shared memory buffer and assert expected size relationships
* refactor: Compute block offsets once rather than once per token
* feat: Use local variable for state recursion
* feat: Use a secondary simd_sum instead of a for loop
* feat: Add assertion and comment about the relationship between simd size and num simd groups
* feat: Parallelize over d_state for mamba-1
* feat: Parallel sum in SSM_CONV
* Revert "feat: Parallel sum in SSM_CONV"
  After discussion with @compilade, the size of the parallelism here is not worth the cost in complexity or overhead of the parallel for. ggml-org#14743 (comment)
  This reverts commit 16bc059.
* refactor: Simplify shared memory sizing

Branch: GraniteFourPerf
Signed-off-by: Gabe Goodhart <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
…14880)
* docs: update s390x document for sentencepiece (cherry picked from commit e086c5e)
* docs: update huggingface links + reword (cherry picked from commit 8410b08)
* ggml-cpu: disable ggml-nnpa compile flag by default; fixes ggml-org#14877 (cherry picked from commit 412f4c7)
* docs: update s390x build docs to reflect nnpa disable (cherry picked from commit c1eeae1)

Signed-off-by: Aaron Teo <[email protected]>
Updates dev branch with latest release (b5994) from ggml-org/llama.cpp