@jan-service-account

Updates dev branch with latest release (b5994) from ggml-org/llama.cpp

slaren and others added 9 commits July 25, 2025 11:07
…org#14503)

* [fix] Fix 32-bit narrowing issue in export-lora and mtmd clip

* Update export-lora.cpp

* Update clip.cpp

* Update export-lora.cpp

* format: use space to replace tab
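The commit titles above do not show the changed lines, so the following is only a generic sketch of the kind of problem a "32-bit narrowing" fix addresses. On a target where `size_t` is 32 bits wide, brace-initializing a `size_t` field from a 64-bit value is a narrowing conversion that compilers reject or warn about. The struct `init_params` and the helper `to_size_checked` are hypothetical names for illustration and are not taken from export-lora.cpp or clip.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper (not from llama.cpp): convert a 64-bit byte count to
// size_t, throwing instead of silently truncating on 32-bit targets.
inline std::size_t to_size_checked(std::uint64_t n) {
    if (n > std::numeric_limits<std::size_t>::max()) {
        throw std::overflow_error("value does not fit in size_t");
    }
    return static_cast<std::size_t>(n);
}

// Hypothetical stand-in for a parameter struct that is brace-initialized.
struct init_params {
    std::size_t mem_size;
    bool        no_alloc;
};

inline init_params make_params(std::uint64_t n_bytes) {
    // init_params p = { n_bytes, true };  // narrowing where size_t is 32-bit
    init_params p = { to_size_checked(n_bytes), true };
    return p;
}
```

Calling `make_params(1024)` yields a struct whose `mem_size` is 1024 on both 32-bit and 64-bit builds; a value that does not fit raises an exception rather than wrapping around.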
Neither "g" nor "x" is a valid portPos specifier per the official
[graphviz documentation](https://graphviz.org/docs/attr-types/portPos/):

> If a compass point is used, it must have the form "n","ne","e","se","s","sw","w","nw","c","_".

I verified locally that graphviz falls back to the default portPos
specifier when an invalid portPos is specified. As a consequence, we can
remove the associated code.
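As a rough illustration of the rule quoted above, the sketch below checks a compass point against the listed values and emits a DOT edge that omits the port when the value is not in the list. The function names `is_valid_compass` and `dot_edge` are made up for this example and do not correspond to code in the repository.

```cpp
#include <array>
#include <cassert>
#include <string>

// Compass points accepted by graphviz, as quoted from the portPos docs.
inline bool is_valid_compass(const std::string& p) {
    static const std::array<const char*, 10> kValid = {
        "n", "ne", "e", "se", "s", "sw", "w", "nw", "c", "_"
    };
    for (const char* v : kValid) {
        if (p == v) {
            return true;
        }
    }
    return false;
}

// Hypothetical helper: format one DOT edge, attaching the compass point to
// the source node only when it is valid.
inline std::string dot_edge(const std::string& src,
                            const std::string& dst,
                            const std::string& compass) {
    std::string out = "\"" + src + "\"";
    if (is_valid_compass(compass)) {
        out += ":" + compass;
    }
    out += " -> \"" + dst + "\";";
    return out;
}
```

With this helper, `dot_edge("a", "b", "s")` produces `"a":s -> "b";`, while `dot_edge("a", "b", "x")` produces `"a" -> "b";` because "x" is not a valid compass point.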
…gml-org#14874)

This patch updates the example in docs/development/HOWTO-add-model.md to
reflect recent changes after `TextModel` and `MmprojModel` were introduced.

It replaces the outdated `Model` base class with `TextModel` or `MmprojModel`
and updates the registration example accordingly.

Signed-off-by: Wook Song <[email protected]>
* opencl: add fused `rms_norm` + `mul`

* opencl: improve workgroup size for `rms_norm_mul`
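For readers unfamiliar with the fusion, here is a minimal CPU reference sketch of what a fused `rms_norm` + `mul` computes for one row: the row is scaled by the reciprocal of its root mean square and multiplied element-wise by a weight vector in the same pass, so no intermediate normalized tensor is written. This is not the OpenCL kernel; the function name `rms_norm_mul` and the single-row layout are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Reference sketch: y[i] = x[i] / sqrt(mean(x^2) + eps) * w[i] for one row.
inline std::vector<float> rms_norm_mul(const std::vector<float>& x,
                                       const std::vector<float>& w,
                                       float eps) {
    assert(!x.empty() && x.size() == w.size());

    // First pass: accumulate the sum of squares of the row.
    float sum_sq = 0.0f;
    for (float v : x) {
        sum_sq += v * v;
    }
    const float mean_sq = sum_sq / static_cast<float>(x.size());
    const float scale   = 1.0f / std::sqrt(mean_sq + eps);

    // Second pass: normalize and apply the weight in one step (the fusion).
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = x[i] * scale * w[i];
    }
    return y;
}
```

An unfused version would store `x[i] * scale` to memory and read it back in a separate multiply; the fused form saves that round trip, which is the usual motivation for this kind of kernel.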
* feat: Add s_off as a parameter in the args struct

This may not be necessary, but it more closely mirrors the CUDA kernel

Branch: GraniteFourPerf

Signed-off-by: Gabe Goodhart <[email protected]>

* perf: Parallelize mamba2 SSM_SCAN metal kernel over d_state

This is a first attempt at optimizing the metal kernel. The changes here
are:

- Launch the kernel with a thread group of size d_state
- Use simd groups and shared memory to do the summation for the y
  computation

When tested with G4 tiny preview, this shows roughly a 3x speedup on
prefill and 15% speedup on decode.
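The two-level summation described above can be sketched on the CPU as follows. This is a serial simulation and not the Metal kernel: each "thread" owns one element of `d_state`, each group of lanes reduces its products to one partial (the role a simd-group sum would play on the GPU), the partials go to a buffer standing in for shared memory, and a final pass sums them to produce `y`. The SIMD width of 32 and the name `two_level_dot` are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kSimdWidth = 32;  // assumed SIMD-group width

// Serial simulation of a two-level reduction over d_state.
inline float two_level_dot(const std::vector<float>& state,
                           const std::vector<float>& c) {
    assert(state.size() == c.size());
    const std::size_t d_state  = state.size();
    const std::size_t n_groups = (d_state + kSimdWidth - 1) / kSimdWidth;

    // Stand-in for threadgroup shared memory: one slot per SIMD group.
    std::vector<float> shared(n_groups, 0.0f);

    // Level 1: each SIMD group sums the products of its own lanes.
    for (std::size_t g = 0; g < n_groups; ++g) {
        float partial = 0.0f;
        for (std::size_t lane = 0; lane < kSimdWidth; ++lane) {
            const std::size_t i = g * kSimdWidth + lane;
            if (i < d_state) {
                partial += state[i] * c[i];
            }
        }
        shared[g] = partial;
    }

    // Level 2: sum the per-group partials into the final y value.
    float y = 0.0f;
    for (float p : shared) {
        y += p;
    }
    return y;
}
```

On a GPU the level-1 loop runs in parallel across lanes, so the sequential cost per output drops from `d_state` additions to roughly two reduction steps, which is consistent with the kind of speedup the commit message reports.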

Signed-off-by: Gabe Goodhart <[email protected]>

* fix: Update logic to correctly do the multi-layer parallel sum

Signed-off-by: Gabe Goodhart <[email protected]>

* fix: Correctly size the shared memory buffer and assert expected size relationships

Branch: GraniteFourPerf

Signed-off-by: Gabe Goodhart <[email protected]>

* refactor: Compute block offsets once rather than once per token

Branch: GraniteFourPerf

Signed-off-by: Gabe Goodhart <[email protected]>

* feat: Use local variable for state recursion

Branch: GraniteFourPerf

Signed-off-by: Gabe Goodhart <[email protected]>

* feat: Use a secondary simd_sum instead of a for loop

Branch: GraniteFourPerf

Signed-off-by: Gabe Goodhart <[email protected]>

* feat: Add assertion and comment about relationship between simd size and num simd groups

Branch: GraniteFourPerf

Signed-off-by: Gabe Goodhart <[email protected]>

* feat: Parallelize over d_state for mamba-1

Branch: GraniteFourPerf

Signed-off-by: Gabe Goodhart <[email protected]>

* feat: Parallel sum in SSM_CONV

Branch: GraniteFourPerf

Signed-off-by: Gabe Goodhart <[email protected]>

* Revert "feat: Parallel sum in SSM_CONV"

After discussion with @compilade, the size of the parallelism here is
not worth the cost in complexity or overhead of the parallel for.

ggml-org#14743 (comment)

This reverts commit 16bc059.

Signed-off-by: Gabe Goodhart <[email protected]>

* refactor: Simplify shared memory sizing

Branch: GraniteFourPerf

Signed-off-by: Gabe Goodhart <[email protected]>
Co-Authored-By: Georgi Gerganov <[email protected]>

---------

Signed-off-by: Gabe Goodhart <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
…14880)

* docs: update s390x document for sentencepiece

Signed-off-by: Aaron Teo <[email protected]>
(cherry picked from commit e086c5e)

* docs: update huggingface links + reword

Signed-off-by: Aaron Teo <[email protected]>
(cherry picked from commit 8410b08)

* ggml-cpu: disable ggml-nnpa compile flag by default

fixes ggml-org#14877

Signed-off-by: Aaron Teo <[email protected]>
(cherry picked from commit 412f4c7)

* docs: update s390x build docs to reflect nnpa disable

Signed-off-by: Aaron Teo <[email protected]>
(cherry picked from commit c1eeae1)

---------

Signed-off-by: Aaron Teo <[email protected]>
@jan-service-account jan-service-account merged commit 46d5c69 into dev Jul 26, 2025
9 checks passed
@jan-service-account jan-service-account deleted the update-dev-from-master-2025-07-26-00-12 branch July 26, 2025 00:25