
Conversation

@jan-service-account

Updates dev branch with latest release (b5973) from ggml-org/llama.cpp

csabakecskemeti and others added 10 commits July 22, 2025 19:29
* Set weight format to NZ for 310P

* Remove quant weight format to NZ

* Clean up code

* Fix

* Make the conditions for converting weights to NZ format consistent

* Clean up code
…org#14675)

* Update llama-memory-recurrent.cpp

Handle saving/loading null layers in recurrent memory

* Fixed styling issues and updated comments

* Fix styling issue

Co-authored-by: Sigbjørn Skjæret <[email protected]>

---------

Co-authored-by: Sigbjørn Skjæret <[email protected]>
* CUDA: fix quantized KV cache + multiple sequences

* Update ggml/src/ggml-cuda/fattn-common.cuh

Co-authored-by: Georgi Gerganov <[email protected]>

---------

Co-authored-by: Georgi Gerganov <[email protected]>
jan-service-account merged commit fa1602d into dev on July 24, 2025
13 checks passed
jan-service-account deleted the update-dev-from-master-2025-07-24-00-12 branch on July 24, 2025 at 00:25


10 participants