Releases · ggml-org/llama.cpp
b6088
imatrix : warn when GGUF imatrix is saved without .gguf suffix (#15076)
* imatrix : add warning when suffix is not .gguf for GGUF imatrix
* imatrix : only warn about suffix when output format is unspecified
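A minimal sketch of the kind of check this release adds, assuming a hypothetical helper; the actual flag handling and logging calls in the imatrix tool may differ:

```cpp
#include <cstdio>
#include <string>

// Hypothetical sketch: warn only when the user did not pick an output format
// explicitly and the output path does not end in ".gguf".
static void warn_if_missing_gguf_suffix(const std::string & out_file, bool format_specified) {
    const std::string suffix = ".gguf";
    const bool has_suffix = out_file.size() >= suffix.size() &&
        out_file.compare(out_file.size() - suffix.size(), suffix.size(), suffix) == 0;
    if (!format_specified && !has_suffix) {
        fprintf(stderr, "warning: saving GGUF imatrix to '%s' without a .gguf suffix\n", out_file.c_str());
    }
}
```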
b6087
cmake: Add GGML_BACKEND_DIR option (#15074)
* cmake: Add GGML_BACKEND_DIR option
  This can be used by distributions to specify where to look for backends when ggml is built with GGML_BACKEND_DL=ON.
* Fix phrasing
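For context, a sketch of how dynamically built backends are loaded at runtime when ggml is compiled with GGML_BACKEND_DL=ON. The function names come from ggml-backend.h; whether and how GGML_BACKEND_DIR steers the default search path is only summarized by this note, and the concrete library path below is made up for illustration:

```cpp
#include "ggml-backend.h"

int main() {
    // Load every backend ggml can find in its default search locations
    // (with this change, a distribution build can point those at GGML_BACKEND_DIR).
    ggml_backend_load_all();

    // Alternatively, load one specific backend shared library by path (illustrative path):
    ggml_backend_reg_t reg = ggml_backend_load("/usr/lib/ggml/libggml-cuda.so");
    (void) reg;
    return 0;
}
```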
b6085
model: support GLM 4.5 family of models (#14939)
* model: Add GLM 4.5 (#14921)
  Co-authored-by: Sigbjørn Skjæret <[email protected]>
* Merge in PR suggestions
  Co-authored-by: Sigbjørn Skjæret <[email protected]>
* model: Add GLM 4.5 family of models (#14921)
  1. Updated tensor_mapping.py with NextN tensor mappings
     - Added proper tensor mappings for all NextN/MTP tensors in gguf-py/gguf/tensor_mapping.py
     - Added mappings for: eh_proj, embed_tokens, enorm, hnorm, shared_head.head, shared_head.norm
  2. Added num_nextn_predict_layers configuration
     - Added LLM_KV_NUM_NEXTN_PREDICT_LAYERS constant to llama-arch.h and llama-arch.cpp
     - Added num_nextn_predict_layers field to llama_hparams struct
     - Updated GLM4_MOE parameter loading in llama-model.cpp to read this parameter
     - Modified tensor loading logic to conditionally load NextN tensors based on num_nextn_predict_layers
     - Added GGUF writer support in gguf_writer.py with add_num_nextn_predict_layers() method
     - Updated conversion script to extract and write this parameter from HuggingFace config
  3. Added FIM tokens for GLM4_MOE
     - Added GLM-4.5's FIM tokens to llama-vocab.cpp:
       - <|code_prefix|> for FIM_PRE
       - <|code_suffix|> for FIM_SUF
       - <|code_middle|> for FIM_MID
  4. Removed manual NextN tensor handling
     - Removed the special-case handling in convert_hf_to_gguf.py that manually mapped NextN tensors
     - NextN tensors are now handled automatically through the proper tensor mapping system
* glm 4.5 update tensors names
* model: glm 4.5 apply suggestions from code review
  Co-authored-by: Sigbjørn Skjæret <[email protected]>
* Update src/llama-model.cpp
  Co-authored-by: Sigbjørn Skjæret <[email protected]>
* model: glm 4.5 apply suggestions from code review
  Co-authored-by: Sigbjørn Skjæret <[email protected]>
* model: glm 4.5 apply suggestions from code review
* Apply suggestions from code review
* patch broken chat template
* typings fix
* add TENSOR_SKIP flag
  Co-authored-by: Diego Devesa <[email protected]>
* Update src/llama-model-loader.h
  Co-authored-by: Sigbjørn Skjæret <[email protected]>
---------
Co-authored-by: Sigbjørn Skjæret <[email protected]>
Co-authored-by: Diego Devesa <[email protected]>
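An illustrative sketch of the idea in item 2 above: gating the NextN/MTP tensor set on a per-model num_nextn_predict_layers value. The tensor suffixes are the ones named in this release note; the exact GGUF tensor name layout and the loader structure in llama-model.cpp are assumptions here, not taken from the source:

```cpp
#include <cstdint>
#include <initializer_list>
#include <string>
#include <vector>

// Sketch: which extra tensor names a GLM4_MOE-style model would expect,
// depending on num_nextn_predict_layers (0 means no NextN tensors at all).
struct glm4_moe_hparams {
    uint32_t n_layer                  = 0;
    uint32_t num_nextn_predict_layers = 0;
};

static std::vector<std::string> nextn_tensor_names(const glm4_moe_hparams & hp) {
    std::vector<std::string> names;
    if (hp.num_nextn_predict_layers == 0) {
        return names; // nothing to load conditionally
    }
    // One block of NextN tensors per predict layer, appended after the main layers
    // (name layout below is hypothetical).
    for (uint32_t i = 0; i < hp.num_nextn_predict_layers; ++i) {
        const uint32_t il = hp.n_layer + i;
        for (const char * t : { "eh_proj", "embed_tokens", "enorm", "hnorm",
                                "shared_head.head", "shared_head.norm" }) {
            names.push_back("blk." + std::to_string(il) + ".nextn." + std::string(t));
        }
    }
    return names;
}
```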
b6084
quantize : fix confusing error message if ftype is invalid (#15071)
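A hedged sketch of what a clearer failure mode for an invalid ftype can look like: name the bad input and list the accepted names. The type names and numeric values below are a small illustrative subset, not the full set llama-quantize supports, and the real tool's messages may be worded differently:

```cpp
#include <cstdio>
#include <map>
#include <string>

// Sketch: reject an unknown ftype string with an explicit error and the list of
// valid names, instead of a confusing generic failure.
static bool parse_ftype(const std::string & arg, int & out_ftype) {
    static const std::map<std::string, int> known = {
        { "F16", 1 }, { "Q4_0", 2 }, { "Q8_0", 7 }, { "Q4_K_M", 15 }, // illustrative subset
    };
    const auto it = known.find(arg);
    if (it == known.end()) {
        fprintf(stderr, "error: invalid ftype '%s'\n", arg.c_str());
        fprintf(stderr, "valid types are:");
        for (const auto & kv : known) {
            fprintf(stderr, " %s", kv.first.c_str());
        }
        fprintf(stderr, "\n");
        return false;
    }
    out_ftype = it->second;
    return true;
}
```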
b6083
ggml: WebGPU backend host improvements and style fixing (#14978)
* Add parameter buffer pool, batching of submissions, refactor command building/submission
* Add header for linux builds
* Free staged parameter buffers at once
* Format with clang-format
* Fix thread-safe implementation
* Use device implicit synchronization
* Update workflow to use custom release
* Remove testing branch workflow
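A generic sketch of the parameter-buffer-pool pattern this change describes: reuse fixed-size staging buffers across command submissions instead of allocating a fresh one per operation. The real backend pools WebGPU buffer objects; plain byte vectors stand in for them here, and the structure is an assumption rather than the actual implementation:

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Sketch of a thread-safe pool of fixed-size parameter buffers.
struct param_buf_pool {
    size_t buf_size;
    std::vector<std::vector<uint8_t>> free_bufs;
    std::mutex mtx; // the release note mentions a thread-safe implementation

    param_buf_pool(size_t size, size_t count) : buf_size(size) {
        for (size_t i = 0; i < count; ++i) {
            free_bufs.emplace_back(size);
        }
    }

    std::vector<uint8_t> acquire() {
        std::lock_guard<std::mutex> lock(mtx);
        if (free_bufs.empty()) {
            return std::vector<uint8_t>(buf_size); // grow on demand
        }
        std::vector<uint8_t> buf = std::move(free_bufs.back());
        free_bufs.pop_back();
        return buf;
    }

    void release(std::vector<uint8_t> && buf) {
        std::lock_guard<std::mutex> lock(mtx);
        free_bufs.push_back(std::move(buf)); // return the buffer for reuse
    }
};
```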
b6082
vulkan: fix build when using glslang that does not support coopmat2 (…
b6081
imatrix : use GGUF by default (#14842)
* imatrix : use GGUF by default
* imatrix : use GGUF regardless of the output filename
  The legacy format can only be produced with --output-format dat
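A small sketch of the selection behavior described above: GGUF is the default output format, and the legacy layout is written only when requested explicitly via --output-format dat. The enum and function names are assumptions for illustration, not the tool's actual identifiers:

```cpp
#include <string>

// Sketch: choose the imatrix output format from the --output-format argument,
// ignoring the output filename entirely.
enum class imatrix_format { gguf, dat };

static imatrix_format pick_output_format(const std::string & output_format_arg) {
    if (output_format_arg == "dat") {
        return imatrix_format::dat;  // legacy format, opt-in only
    }
    return imatrix_format::gguf;     // default, regardless of the output filename
}
```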
b6080
imatrix : fix 3d activation handling for hybrid and recurrent models …
b6079
memory : handle kv_unified for hybrid models (#15050)
b6078
vocab : JetBrains Mellum pre-tokenizer (#15045)