
@jan-service-account

Updates dev branch with latest release (b6089) from ggml-org/llama.cpp

jeffbolznv and others added 8 commits on August 4, 2025 at 07:09
* Add parameter buffer pool, batching of submissions, refactor command building/submission (a sketch of this pooling/batching pattern follows the bullets below)

* Add header for Linux builds

* Free staged parameter buffers at once

* Format with clang-format

* Fix thread-safety implementation

* Use device implicit synchronization

* Update workflow to use custom release

* Remove testing branch workflow
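
The buffer-pool and batching commits above describe a pattern common to GPU backends: reuse a fixed set of parameter buffers and submit staged commands in batches rather than one at a time. Below is a minimal, hypothetical Python sketch of that pattern; `ParamBufferPool`, `BatchedSubmitter`, and every other name in it are illustrative stand-ins, not the backend's actual API.

```python
import queue
import threading

class ParamBufferPool:
    """Fixed pool of reusable parameter buffers (stand-ins for GPU buffers)."""
    def __init__(self, count: int, size: int):
        self._free: queue.Queue = queue.Queue()
        for _ in range(count):
            self._free.put(bytearray(size))

    def acquire(self) -> bytearray:
        return self._free.get()        # blocks until a buffer is free

    def release(self, buf: bytearray) -> None:
        self._free.put(buf)

class BatchedSubmitter:
    """Stages commands and submits them in one batch once a threshold is hit."""
    def __init__(self, pool: ParamBufferPool, batch_size: int = 8):
        self._pool = pool
        self._batch_size = batch_size
        self._staged: list[bytearray] = []
        self._lock = threading.Lock()  # the thread-safety concern in miniature

    def stage(self, params: bytes) -> None:
        buf = self._pool.acquire()
        buf[:len(params)] = params     # copy parameters into a pooled buffer
        with self._lock:
            self._staged.append(buf)
            if len(self._staged) >= self._batch_size:
                self._flush_locked()

    def flush(self) -> None:
        with self._lock:
            if self._staged:
                self._flush_locked()

    def _flush_locked(self) -> None:
        # One submission for the whole batch; then free all staged
        # parameter buffers at once, as the commit above describes.
        print(f"submitting batch of {len(self._staged)} commands")
        for buf in self._staged:
            self._pool.release(buf)
        self._staged.clear()
```

A caller would `stage()` once per dispatch and `flush()` at a natural boundary (end of a compute graph, for instance), trading a little latency for far fewer submissions.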
* model: Add GLM 4.5 (ggml-org#14921)

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Merge in PR suggestions

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* model: Add GLM 4.5 family of models (ggml-org#14921)

1. Updated tensor_mapping.py with NextN tensor mappings

- Added proper tensor mappings for all NextN/MTP tensors in gguf-py/gguf/tensor_mapping.py
- Added mappings for: eh_proj, embed_tokens, enorm, hnorm, shared_head.head, shared_head.norm
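
For context on step 1: the table in gguf-py/gguf/tensor_mapping.py associates each GGUF tensor name with the Hugging Face name patterns it may be converted from. The sketch below shows the general shape of such a table using the NextN/MTP tensors listed above; the GGUF-side names and HF patterns here are assumptions for illustration, not the exact entries.

```python
# Simplified, illustrative mapping table: GGUF tensor name -> HF name
# patterns it can come from ({bid} is the layer/block index).
NEXTN_TENSOR_MAP = {
    "nextn.eh_proj":          ("model.layers.{bid}.eh_proj",),
    "nextn.embed_tokens":     ("model.layers.{bid}.embed_tokens",),
    "nextn.enorm":            ("model.layers.{bid}.enorm",),
    "nextn.hnorm":            ("model.layers.{bid}.hnorm",),
    "nextn.shared_head.head": ("model.layers.{bid}.shared_head.head",),
    "nextn.shared_head.norm": ("model.layers.{bid}.shared_head.norm",),
}

def map_name(hf_name: str, bid: int) -> str | None:
    """Return the GGUF name for an HF tensor name, or None if unmapped."""
    for gguf_name, patterns in NEXTN_TENSOR_MAP.items():
        if any(p.format(bid=bid) == hf_name for p in patterns):
            return gguf_name
    return None
```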

2. Added num_nextn_predict_layers configuration

- Added LLM_KV_NUM_NEXTN_PREDICT_LAYERS constant to llama-arch.h and llama-arch.cpp
- Added num_nextn_predict_layers field to llama_hparams struct
- Updated GLM4_MOE parameter loading in llama-model.cpp to read this parameter
- Modified tensor loading logic to conditionally load NextN tensors based on num_nextn_predict_layers
- Added GGUF writer support in gguf_writer.py with add_num_nextn_predict_layers() method
- Updated conversion script to extract and write this parameter from HuggingFace config
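
A rough sketch of the conversion-side plumbing step 2 describes: read num_nextn_predict_layers from the HuggingFace config and emit it as a GGUF key. The commit adds a dedicated add_num_nextn_predict_layers() wrapper for this; the key string below is an assumption, while the generic add_uint32() it would delegate to is gguf-py's real low-level call.

```python
# Hedged sketch of the conversion-side plumbing. The key string and the
# helper name are illustrative, mirroring the commit text above.
KEY_NUM_NEXTN_PREDICT_LAYERS = "{arch}.num_nextn_predict_layers"  # assumed key

def write_nextn_hparam(writer, arch: str, hf_config: dict) -> None:
    """Extract num_nextn_predict_layers from an HF config and write it."""
    n = hf_config.get("num_nextn_predict_layers", 0)
    if n > 0:
        # writer is assumed to expose add_uint32(key, value), which
        # gguf-py's GGUFWriter does provide.
        writer.add_uint32(KEY_NUM_NEXTN_PREDICT_LAYERS.format(arch=arch), n)
```

On the C++ side, the same value lands in the llama_hparams field so tensor loading can skip the NextN tensors when the parameter is zero.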

3. Added FIM tokens for GLM4_MOE

- Added GLM-4.5's FIM tokens to llama-vocab.cpp:
  - <|code_prefix|> for FIM_PRE
  - <|code_suffix|> for FIM_SUF
  - <|code_middle|> for FIM_MID
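
To make step 3 concrete, here is a hedged sketch of how the three GLM-4.5 FIM tokens map onto the generic fill-in-the-middle roles and how a prefix-suffix-middle prompt is typically assembled. The token strings come from the commit above; the surrounding Python is illustrative only, not llama.cpp code.

```python
# GLM-4.5 FIM tokens mapped to the generic FIM roles (from the commit text).
GLM4_MOE_FIM_TOKENS = {
    "FIM_PRE": "<|code_prefix|>",   # code before the cursor
    "FIM_SUF": "<|code_suffix|>",   # code after the cursor
    "FIM_MID": "<|code_middle|>",   # the span the model should fill in
}

def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a fill-in-the-middle prompt in the usual PSM order."""
    t = GLM4_MOE_FIM_TOKENS
    return f"{t['FIM_PRE']}{prefix}{t['FIM_SUF']}{suffix}{t['FIM_MID']}"
```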

4. Removed manual NextN tensor handling

- Removed the special-case handling in convert_hf_to_gguf.py that manually mapped NextN tensors
- NextN tensors are now handled automatically through the proper tensor mapping system

* glm 4.5: update tensor names

* model: glm 4.5 apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* model: glm 4.5 apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* model: glm 4.5 apply suggestions from code review

* Apply suggestions from code review

* patch broken chat template

* typings fix

* add TENSOR_SKIP flag


Co-authored-by: Diego Devesa <[email protected]>

* Update src/llama-model-loader.h

Co-authored-by: Sigbjørn Skjæret <[email protected]>

---------

Co-authored-by: Sigbjørn Skjæret <[email protected]>
Co-authored-by: Diego Devesa <[email protected]>
* cmake: Add GGML_BACKEND_DIR option

This can be used by distributions to specify where to look for backends
when ggml is built with GGML_BACKEND_DL=ON.

* Fix phrasing
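
As a conceptual illustration of what GGML_BACKEND_DIR enables: when ggml is built with GGML_BACKEND_DL=ON, backends are shared objects loaded at runtime, and this option tells the loader where a distribution installed them. The Python sketch below mimics that behaviour with ctypes; the directory path and library naming are assumptions, and ggml's real loader is C/C++, not this.

```python
# Conceptual stand-in for dynamic backend loading from a configured
# directory. Path and file-name pattern are assumed for illustration.
import ctypes
import pathlib

BACKEND_DIR = pathlib.Path("/usr/lib/ggml")  # what a distro might configure

def load_backends(directory: pathlib.Path) -> list[ctypes.CDLL]:
    """dlopen every ggml backend shared object found in the directory."""
    handles = []
    for lib in sorted(directory.glob("libggml-*.so")):
        try:
            handles.append(ctypes.CDLL(str(lib)))
        except OSError as e:
            print(f"skipping {lib.name}: {e}")
    return handles
```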
* imatrix : add warning when suffix is not .gguf for GGUF imatrix (ggml-org#15076)

* imatrix : only warn about suffix when output format is unspecified
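
The two imatrix commits combine into a simple rule: warn about a non-.gguf suffix only when the output format was not explicitly specified. A hedged sketch of that check (function and parameter names are illustrative, not llama.cpp's CLI):

```python
def check_imatrix_suffix(out_path: str, output_format: str | None) -> None:
    """Warn when a GGUF imatrix is about to be written to a non-.gguf path."""
    if output_format is None and not out_path.endswith(".gguf"):
        # Only warn when the format was left unspecified (second commit);
        # an explicit format choice means the user knows what they asked for.
        print(f"warning: '{out_path}' lacks a .gguf suffix; "
              "writing GGUF imatrix anyway")
```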
* llama : add --n-cpu-moe option

Keeps the MoE weights of the first N layers on the CPU
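
Conceptually, --n-cpu-moe expands into per-layer tensor-buffer overrides that pin the expert weights (the ffn_up/gate/down_exps tensors) of the first N layers to CPU buffers, keeping everything else on the GPU. A hedged sketch of that expansion, with the exact regex shape treated as an assumption:

```python
def n_cpu_moe_overrides(n_layers: int) -> list[tuple[str, str]]:
    """Return (tensor-name regex, device) overrides for the first N layers."""
    return [
        (rf"blk\.{i}\.ffn_(up|down|gate)_exps", "CPU")
        for i in range(n_layers)
    ]

# e.g. n_cpu_moe_overrides(2) ->
#   [('blk\\.0\\.ffn_(up|down|gate)_exps', 'CPU'),
#    ('blk\\.1\\.ffn_(up|down|gate)_exps', 'CPU')]
```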
@jan-service-account merged commit 5ae5b31 into dev on Aug 5, 2025
17 checks passed
@jan-service-account deleted the update-dev-from-master-2025-08-05-00-13 branch on August 5, 2025 at 00:26