Releases · ggml-org/llama.cpp
b6088
imatrix : warn when GGUF imatrix is saved without .gguf suffix (#15076)
* imatrix : add warning when suffix is not .gguf for GGUF imatrix
* imatrix : only warn about suffix when output format is unspecified
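A minimal sketch of the kind of check this release adds, assuming a hypothetical helper; the actual flag handling and logging calls in the imatrix tool may differ:

```cpp
#include <cstdio>
#include <string>

// Hypothetical sketch: warn only when the user did not pick an output format
// explicitly and the output path does not end in ".gguf".
static void warn_if_missing_gguf_suffix(const std::string & out_file, bool format_specified) {
    const std::string suffix = ".gguf";
    const bool has_suffix = out_file.size() >= suffix.size() &&
        out_file.compare(out_file.size() - suffix.size(), suffix.size(), suffix) == 0;
    if (!format_specified && !has_suffix) {
        fprintf(stderr, "warning: saving GGUF imatrix to '%s' without a .gguf suffix\n", out_file.c_str());
    }
}
```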
b6087
cmake: Add GGML_BACKEND_DIR option (#15074)
* cmake: Add GGML_BACKEND_DIR option
  This can be used by distributions to specify where to look for backends when ggml is built with GGML_BACKEND_DL=ON.
* Fix phrasing
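For context, a sketch of how dynamically built backends are loaded at runtime when ggml is compiled with GGML_BACKEND_DL=ON. The function names come from ggml-backend.h; whether and how GGML_BACKEND_DIR steers the default search path is only summarized by this note, and the concrete library path below is made up for illustration:

```cpp
#include "ggml-backend.h"

int main() {
    // Load every backend ggml can find in its default search locations
    // (with this change, a distribution build can point those at GGML_BACKEND_DIR).
    ggml_backend_load_all();

    // Alternatively, load one specific backend shared library by path (illustrative path):
    ggml_backend_reg_t reg = ggml_backend_load("/usr/lib/ggml/libggml-cuda.so");
    (void) reg;
    return 0;
}
```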
b6085
model: support GLM 4.5 family of models (#14939)
* model: Add GLM 4.5 (#14921)
  Co-authored-by: Sigbjørn Skjæret <[email protected]>
* Merge in PR suggestions
  Co-authored-by: Sigbjørn Skjæret <[email protected]>
* model: Add GLM 4.5 family of models (#14921)
  1. Updated tensor_mapping.py with NextN tensor mappings
     - Added proper tensor mappings for all NextN/MTP tensors in gguf-py/gguf/tensor_mapping.py
     - Added mappings for: eh_proj, embed_tokens, enorm, hnorm, shared_head.head, shared_head.norm
  2. Added num_nextn_predict_layers configuration
     - Added LLM_KV_NUM_NEXTN_PREDICT_LAYERS constant to llama-arch.h and llama-arch.cpp
     - Added num_nextn_predict_layers field to llama_hparams struct
     - Updated GLM4_MOE parameter loading in llama-model.cpp to read this parameter
     - Modified tensor loading logic to conditionally load NextN tensors based on num_nextn_predict_layers
     - Added GGUF writer support in gguf_writer.py with add_num_nextn_predict_layers() method
     - Updated conversion script to extract and write this parameter from HuggingFace config
  3. Added FIM tokens for GLM4_MOE
     - Added GLM-4.5's FIM tokens to llama-vocab.cpp:
       - <|code_prefix|> for FIM_PRE
       - <|code_suffix|> for FIM_SUF
       - <|code_middle|> for FIM_MID
  4. Removed manual NextN tensor handling
     - Removed the special-case handling in convert_hf_to_gguf.py that manually mapped NextN tensors
     - NextN tensors are now handled automatically through the proper tensor mapping system
* glm 4.5 update tensors names
* model: glm 4.5 apply suggestions from code review
  Co-authored-by: Sigbjørn Skjæret <[email protected]>
* Update src/llama-model.cpp
  Co-authored-by: Sigbjørn Skjæret <[email protected]>
* model: glm 4.5 apply suggestions from code review
  Co-authored-by: Sigbjørn Skjæret <[email protected]>
* model: glm 4.5 apply suggestions from code review
* Apply suggestions from code review
* patch broken chat template
* typings fix
* add TENSOR_SKIP flag
  Co-authored-by: Diego Devesa <[email protected]>
* Update src/llama-model-loader.h
  Co-authored-by: Sigbjørn Skjæret <[email protected]>
---------
Co-authored-by: Sigbjørn Skjæret <[email protected]>
Co-authored-by: Diego Devesa <[email protected]>
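An illustrative sketch of the idea in item 2 above: gating the NextN/MTP tensor set on a per-model num_nextn_predict_layers value. The tensor suffixes are the ones named in this release note; the exact GGUF tensor name layout and the loader structure in llama-model.cpp are assumptions here, not taken from the source:

```cpp
#include <cstdint>
#include <initializer_list>
#include <string>
#include <vector>

// Sketch: which extra tensor names a GLM4_MOE-style model would expect,
// depending on num_nextn_predict_layers (0 means no NextN tensors at all).
struct glm4_moe_hparams {
    uint32_t n_layer                  = 0;
    uint32_t num_nextn_predict_layers = 0;
};

static std::vector<std::string> nextn_tensor_names(const glm4_moe_hparams & hp) {
    std::vector<std::string> names;
    if (hp.num_nextn_predict_layers == 0) {
        return names; // nothing to load conditionally
    }
    // One block of NextN tensors per predict layer, appended after the main layers
    // (name layout below is hypothetical).
    for (uint32_t i = 0; i < hp.num_nextn_predict_layers; ++i) {
        const uint32_t il = hp.n_layer + i;
        for (const char * t : { "eh_proj", "embed_tokens", "enorm", "hnorm",
                                "shared_head.head", "shared_head.norm" }) {
            names.push_back("blk." + std::to_string(il) + ".nextn." + std::string(t));
        }
    }
    return names;
}
```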
b6084
quantize : fix confusing error message if ftype is invalid (#15071)
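A hedged sketch of what a clearer failure mode for an invalid ftype can look like: name the bad input and list the accepted names. The type names and numeric values below are a small illustrative subset, not the full set llama-quantize supports, and the real tool's messages may be worded differently:

```cpp
#include <cstdio>
#include <map>
#include <string>

// Sketch: reject an unknown ftype string with an explicit error and the list of
// valid names, instead of a confusing generic failure.
static bool parse_ftype(const std::string & arg, int & out_ftype) {
    static const std::map<std::string, int> known = {
        { "F16", 1 }, { "Q4_0", 2 }, { "Q8_0", 7 }, { "Q4_K_M", 15 }, // illustrative subset
    };
    const auto it = known.find(arg);
    if (it == known.end()) {
        fprintf(stderr, "error: invalid ftype '%s'\n", arg.c_str());
        fprintf(stderr, "valid types are:");
        for (const auto & kv : known) {
            fprintf(stderr, " %s", kv.first.c_str());
        }
        fprintf(stderr, "\n");
        return false;
    }
    out_ftype = it->second;
    return true;
}
```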
b6083
ggml: WebGPU backend host improvements and style fixing (#14978)
* Add parameter buffer pool, batching of submissions, refactor command building/submission
* Add header for linux builds
* Free staged parameter buffers at once
* Format with clang-format
* Fix thread-safe implementation
* Use device implicit synchronization
* Update workflow to use custom release
* Remove testing branch workflow
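A generic sketch of the parameter-buffer-pool pattern this change describes: reuse fixed-size staging buffers across command submissions instead of allocating a fresh one per operation. The real backend pools WebGPU buffer objects; plain byte vectors stand in for them here, and the structure is an assumption rather than the actual implementation:

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Sketch of a thread-safe pool of fixed-size parameter buffers.
struct param_buf_pool {
    size_t buf_size;
    std::vector<std::vector<uint8_t>> free_bufs;
    std::mutex mtx; // the release note mentions a thread-safe implementation

    param_buf_pool(size_t size, size_t count) : buf_size(size) {
        for (size_t i = 0; i < count; ++i) {
            free_bufs.emplace_back(size);
        }
    }

    std::vector<uint8_t> acquire() {
        std::lock_guard<std::mutex> lock(mtx);
        if (free_bufs.empty()) {
            return std::vector<uint8_t>(buf_size); // grow on demand
        }
        std::vector<uint8_t> buf = std::move(free_bufs.back());
        free_bufs.pop_back();
        return buf;
    }

    void release(std::vector<uint8_t> && buf) {
        std::lock_guard<std::mutex> lock(mtx);
        free_bufs.push_back(std::move(buf)); // return the buffer for reuse
    }
};
```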
b6082
vulkan: fix build when using glslang that does not support coopmat2 (…
b6081
imatrix : use GGUF by default (#14842)
* imatrix : use GGUF by default
* imatrix : use GGUF regardless of the output filename
  The legacy format can only be produced with --output-format dat
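A small sketch of the selection behavior described above: GGUF is the default output format, and the legacy layout is written only when requested explicitly via --output-format dat. The enum and function names are assumptions for illustration, not the tool's actual identifiers:

```cpp
#include <string>

// Sketch: choose the imatrix output format from the --output-format argument,
// ignoring the output filename entirely.
enum class imatrix_format { gguf, dat };

static imatrix_format pick_output_format(const std::string & output_format_arg) {
    if (output_format_arg == "dat") {
        return imatrix_format::dat;  // legacy format, opt-in only
    }
    return imatrix_format::gguf;     // default, regardless of the output filename
}
```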
b6080
imatrix : fix 3d activation handling for hybrid and recurrent models …
b6079
memory : handle kv_unified for hybrid models (#15050)
b6078
vocab : JetBrains Mellum pre-tokenizer (#15045)