Releases: ggml-org/llama.cpp
b5140
CANN: Add x86 build ci (#12950)
* CANN: Add x86 build ci
* CANN: fix code format
b5138
SYCL: Add ROPE vision kernel (#12887)
* SYCL: Add ROPE vision kernel
* Add comment about rope mode
b5137
llama : DeepSeek V2/V3 MLA implementation (#12801)
* Merged using squash to remove all noise commit messages
* Force flash attention off for `LLM_ARCH_DEEPSEEK2` - embedding too large
* Removed 3 conts (2x RoPE and 1x RMS-norm)
* Changed to use `<cmath>` instead of `<math.h>`
* Reverted removal of the 3 conts
* Used `reshape` in `llm_graph_context::build_attn_mha()`
* Use `k_pe = ggml_reshape`
* Removed the 3 conts again
* Removed the 3D views of `wk_b` and `wv_b`, and just save as 3D in GGUF
* Removed MQA optimisation from `build_attn_mha()` as no gains now
* Simplified `is_mla` branch in `llm_build_deepseek2()`
* Removed `build_attn_mla` and added `nullptr` to all `build_attn` calls
* Fixed call to `build_attn` in `llm_build_t5_enc`
b5136
ggml : Add AVX512 implementation of GEMM - Q4_Kx8 (#12829)
* Add AVX512 implementation of GEMM - q4kx8
* Update changes to remove unnecessary whitespaces
b5135
CANN: Opt ROPE optimization (#12865)
* [CANN]Opt ROPE optimization
* [CANN]Codestyle adjustment
* [CANN]Fix the ROPE precision issue
* [CANN]codestyle fix
* [CANN]add rope unsupport case
Signed-off-by: noemotiovon <[email protected]>
b5134
CANN: Optimize CANN buffer pool memory management (#12875)
Multiple optional memory pools are provided for CANN, including VMM, priority queue-based, and traditional memory pools.
1. When the memory pool is available and GGML_CANN_DISABLE_VMM_POOL is not defined, the VMM pool is selected by default.
2. Otherwise, if GGML_CANN_ENABLE_BUF_PRIO_POOL is defined, the priority queue-based memory pool is used.
3. If neither condition is met, the default memory pool is used.
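The selection order described for #12875 can be sketched roughly as follows. This is an illustrative sketch only, not the code from the PR: `PoolKind`, `select_pool`, and the three boolean parameters are hypothetical names, and it assumes that "the memory pool is available" refers to VMM support on the device. How the two `GGML_CANN_*` switches are actually detected is not stated in the release note, so they are modeled here as plain flags.

```cpp
// Hypothetical sketch of the pool selection order; names are illustrative
// and are not part of the ggml CANN backend API.
enum class PoolKind { Vmm, BufPrio, Default };

// vmm_available   : assumed to mean the device supports the VMM pool
// disable_vmm     : stands in for GGML_CANN_DISABLE_VMM_POOL being defined
// enable_buf_prio : stands in for GGML_CANN_ENABLE_BUF_PRIO_POOL being defined
constexpr PoolKind select_pool(bool vmm_available,
                               bool disable_vmm,
                               bool enable_buf_prio) {
    // 1. VMM pool is the default when it is available and not disabled.
    if (vmm_available && !disable_vmm) {
        return PoolKind::Vmm;
    }
    // 2. Otherwise, the priority queue-based pool is used if it is enabled.
    if (enable_buf_prio) {
        return PoolKind::BufPrio;
    }
    // 3. Fall back to the default (traditional) pool.
    return PoolKind::Default;
}
```

Under these assumptions, enabling the priority queue-based pool has no effect while the VMM pool is available and not disabled, because the VMM check comes first.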
b5133
Add performance print for gemma3 in example (#12929)
b5132
SYCL: Fix im2col (#12910)
* SYCL: Fix im2col
* restore local workgroup size adjustments for large inputs
* restore format
b5131
rpc : use ggml_context_ptr (#12938)
b5129
sync : ggml ggml-ci