Releases: ggml-org/llama.cpp

b5140

15 Apr 12:06
54a7272
CANN: Add x86 build ci (#12950)

* CANN: Add x86 build ci

* CANN: fix code format

b5138

15 Apr 09:50
5106764
SYCL: Add ROPE vision kernel (#12887)

* SYCL: Add ROPE vision kernel

* Add comment about rope mode

b5137

15 Apr 08:10
daa4228
llama : DeepSeek V2/V3 MLA implementation (#12801)

* Merged using squash to remove all noise commit messages

* Force flash attention off for `LLM_ARCH_DEEPSEEK2` - embedding too large

* Removed 3 conts (2x RoPE and 1x RMS-norm)

* Changed to use `<cmath>` instead of `<math.h>`

* Reverted removal of the 3 conts

* Used `reshape` in `llm_graph_context::build_attn_mha()`

* Use `k_pe = ggml_reshape`

* Removed the 3 conts again

* Removed the 3D views of `wk_b` and `wv_b`, and just save as 3D in GGUF

* Removed MQA optimisation from `build_attn_mha()` as no gains now

* Simplified `is_mla` branch in `llm_build_deepseek2()`

* Removed `build_attn_mla` and added `nullptr` to all `build_attn` calls

* Fixed call to `build_attn` in `llm_build_t5_enc`

b5136

15 Apr 07:54
eccc7a1
ggml : Add AVX512 implementation of GEMM - Q4_Kx8 (#12829)

* Add AVX512 implementation of GEMM - q4kx8

* Update changes to remove unnecessary whitespaces

b5135

15 Apr 03:21
0019279
CANN: Opt ROPE optimization (#12865)

* [CANN]Opt ROPE optimization

* [CANN]Codestyle adjustment

* [CANN]Fix the ROPE precision issue

* [CANN]codestyle fix

* [CANN]add rope unsupport case

Signed-off-by: noemotiovon <[email protected]>

b5134

15 Apr 02:49
b0c75ac
CANN: Optimize CANN buffer pool memory management (#12875)

Multiple optional memory pools are provided for CANN, including VMM,
priority queue-based, and traditional memory pools.

1. When the memory pool is available and GGML_CANN_DISABLE_VMM_POOL
   is not defined, the VMM pool is selected by default.
2. Otherwise, if GGML_CANN_ENABLE_BUF_PRIO_POOL is defined,
   the priority queue-based memory pool is used.
3. If neither condition is met, the default memory pool is used.
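
The selection order described above can be sketched as a small decision function. This is an illustration only, not the code from the PR: the `PoolKind` enum and the functions `select_pool`, `is_defined`, and `select_pool_from_env` are hypothetical names. Two readings are assumed here: that "the memory pool is available" in rule 1 refers to VMM support on the device, and that "defined" means the name is set as an environment variable (the note does not say whether these are environment variables or compile-time macros).

```cpp
#include <cstdlib>

// Hypothetical labels for the three pool kinds named in the release note.
enum class PoolKind { Vmm, BufPrio, Default };

// Pure decision function following the three rules in order.
// vmm_available   : assumed to mean the device supports the VMM pool
// disable_vmm     : GGML_CANN_DISABLE_VMM_POOL is defined
// enable_buf_prio : GGML_CANN_ENABLE_BUF_PRIO_POOL is defined
PoolKind select_pool(bool vmm_available, bool disable_vmm, bool enable_buf_prio) {
    if (vmm_available && !disable_vmm) {
        return PoolKind::Vmm;      // rule 1: VMM pool by default
    }
    if (enable_buf_prio) {
        return PoolKind::BufPrio;  // rule 2: priority queue-based pool
    }
    return PoolKind::Default;      // rule 3: traditional pool
}

// Assumption: "defined" is read as "set in the environment".
bool is_defined(const char * name) {
    return std::getenv(name) != nullptr;
}

// Convenience wrapper that reads both switches from the environment.
PoolKind select_pool_from_env(bool vmm_available) {
    return select_pool(vmm_available,
                       is_defined("GGML_CANN_DISABLE_VMM_POOL"),
                       is_defined("GGML_CANN_ENABLE_BUF_PRIO_POOL"));
}
```

Note that under this reading GGML_CANN_ENABLE_BUF_PRIO_POOL has no effect while the VMM pool is available and not disabled, because rule 1 is checked first.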

b5133

14 Apr 18:25
d6d2c2a
Add performance print for gemma3 in example (#12929)

b5132

14 Apr 13:27
75afa0a
SYCL: Fix im2col (#12910)

* SYCL: Fix im2col

* restore local workgroup size adjustments for large inputs

* restore format

b5131

14 Apr 11:51
c772d54
rpc : use ggml_context_ptr (#12938)

b5129

14 Apr 07:14
sync : ggml

ggml-ci