Releases · ggml-org/llama.cpp

15 Apr 12:06

54a7272

b5140

CANN: Add x86 build ci (#12950)

* CANN: Add x86 build ci

* CANN: fix code format

Assets 26

15 Apr 09:50

github-actions

b5138

5106764

SYCL: Add ROPE vision kernel (#12887)

* SYCL: Add ROPE vision kernel

* Add comment about rope mode

Assets 26

15 Apr 08:10

github-actions

b5137

daa4228

llama : DeepSeek V2/V3 MLA implementation (#12801)

* Merged using squash to remove all noisy commit messages

* Force flash attention off for `LLM_ARCH_DEEPSEEK2` - embedding too large

* Removed 3 conts (2x RoPE and 1x RMS-norm)

* Changed to use `<cmath>` instead of `<math.h>`

* Reverted removal of the 3 conts

* Used `reshape` in `llm_graph_context::build_attn_mha()`

* Use `k_pe = ggml_reshape`

* Removed the 3 conts again

* Removed the 3D views of `wk_b` and `wv_b`, and just save as 3D in GGUF

* Removed MQA optimisation from `build_attn_mha()` as no gains now

* Simplified `is_mla` branch in `llm_build_deepseek2()`

* Removed `build_attn_mla` and added `nullptr` to all `build_attn` calls

* Fixed call to `build_attn` in `llm_build_t5_enc`

Assets 26

15 Apr 07:54

github-actions

b5136

eccc7a1

ggml : Add AVX512 implementation of GEMM - Q4_Kx8 (#12829)

* Add AVX512 implementation of GEMM - q4kx8

* Update changes to remove unnecessary whitespaces

Assets 26

15 Apr 03:21

github-actions

b5135

0019279

CANN: Opt ROPE optimization (#12865)

* [CANN] Optimize ROPE

* [CANN] Code style adjustment

* [CANN] Fix the ROPE precision issue

* [CANN] Code style fix

* [CANN] Add unsupported ROPE case

Signed-off-by: noemotiovon

Assets 26

15 Apr 02:49

github-actions

b5134

b0c75ac

CANN: Optimize CANN buffer pool memory management (#12875)

Multiple optional memory pools are provided for CANN: a VMM pool, a priority-queue-based pool, and the traditional pool. The pool is selected as follows:
1. When VMM is available and GGML_CANN_DISABLE_VMM_POOL is not defined, the VMM pool is selected by default.
2. Otherwise, if GGML_CANN_ENABLE_BUF_PRIO_POOL is defined, the priority-queue-based memory pool is used.
3. If neither condition is met, the default memory pool is used.
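The three selection rules above amount to a short priority check. The sketch below is a hypothetical Python model, not the CANN backend's actual C++ code: `select_cann_pool` and its return labels are invented for illustration, and it assumes that "defined" means the name is present as an environment variable.

```python
from collections.abc import Mapping


def select_cann_pool(vmm_available: bool, env: Mapping[str, str]) -> str:
    """Pick a CANN buffer pool label (hypothetical helper, for illustration).

    Assumption: a flag counts as "defined" when its name is a key in `env`,
    regardless of its value.
    """
    # Rule 1: prefer the VMM pool when VMM is available and not disabled.
    if vmm_available and "GGML_CANN_DISABLE_VMM_POOL" not in env:
        return "vmm"
    # Rule 2: otherwise use the priority-queue-based pool if it is opted into.
    if "GGML_CANN_ENABLE_BUF_PRIO_POOL" in env:
        return "buf_prio"
    # Rule 3: fall back to the traditional pool.
    return "default"
```

Passing a plain dict keeps the check easy to exercise; `os.environ` could be passed instead to consult the real process environment. Note the ordering: disabling VMM alone falls through to the default pool unless the priority-queue pool is also enabled.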

Assets 26

14 Apr 18:25

github-actions

b5133

d6d2c2a

Add performance print for gemma3 in example (#12929)

Assets 26

14 Apr 13:27

github-actions

b5132

75afa0a

SYCL: Fix im2col (#12910)

* SYCL: Fix im2col

* restore local workgroup size adjustments for large inputs

* restore format

Assets 26

14 Apr 11:51

github-actions

b5131

c772d54

rpc : use ggml_context_ptr (#12938)

Assets 26

14 Apr 07:14

github-actions

b5129

526739b

sync : ggml

ggml-ci

Assets 26
