
Releases: ggml-org/llama.cpp

b5054

04 Apr 21:19
7a84777
sync: minja (#12739)

* sync: minja

https://github.com/google/minja/pull/57

* fix json include

b5053

04 Apr 19:31
3e1d293
kv-cache : simplify + fix warning for recurrent models (#12756)

ggml-ci

b5052

04 Apr 17:57
1be76e4
ci: add Linux cross-compile build (#12428)

b5050

04 Apr 14:58
23106f9
gguf-split : --merge now respects --dry-run option (#12681)

* gguf-split now respects dry-run option

* removing trailing space

b5049

04 Apr 14:44
94148ba
sycl: allow ggml-sycl configuration and compilation using Visual Stud…

b5046

04 Apr 07:00
74d4f5b
vulkan: Hybrid waitForFences/getFenceStatus to reduce fence latency (…

b5045

04 Apr 06:45
35e592e
vulkan: set cmake minimum and project name in vulkan-shaders (#12744)

b5043

03 Apr 17:28
c262bed
CUDA: Prefer vector flash decoding kernel for Gemma models (#12738)

* Prefer vector flash decoding kernel for Gemma models

The vector flash decoding kernel was not being picked for models with head dimension 256, a category that includes the Gemma models.
Removing this limit improves end-to-end performance by up to 12% in generation-phase throughput for Gemma models.

* Update ggml/src/ggml-cuda/fattn.cu

Co-authored-by: Johannes Gäßler <[email protected]>

---------

Co-authored-by: Johannes Gäßler <[email protected]>

b5041

03 Apr 16:18
1c05999
vulkan: Fix missing cmake logic for dot product extension (#12721)

b5039

03 Apr 12:40
5f696e8
sync : minja (inclusionAI/Ling) and update tests (#12699)

Signed-off-by: Xiaodong Ye <[email protected]>