Releases · ggml-org/llama.cpp
b5054
sync: minja (#12739)
* sync: minja https://github.com/google/minja/pull/57
* fix json include
b5053
kv-cache : simplify + fix warning for recurrent models (#12756) ggml-ci
b5052
ci: add Linux cross-compile build (#12428)
b5050
gguf-split : --merge now respects --dry-run option (#12681)
* gguf-split now respects dry-run option
* removing trailing space
b5049
sycl: allow ggml-sycl configuration and compilation using Visual Stud…
b5046
vulkan: Hybrid waitForFences/getFenceStatus to reduce fence latency (…
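As a rough illustration of the hybrid approach named in the title, the sketch below polls vkGetFenceStatus for a short window and only then falls back to a blocking vkWaitForFences call. The helper name and the spin budget are assumptions for illustration, not llama.cpp's actual Vulkan backend code.

```cpp
// Hypothetical sketch of a hybrid fence wait: spin on vkGetFenceStatus for a short
// window (low latency when the GPU finishes quickly), then fall back to the blocking
// vkWaitForFences call. Only the two Vulkan API calls are real; the rest is illustrative.
#include <vulkan/vulkan.h>
#include <chrono>
#include <cstdint>

static VkResult hybrid_wait_for_fence(VkDevice device, VkFence fence) {
    using clock = std::chrono::steady_clock;
    // Assumed polling budget before giving up and blocking.
    const auto spin_deadline = clock::now() + std::chrono::microseconds(100);

    // Poll: vkGetFenceStatus returns VK_SUCCESS once the fence is signaled,
    // VK_NOT_READY while it is still pending.
    while (clock::now() < spin_deadline) {
        VkResult r = vkGetFenceStatus(device, fence);
        if (r != VK_NOT_READY) {
            return r; // signaled (VK_SUCCESS) or a device error
        }
    }

    // Fall back to a blocking wait with an effectively unbounded timeout.
    return vkWaitForFences(device, 1, &fence, VK_TRUE, UINT64_MAX);
}
```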
b5045
vulkan: set cmake minimum and project name in vulkan-shaders (#12744)
b5043
CUDA: Prefer vector flash decoding kernel for Gemma models (#12738)
* Prefer vector flash decoding kernel for Gemma models. The vector flash decoding kernel was not being picked for models with head dimension 256, and Gemma models are in this category. Removing this limit improves end-to-end performance by up to 12% in generation-phase throughput for Gemma models.
* Update ggml/src/ggml-cuda/fattn.cu
Co-authored-by: Johannes Gäßler <[email protected]>
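A minimal sketch of the kernel-selection heuristic this entry describes, assuming a hypothetical pick_fattn_kernel helper and enum; the real dispatch logic lives in ggml/src/ggml-cuda/fattn.cu and differs in detail.

```cpp
// Hypothetical sketch: prefer the vector flash-attention decode kernel up to head
// dimension 256, so Gemma-style models (head_dim == 256) take the vector path during
// single-token decode. Names here are illustrative, not llama.cpp's internals.
#include <cstdio>

enum class fattn_kernel { vector, tile };

static fattn_kernel pick_fattn_kernel(int head_dim, int batch_tokens) {
    const bool is_decode = batch_tokens == 1;
    // Per the commit message, the vector path previously excluded head_dim == 256;
    // the change lifts that limit (assumed condition shown here).
    if (is_decode && head_dim <= 256) {
        return fattn_kernel::vector;
    }
    return fattn_kernel::tile;
}

int main() {
    // Gemma uses head dimension 256, so decode now selects the vector kernel.
    std::printf("head_dim=256 decode -> %s\n",
                pick_fattn_kernel(256, 1) == fattn_kernel::vector ? "vector" : "tile");
    return 0;
}
```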
b5041
vulkan: Fix missing cmake logic for dot product extension (#12721)
b5039
sync : minja (inclusionAI/Ling) and update tests (#12699)
Signed-off-by: Xiaodong Ye <[email protected]>