Releases: ggml-org/llama.cpp

b4998

30 Mar 10:20
492d7f1
musa: fix all warnings, re-enable `-DLLAMA_FATAL_WARNINGS=ON` in ci a…

b4997

30 Mar 06:17
sync : ggml

ggml-ci

b4992

29 Mar 23:48
af6ae1e
llama : fix non-causal mask for gemma 3 (#12615)

b4991

29 Mar 14:13
0bb2919
llama : change cpu_buft_list order: ACCEL -> GPU host -> CPU extra ->…

b4990

29 Mar 10:46
a69f846
cmake : fix ccache conflict (#12522)

If users have already set CMAKE_C_COMPILER_LAUNCHER globally, setting it
again in CMake leads to a conflict and a compile failure.

Signed-off-by: Jay <[email protected]>
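
As an illustration of the kind of guard this commit message describes, the fragment below sets a compiler launcher only when the user has not already provided one. It is a minimal sketch, not the project's actual change: the option name `MYPROJ_CCACHE` and the cache variable `CCACHE_PROGRAM` are hypothetical, while `CMAKE_C_COMPILER_LAUNCHER` and `CMAKE_CXX_COMPILER_LAUNCHER` are standard CMake variables.

```cmake
# Hypothetical option name, used here for illustration only.
option(MYPROJ_CCACHE "Use ccache if available" ON)

# Skip the whole block if a launcher was already set by the user,
# e.g. via -DCMAKE_C_COMPILER_LAUNCHER=... or a toolchain file.
if (MYPROJ_CCACHE AND NOT CMAKE_C_COMPILER_LAUNCHER AND NOT CMAKE_CXX_COMPILER_LAUNCHER)
    find_program(CCACHE_PROGRAM ccache)
    if (CCACHE_PROGRAM)
        # Must run before targets are created so they pick up the launcher.
        set(CMAKE_C_COMPILER_LAUNCHER   "${CCACHE_PROGRAM}")
        set(CMAKE_CXX_COMPILER_LAUNCHER "${CCACHE_PROGRAM}")
    endif()
endif()
```

With a guard like this, a globally configured launcher is left untouched and ccache is applied only as a default.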

b4988

28 Mar 21:56
3714c3e
llama : fix incorrect Qwen2Moe ffn_moe_out graph callback (#12631)

b4987

28 Mar 19:06
b4ae508
metal : improve FA + improve MoE (#12612)

* ggml : FA with different K, V head sizes (CPU)

ggml-ci

* metal : add FA with HS=192

* metal : extend FA to support different K and V head sizes

ggml-ci

* metal : add FA vector kernels for heads K 192 and V 128

ggml-ci

* ggml : restrict op on other backends to equal head sizes

ggml-ci

* metal : optimize FA-vec kernel

ggml-ci

* metal : FA remove mq registers

* metal : improve MoE mul_mat_id condition

ggml-ci

* metal : fix comments + remove unnecessary addition

ggml-ci

* metal : avoid too much shared memory usage with mul_mat_id

ggml-ci

b4986

28 Mar 18:42
b86f600
vulkan: fix coopmat shader generation when cross-compiling (#12272)

* vulkan: fix coopmat shader generation when cross-compiling

Previously, the status of coopmat{,2} support wasn't passed to the
vulkan-shaders-gen project built on the host, which led to a build
failure because the cross-compiling code expected coopmat{,2}
shaders that were never generated.

Fix this by passing the coopmat{,2} support status to the vulkan-shaders
subproject.

Signed-off-by: Icenowy Zheng <[email protected]>

* Only call coop-mat shaders once

* Fix whitespace

---------

Signed-off-by: Icenowy Zheng <[email protected]>
Co-authored-by: bandoti <[email protected]>

b4985

28 Mar 18:02
dd373dd
llama: fix error on bad grammar (#12628)

b4984

28 Mar 08:59
5d01670
server : include speculative decoding stats when timings_per_token is…