Releases: ggml-org/llama.cpp
b4998
musa: fix all warnings, re-enable `-DLLAMA_FATAL_WARNINGS=ON` in ci a…
b4997
sync : ggml ggml-ci
b4992
llama : fix non-causal mask for gemma 3 (#12615)
b4991
llama : change cpu_buft_list order: ACCEL -> GPU host -> CPU extra ->…
b4990
cmake : fix ccache conflict (#12522)
If users have already set CMAKE_C_COMPILER_LAUNCHER globally, setting it again in CMake leads to a conflict and the compilation fails.
Signed-off-by: Jay
b4988
llama : fix incorrect Qwen2Moe ffn_moe_out graph callback (#12631)
b4987
metal : improve FA + improve MoE (#12612)
* ggml : FA with different K, V head sizes (CPU) ggml-ci
* metal : add FA with HS=192
* metal : extend FA to support different K and V head sizes ggml-ci
* metal : add FA vector kernels for heads K 192 and V 128 ggml-ci
* ggml : restrict op on other backends to equal head sizes ggml-ci
* metal : optimize FA-vec kernel ggml-ci
* metal : FA remove mq registers
* metal : improve MoE mul_mat_id condition ggml-ci
* metal : fix comments + remove unnecessary addition ggml-ci
* metal : avoid too much shared memory usage with mul_mat_id ggml-ci
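Release b4987 extends flash attention (FA) to K and V head sizes that differ (for example K 192 and V 128). The minimal single-query sketch below illustrates why that is well defined; it is not the ggml or Metal kernel. It assumes standard scaled dot-product attention computed with an online (streaming) softmax, as flash-attention-style kernels do. The K head size only enters the q·k dot products and the 1/sqrt(d_k) scale, while the V head size sets the width of the accumulator and of the output. The function names `naive_attention` and `streaming_attention` are hypothetical.

```python
import math
import random


def naive_attention(q, keys, values):
    """Reference: softmax(q . k / sqrt(d_k)) over all keys, then a weighted sum of values."""
    d_k = len(q)
    d_v = len(values[0])
    scale = 1.0 / math.sqrt(d_k)
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(d_v)]


def streaming_attention(q, keys, values):
    """Single-query attention with an online softmax (flash-attention style).

    Each key has length d_k (only needed for q . k); the accumulator has
    length d_v, so d_k and d_v are allowed to differ.
    """
    d_k = len(q)
    d_v = len(values[0])
    scale = 1.0 / math.sqrt(d_k)
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * d_v
    for k, v in zip(keys, values):
        s = scale * sum(qi * ki for qi, ki in zip(q, k))
        new_max = max(running_max, s)
        # Rescale what has been accumulated so far to the new max;
        # on the first step exp(-inf) == 0.0, so the zero accumulator stays zero.
        correction = math.exp(running_max - new_max)
        p = math.exp(s - new_max)
        running_sum = running_sum * correction + p
        acc = [a * correction + p * vj for a, vj in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]


if __name__ == "__main__":
    random.seed(0)
    D_K, D_V, N_KV = 192, 128, 16  # head sizes from the release note; N_KV is arbitrary
    q = [random.uniform(-1.0, 1.0) for _ in range(D_K)]
    keys = [[random.uniform(-1.0, 1.0) for _ in range(D_K)] for _ in range(N_KV)]
    values = [[random.uniform(-1.0, 1.0) for _ in range(D_V)] for _ in range(N_KV)]
    out = streaming_attention(q, keys, values)
    ref = naive_attention(q, keys, values)
    print(len(out))  # 128, the V head size
    print(max(abs(a - b) for a, b in zip(out, ref)))
```

The streaming version never materializes the full score vector, which is the property that lets FA kernels process the KV cache in tiles; the head-size mismatch only changes the lengths of the two inner loops.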
b4986
vulkan: fix coopmat shader generation when cross-compiling (#12272)
* vulkan: fix coopmat shader generation when cross-compiling
  Previously, the coopmat{,2} support status was not passed to the vulkan-shaders-gen project built on the host. This caused build failures, because the cross-compiled code expected coopmat{,2} shaders that were never generated. Fix this by passing the coopmat{,2} support status to the vulkan-shaders subproject.
  Signed-off-by: Icenowy Zheng
* Only call coop-mat shaders once
* Fix whitespace
---------
Signed-off-by: Icenowy Zheng
Co-authored-by: bandoti
b4985
llama: fix error on bad grammar (#12628)
b4984
server : include speculative decoding stats when timings_per_token is…