Releases · ggml-org/llama.cpp
b6078
vocab : JetBrains Mellum pre-tokenizer (#15045)
b6076
vulkan: Use coopmat2 for conv2d (#14982)
b6075
opencl: fix adreno compiler detection logic (#15029)
b6074
CUDA: use mma FA kernel for gqa > 4 on RTX 4000 (#15035)
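Here gqa is the grouped-query-attention ratio, i.e. how many query heads share each KV head. A minimal sketch of the check (variable names are assumptions for illustration, not the PR's actual code):

```cpp
#include <cstdio>

int main() {
    // Hypothetical GQA model config: 32 query heads sharing 4 KV heads.
    const int n_head    = 32;
    const int n_head_kv = 4;
    const int gqa       = n_head / n_head_kv; // grouped-query ratio = 8
    // With gqa > 4, this release routes to the mma FlashAttention kernel.
    printf("gqa = %d -> %s\n", gqa, gqa > 4 ? "mma FA kernel" : "default kernel");
}
```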
b6073
cuda: make im2col a little faster (#15025)
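im2col unrolls convolution input patches into columns so that conv2d can run as a single GEMM. A minimal CPU reference sketch of the transform, assuming a single channel with no padding or dilation (illustrative only, not the CUDA kernel touched by the PR):

```cpp
#include <cstdio>
#include <vector>

// Unroll KH x KW patches of an H x W input into columns, so a convolution
// becomes one matrix multiply over the resulting (KH*KW) x (OH*OW) matrix.
std::vector<float> im2col(const std::vector<float> & src, int H, int W,
                          int KH, int KW, int stride) {
    const int OH = (H - KH) / stride + 1;
    const int OW = (W - KW) / stride + 1;
    std::vector<float> dst((size_t) KH * KW * OH * OW);
    for (int oh = 0; oh < OH; ++oh)
        for (int ow = 0; ow < OW; ++ow)
            for (int kh = 0; kh < KH; ++kh)
                for (int kw = 0; kw < KW; ++kw)
                    dst[((size_t) kh * KW + kw) * OH * OW + oh * OW + ow] =
                        src[(size_t) (oh * stride + kh) * W + (ow * stride + kw)];
    return dst;
}

int main() {
    std::vector<float> x(5 * 5, 1.0f);
    auto cols = im2col(x, 5, 5, 3, 3, 1); // 9 x 9 column matrix
    printf("columns: %zu\n", cols.size());
}
```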
b6071
llama : enable LLAMA_SET_ROWS=1 by default (#14959)
b6070
cuda, sycl : fix batched gemm when ne02 == 1 && ne03 > 1 (#15038)
- cont : fix cont types
- cont : adopt variable names and comment from the other branch
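In ggml's 4-D tensor layout, ne2 and ne3 are the batch dimensions, and a source dimension of size 1 is broadcast against the destination. A minimal sketch of the batch-index mapping for the shape this fix targets (an illustration, not the actual CUDA/SYCL code):

```cpp
#include <cstdio>

int main() {
    // dst batch dims (ggml naming: ne2, ne3); src0 broadcasts where its dim is 1.
    const int ne2 = 4, ne3 = 3;            // destination batch dims
    const int src0_ne2 = 1, src0_ne3 = 3;  // the previously mishandled shape
    for (int i3 = 0; i3 < ne3; ++i3) {
        for (int i2 = 0; i2 < ne2; ++i2) {
            // A source dimension of size 1 always maps to index 0.
            const int s2 = src0_ne2 == 1 ? 0 : i2;
            const int s3 = src0_ne3 == 1 ? 0 : i3;
            printf("dst batch (%d,%d) <- src0 batch (%d,%d)\n", i2, i3, s2, s3);
        }
    }
}
```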
b6067
chat : fix multiple tool_calls on hermes-2-pro (#14962)
b6066
vulkan: coopmat2 mul_mat optimizations (#14934)
- Increase tile size for k-quants, to match non-k-quants
- Choose more carefully between large and medium tiles, considering how it interacts with split_k
- Allow larger/non-power-of-two split_k, and make the splits a multiple of 256
- Use split_k==3 when >1/2 and <=2/3 of the SMs would have been used
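The split_k==3 rule is an occupancy heuristic: when a plain launch would leave a third to a half of the SMs idle, splitting the K dimension three ways puts them to work. A hedged sketch of the decision (the function name and the policy outside the stated band are assumptions; only the 1/2 to 2/3 band comes from the PR):

```cpp
#include <cstdio>

// Pick a split_k factor from the fraction of SMs a plain launch would occupy.
// Only the split_k == 3 band (> 1/2 and <= 2/3 of the SMs) comes from the
// change above; everything else here is an illustrative assumption.
int choose_split_k(int output_tiles, int num_sms) {
    const double used = (double) output_tiles / num_sms;
    if (used > 0.5 && used <= 2.0 / 3.0) {
        return 3; // three K-partials per tile put the idle SMs to work
    }
    return 1; // assumption: other occupancy bands handled by the real heuristic
}

int main() {
    // 60 of 100 SMs would be busy -> split_k == 3.
    printf("split_k = %d\n", choose_split_k(60, 100));
}
```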
b6065
llama-bench: rename DB table name from test to llama_bench (#15003)
Signed-off-by: Xiaodong Ye <[email protected]>