Releases: ggml-org/llama.cpp

b6078

03 Aug 20:28
97366dc
vocab : JetBrains Mellum pre-tokenizer (#15045)

b6076

03 Aug 12:42
6c7a441
vulkan: Use coopmat2 for conv2d (#14982)

b6075

02 Aug 18:11
5c0eb5e
opencl: fix adreno compiler detection logic (#15029)

b6074

02 Aug 15:29
03d4698
CUDA: use mma FA kernel for gqa > 4 on RTX 4000 (#15035)

b6073

02 Aug 15:27
3303c19
cuda: make im2col a little faster (#15025)

b6071

02 Aug 15:18
a4569c4
llama : enable LLAMA_SET_ROWS=1 by default (#14959)

ggml-ci

b6070

02 Aug 15:04
15e92fd
cuda, sycl : fix batched gemm when ne02 == 1 && ne03 > 1 (#15038)

* cuda, sycl : fix batched gemm when ne02 == 1 && ne03 > 1

ggml-ci

* cont : fix cont types

ggml-ci

* cont : adopt variable names and comment from the other branch

b6067

02 Aug 11:00
f738989
chat : fix multiple tool_calls on hermes-2-pro (#14962)

b6066

02 Aug 10:59
4cb208c
vulkan: coopmat2 mul_mat optimizations (#14934)

- Increase tile size for k-quants, to match non-k-quants
- Choose more carefully between large and medium tiles, considering how it
  interacts with split_k
- Allow larger/non-power of two split_k, and make the splits a multiple of 256
- Use split_k==3 when >1/2 and <=2/3 of the SMs would have been used
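
The split_k rules in this list can be illustrated with a rough sketch. It is not the Vulkan backend's code: the function names, the `tiles` and `sm_count` parameters, and the "at most half the SMs busy" branch are all assumptions made for illustration. Only the split_k==3 rule and the multiple-of-256 alignment come from the notes above.

```python
def round_up(value: int, multiple: int) -> int:
    """Round value up to the next multiple of `multiple`."""
    return ((value + multiple - 1) // multiple) * multiple


def pick_split_k(tiles: int, sm_count: int) -> int:
    """Hypothetical heuristic: number of slices to cut the K dimension into."""
    if tiles <= 0 or sm_count <= 0:
        return 1
    if 2 * tiles <= sm_count:
        # Assumption (not stated in the notes): when at most half of the
        # SMs would be busy, split enough to occupy the idle ones.
        return sm_count // tiles
    if 3 * tiles <= 2 * sm_count:
        # From the notes: >1/2 and <=2/3 of the SMs used -> split_k == 3.
        return 3
    return 1


def split_sizes(k: int, split_k: int, align: int = 256) -> list:
    """Cut k into at most split_k slices; every slice except possibly the
    last has a length that is a multiple of `align`."""
    per_split = round_up(-(-k // split_k), align)  # ceil-divide, then align
    sizes = []
    remaining = k
    while remaining > 0:
        take = min(per_split, remaining)
        sizes.append(take)
        remaining -= take
    return sizes
```

Under these assumptions, `pick_split_k(60, 100)` falls in the (1/2, 2/3] band and yields 3, and `split_sizes(4096, 3)` yields two slices of 1536 and a final slice of 1024. For small k the alignment can leave fewer slices than requested, which is why the sketch returns the actual slice list rather than assuming `split_k` of them.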

b6065

02 Aug 10:15
3025b62
Compare
Choose a tag to compare
llama-bench: rename DB table name from test to llama_bench (#15003)

Signed-off-by: Xiaodong Ye <[email protected]>
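
For anyone who keeps results in an SQLite database populated from older llama-bench SQL output, the rename means existing queries against `test` would need to target `llama_bench` instead. A minimal sketch of a one-time migration follows. The helper name `rename_legacy_table` is hypothetical, and the sketch assumes the old table is literally named `test` and that no `llama_bench` table exists yet.

```python
import sqlite3


def rename_legacy_table(conn, old="test", new="llama_bench"):
    """Rename `old` to `new` when only `old` exists.

    Returns True if a rename happened, False otherwise.
    """
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    names = {row[0] for row in rows}
    if old not in names or new in names:
        # Nothing to migrate, or the new table is already present.
        return False
    conn.execute(f'ALTER TABLE "{old}" RENAME TO "{new}"')
    conn.commit()
    return True
```

Calling it a second time is harmless, since it returns `False` once `llama_bench` exists.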