Releases: ggml-org/llama.cpp

b6073

02 Aug 15:27
3303c19
cuda: make im2col a little faster (#15025)

b6071

02 Aug 15:18
a4569c4
llama : enable LLAMA_SET_ROWS=1 by default (#14959)

ggml-ci

b6070

02 Aug 15:04
15e92fd
cuda, sycl : fix batched gemm when ne02 == 1 && ne03 > 1 (#15038)

* cuda, sycl : fix batched gemm when ne02 == 1 && ne03 > 1

ggml-ci

* cont : fix cont types

ggml-ci

* cont : adopt variable names and comment from the other branch

b6067

02 Aug 11:00
f738989
chat : fix multiple tool_calls on hermes-2-pro (#14962)

b6066

02 Aug 10:59
4cb208c
vulkan: coopmat2 mul_mat optimizations (#14934)

- Increase tile size for k-quants, to match non-k-quants
- Choose more carefully between large and medium tiles, considering how it
  interacts with split_k
- Allow larger/non-power of two split_k, and make the splits a multiple of 256
- Use split_k==3 when >1/2 and <=2/3 of the SMs would have been used
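
As a rough illustration of the last two bullets, the sketch below shows one way such a rule could be written. It is not the Vulkan backend's code: `pick_split_k`, `k_per_split`, and the fallback value of 1 are hypothetical, and only the split_k==3 condition and the multiple-of-256 rounding come from the notes above.

```python
def pick_split_k(num_tiles: int, sm_count: int) -> int:
    """Return 3 when the tiles would occupy >1/2 and <=2/3 of the SMs.

    Every other case returns 1 here as a placeholder; the real heuristic
    covers more cases that the release notes do not spell out.
    """
    # Integer comparisons avoid floating-point rounding at the boundaries.
    more_than_half = 2 * num_tiles > sm_count
    at_most_two_thirds = 3 * num_tiles <= 2 * sm_count
    return 3 if more_than_half and at_most_two_thirds else 1


def k_per_split(k: int, split_k: int, align: int = 256) -> int:
    """Size of each K slice, rounded up to a multiple of `align`."""
    per_split = -(-k // split_k)            # ceiling division
    return -(-per_split // align) * align   # round up to the alignment
```

With 60 tiles on a 100-SM device this sketch picks split_k==3, and a K dimension of 4096 is cut into slices of 1536 (the last slice being shorter).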

b6065

02 Aug 10:15
3025b62
llama-bench: rename DB table name from test to llama_bench (#15003)

Signed-off-by: Xiaodong Ye <[email protected]>
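
For a SQLite database that already holds results under the old table name, the table can be renamed to match. This is a minimal sketch on an in-memory database; the two columns and the sample row are placeholders, not the real llama-bench schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for a table created under the old name; columns are illustrative.
conn.execute("CREATE TABLE test (model_type TEXT, avg_ts REAL)")
conn.execute("INSERT INTO test VALUES ('example-model', 42.0)")

# Rename the old table so queries written for the new name keep working.
conn.execute("ALTER TABLE test RENAME TO llama_bench")

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
rows = conn.execute("SELECT model_type, avg_ts FROM llama_bench").fetchall()
```

Queries and scripts that still reference `test` need the same rename.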

b6064

02 Aug 09:27
ec0b188
vulkan: Support ne[3]>1 in noncontig matrix-vector multiply (#15015)

b6063

02 Aug 09:19
339bd02
model : support Qwen3-Embedding (#15023)

b6062

02 Aug 08:38
f906275
server: enable token array inputs for OAI API (#15001)
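
In the OpenAI completions API, `prompt` may be a list of token IDs instead of a string. The sketch below only builds such a request body and does not send it. The token IDs are made up, and it is an assumption here that the body would go to the server's OpenAI-compatible completions route.

```python
import json

# Made-up token IDs; real values depend on the model's tokenizer.
prompt_tokens = [1, 15043, 3186]

payload = {
    "prompt": prompt_tokens,   # token array instead of a text prompt
    "max_tokens": 16,
}

body = json.dumps(payload)
```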

b6061

02 Aug 08:31
a9f7541
vulkan: optimizations for direct convolution (#14933)

* vulkan: optimizations for direct convolution

- Empirically choose a better tile size. Reducing BS_K/BS_NPQ helps fill
  the GPU. The new size should be amenable to using coopmat, too.
- Fix shmem bank conflicts. 16B padding should work with coopmat.
- Some explicit loop unrolling.
- Skip math/stores work for parts of the tile that are OOB.
- Apply fastdiv opt.
- Disable shuffles for NV.

* Three tile sizes for CONV_2D, and a heuristic to choose

* reallow collectives for pre-Turing

* make SHMEM_PAD a spec constant

* fixes for intel perf - no shmem padding, placeholder shader core count

* shader variants with/without unrolling

* 0cc4m's fixes for AMD perf

Co-authored-by: 0cc4m <[email protected]>

---------

Co-authored-by: 0cc4m <[email protected]>
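
The "fastdiv opt" bullet above presumably refers to replacing integer division by a fixed divisor with a multiply and a shift. The sketch below shows that general technique (the Granlund-Montgomery form) in Python for clarity; it is not the shader code from this change, and the function names are chosen here for illustration.

```python
def init_fastdiv(d: int) -> tuple[int, int]:
    """Precompute (multiplier, shift) for a divisor d in [1, 2**32)."""
    shift = 0
    while (1 << shift) < d:      # smallest shift with 2**shift >= d
        shift += 1
    multiplier = ((1 << 32) * ((1 << shift) - d)) // d + 1
    return multiplier, shift


def fastdiv(n: int, multiplier: int, shift: int) -> int:
    """Compute n // d for 0 <= n < 2**32 without a division."""
    high = (n * multiplier) >> 32    # upper 32 bits of the 64-bit product
    return (high + n) >> shift
```

The multiplier and shift are computed once per divisor, so the trick helps when the same divisor (for example a tensor dimension) is reused for many index computations.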