Releases · ggml-org/llama.cpp

02 Aug 15:27

3303c19

b6073

cuda: make im2col a little faster (#15025)

Assets 15

02 Aug 15:18

github-actions

b6071

a4569c4

b6071

llama : enable LLAMA_SET_ROWS=1 by default (#14959)

ggml-ci

Assets 15

02 Aug 15:04

github-actions

b6070

15e92fd

b6070

cuda, sycl : fix batched gemm when ne02 == 1 && ne03 > 1 (#15038)

* cuda, sycl : fix batched gemm when ne02 == 1 && ne03 > 1

ggml-ci

* cont : fix cont types

ggml-ci

* cont : adopt variable names and comment from the other branch
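
For context, ggml's matrix multiply broadcasts the batch dimensions of the first operand over those of the second, so `ne02 == 1 && ne03 > 1` describes a case where dimension 2 is broadcast while dimension 3 is not. The sketch below is not the CUDA/SYCL code from this change. It only illustrates that index mapping, assuming the usual ggml convention that `ne12` is a multiple of `ne02` and `ne13` is a multiple of `ne03`. The function name `src0_batch_index` is hypothetical.

```python
def src0_batch_index(i12, i13, ne02, ne03, ne12, ne13):
    """Map a src1 batch coordinate (i12, i13) to the src0 batch it multiplies with."""
    # Broadcast ratios: how many src1 batches share one src0 batch per dimension.
    assert ne12 % ne02 == 0 and ne13 % ne03 == 0
    r2 = ne12 // ne02
    r3 = ne13 // ne03
    return i12 // r2, i13 // r3


# ne02 == 1 and ne03 > 1: dim 2 is broadcast, dim 3 is matched one-to-one.
ne02, ne03, ne12, ne13 = 1, 2, 4, 2
pairs = [
    ((i12, i13), src0_batch_index(i12, i13, ne02, ne03, ne12, ne13))
    for i13 in range(ne13)
    for i12 in range(ne12)
]
```

With these shapes every `i12` maps to src0 index 0 in dimension 2, while `i13` passes through unchanged in dimension 3.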

Assets 15

02 Aug 11:00

github-actions

b6067

f738989

b6067

chat : fix multiple tool_calls on hermes-2-pro (#14962)

Assets 15

02 Aug 10:59

github-actions

b6066

4cb208c

b6066

vulkan: coopmat2 mul_mat optimizations (#14934)

- Increase tile size for k-quants, to match non-k-quants
- Choose more carefully between large and medium tiles, considering how it
  interacts with split_k
- Allow larger/non-power of two split_k, and make the splits a multiple of 256
- Use split_k==3 when >1/2 and <=2/3 of the SMs would have been used
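
As a rough illustration of the last two bullets, the sketch below checks whether a tile count falls in the band where `split_k == 3` is chosen, and cuts the K dimension into ranges rounded up to a multiple of 256. It is a minimal sketch under stated assumptions, not the Vulkan backend's code: the function names, the `num_tiles` and `sm_count` parameters, and the way the last range is clipped are all hypothetical.

```python
def in_split_k_3_band(num_tiles, sm_count):
    """True when the tiles would occupy >1/2 and <=2/3 of the SMs."""
    # Integer arithmetic avoids rounding issues at the band edges.
    return 2 * num_tiles > sm_count and 3 * num_tiles <= 2 * sm_count


def aligned_splits(k, split_k, align=256):
    """Cut [0, k) into at most split_k ranges.

    Every range except possibly the last has a length that is a multiple of align.
    """
    per_split = -(-k // split_k)                  # ceil(k / split_k)
    per_split = -(-per_split // align) * align    # round up to a multiple of align
    return [(start, min(start + per_split, k)) for start in range(0, k, per_split)]
```

For example, `aligned_splits(4096, 3)` yields the ranges `(0, 1536)`, `(1536, 3072)` and `(3072, 4096)`.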

Assets 15

02 Aug 10:15

github-actions

b6065

3025b62

b6065

llama-bench: rename DB table name from test to llama_bench (#15003)

Signed-off-by: Xiaodong Ye <[email protected]>
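
If an existing SQLite database still holds results in a table named `test`, one way to line it up with the new name is a plain rename. The following is a minimal sketch using Python's standard `sqlite3` module and an in-memory database. The column names and the inserted row are illustrative placeholders, not the real llama-bench schema, and the helper `table_names` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for a database created from older output; the columns are illustrative.
conn.execute("CREATE TABLE test (build_commit TEXT, model_type TEXT, avg_ts REAL)")
conn.execute("INSERT INTO test VALUES ('abc1234', 'example-model', 42.0)")


def table_names(c):
    """Return the set of table names in the connected database."""
    rows = c.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return {name for (name,) in rows}


# Rename the old table only if it exists and the new name is still free.
names = table_names(conn)
if "test" in names and "llama_bench" not in names:
    conn.execute("ALTER TABLE test RENAME TO llama_bench")

row_count = conn.execute("SELECT COUNT(*) FROM llama_bench").fetchone()[0]
```

The guard keeps the rename from failing when the script is run against a database that already uses the new name.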

Assets 15

02 Aug 09:27

github-actions

b6064

ec0b188

b6064

vulkan: Support ne[3]>1 in noncontig matrix-vector multiply (#15015)

Assets 15

02 Aug 09:19

github-actions

b6063

339bd02

b6063

model : support Qwen3-Embedding (#15023)

Assets 15

02 Aug 08:38

github-actions

b6062

f906275

b6062

server: enable token array inputs for OAI API (#15001)
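
The OpenAI completions format allows `prompt` to be an array of token IDs instead of a string. Assuming this change refers to that field, a request body could be built as sketched below. The helper name `build_completion_request` is hypothetical, the token IDs are arbitrary placeholders, and nothing is sent to a server here.

```python
import json


def build_completion_request(token_ids, max_tokens=16):
    """Build an OpenAI-style completion body whose prompt is a list of token IDs."""
    if not all(isinstance(t, int) and not isinstance(t, bool) for t in token_ids):
        raise TypeError("token_ids must be a list of integers")
    return {"prompt": list(token_ids), "max_tokens": max_tokens}


# Placeholder token IDs; real values depend on the model's tokenizer.
body = json.dumps(build_completion_request([1, 15043, 3186]))
```

The resulting JSON string would be posted to the server's OpenAI-compatible completions endpoint.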

Assets 15

02 Aug 08:31

github-actions

b6061

a9f7541

b6061

vulkan: optimizations for direct convolution (#14933)

* vulkan: optimizations for direct convolution

- Empirically choose a better tile size. Reducing BS_K/BS_NPQ helps fill
  the GPU. The new size should be amenable to using coopmat, too.
- Fix shmem bank conflicts. 16B padding should work with coopmat.
- Some explicit loop unrolling.
- Skip math/stores work for parts of the tile that are OOB.
- Apply fastdiv opt.
- Disable shuffles for NV.

* Three tiles sizes for CONV_2D, and a heuristic to choose

* reallow collectives for pre-Turing

* make SHMEM_PAD a spec constant

* fixes for intel perf - no shmem padding, placeholder shader core count

* shader variants with/without unrolling

* 0cc4m's fixes for AMD perf

Co-authored-by: 0cc4m <[email protected]>

---------

Co-authored-by: 0cc4m <[email protected]>
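
"fastdiv" commonly refers to replacing an integer division by a fixed divisor with a multiply, an add and a shift, using constants precomputed on the host. Assuming that is what the bullet above means, the idea can be sketched in Python as follows. This follows the classic multiply-high scheme for 32-bit unsigned values and is not the shader code from this change; the names `init_fastdiv` and `fastdiv` are illustrative.

```python
def init_fastdiv(d):
    """Precompute (mp, L) so that n // d needs only a multiply, an add and a shift."""
    assert 0 < d < 2**32
    L = 0
    while (1 << L) < d:          # L = ceil(log2(d))
        L += 1
    mp = ((1 << 32) * ((1 << L) - d)) // d + 1
    return mp, L


def fastdiv(n, mp, L):
    """Return n // d for a 32-bit unsigned n, using values from init_fastdiv(d)."""
    hi = (n * mp) >> 32          # high 32 bits of the 64-bit product
    return (hi + n) >> L
```

The constants are computed once per divisor, for example `mp, L = init_fastdiv(3)`, after which `fastdiv(n, mp, L)` stands in for `n // 3` inside a hot loop. In fixed-width arithmetic the sum `hi + n` needs more than 32 bits for large `n`, which Python's unbounded integers hide.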

Assets 15
