Conversation

@jan-service-account

Updates dev branch with latest release (b6075) from ggml-org/llama.cpp

iamlemec and others added 13 commits August 2, 2025 10:44
- Increase tile size for k-quants, to match non-k-quants
- Choose more carefully between large and medium tiles, considering how it
  interacts with split_k
- Allow larger/non-power of two split_k, and make the splits a multiple of 256
- Use split_k==3 when >1/2 and <=2/3 of the SMs would have been used (see the sketch below)
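
A minimal sketch of the split_k rule from the last bullet, using a hypothetical `pick_split_k` helper; the real backend selection logic also weighs tile sizes and handles the other occupancy ranges:

```cpp
// Hypothetical helper illustrating the split_k == 3 rule described above.
// tiles_used: number of tiles/workgroups the unsplit matmul would launch
// total_sms:  number of SMs (compute units) on the device
static int pick_split_k(int tiles_used, int total_sms) {
    // ">1/2 and <=2/3 of the SMs would have been used" -> use split_k == 3
    if (2 * tiles_used > total_sms && 3 * tiles_used <= 2 * total_sms) {
        return 3;
    }
    // other occupancy ranges fall back to the existing selection logic
    return 1;
}
```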
* torch is not required for convert_hf_to_gguf_update

* add --check-missing parameter

* check that pre-tokenizer hashes are up-to-date
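A hypothetical invocation of the updated script with the new flag (the exact argument list may differ from the real script):

```console
python convert_hf_to_gguf_update.py --check-missing
```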

* cuda, sycl : fix batched gemm when ne02 == 1 && ne03 > 1

ggml-ci

* cont : fix cont types

ggml-ci

* cont : adopt variable names and comment from the other branch
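
For context on the ne02 == 1 && ne03 > 1 case, a plain-C++ reference sketch (not the CUDA/SYCL kernels) of batched matmul with ggml-style broadcasting over the two batch dims; names and layout are illustrative assumptions:

```cpp
// Reference sketch of a batched matmul where the batch dims of src0 are
// broadcast against those of dst/src1 via modulo indexing. The case fixed
// above is ne02 == 1 && ne03 > 1: src0 has a single slice in dim 2 that is
// reused for every i02, while dim 3 is genuinely batched.
#include <cstdint>
#include <vector>

struct tensor4d {
    int64_t ne[4];              // dims: ne[0] (fastest) .. ne[3]
    std::vector<float> data;    // contiguous, slice-by-slice
};

static void batched_mat_mul(const tensor4d & src0, const tensor4d & src1, tensor4d & dst) {
    const int64_t k = src0.ne[0], m = src0.ne[1], n = src1.ne[1];
    for (int64_t i03 = 0; i03 < dst.ne[3]; ++i03) {
        for (int64_t i02 = 0; i02 < dst.ne[2]; ++i02) {
            // broadcast src0's batch dims (the modulo is what matters here)
            const float * a = src0.data.data() +
                ((i03 % src0.ne[3]) * src0.ne[2] + (i02 % src0.ne[2])) * m * k;
            const float * b = src1.data.data() + (i03 * src1.ne[2] + i02) * n * k;
            float       * c = dst .data.data() + (i03 * dst .ne[2] + i02) * n * m;
            for (int64_t j = 0; j < n; ++j) {
                for (int64_t i = 0; i < m; ++i) {
                    float sum = 0.0f;
                    for (int64_t l = 0; l < k; ++l) {
                        sum += a[i*k + l] * b[j*k + l];
                    }
                    c[j*m + i] = sum;
                }
            }
        }
    }
}
```

With ne02 == 1 the modulo makes every i02 reuse src0's single slice in dim 2, which is the shape combination the commit above fixes in the batched-gemm paths.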
…ml-org#15040)

This commit removes the right alignment of the `n_stream` value in the
log message in the `llama_kv_cache_unified` constructor.

The motivation for this change is to improve the readability of the log
message. Currently the output looks like this:
```console
llama_kv_cache_unified: size = 2048.00 MiB (  4096 cells,  32 layers,  1/ 1 seqs), K (f16): 1024.00 MiB, V (f16): 1024.00 MiB
```
Notice that the `n_stream` value is right aligned, which makes it a
little harder to read.

With the change in this commit, the output will look like this:
```console
llama_kv_cache_unified: size = 2048.00 MiB (  4096 cells,  32 layers, 1/1 seqs), K (f16): 1024.00 MiB, V (f16): 1024.00 MiB
```
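
The underlying change is just the field width in the format string. A hedged illustration with hypothetical variable names and widths (the real format string lives in the `llama_kv_cache_unified` constructor):

```cpp
#include <cstdio>

int main() {
    const int n_seq_max = 1;
    const int n_stream  = 1;
    // hypothetical widths, right aligned as in the old log line: " 1/ 1 seqs"
    printf("%2d/%2d seqs\n", n_seq_max, n_stream);
    // without field widths, as after this change: "1/1 seqs"
    printf("%d/%d seqs\n", n_seq_max, n_stream);
    return 0;
}
```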
@jan-service-account merged commit a62f7d3 into dev on Aug 3, 2025
17 checks passed
@jan-service-account deleted the update-dev-from-master-2025-08-03-00-14 branch on August 3, 2025 00:27