@jan-service-account

Updates dev branch with latest release (b6189) from ggml-org/llama.cpp

jeffbolznv and others added 6 commits August 17, 2025 10:41
- Launch an appropriate number of invocations (the next larger power of two).
  32 invocations is common, and the barrier is much cheaper there.
- Specialize for "needs bounds checking" vs. not.
- Make the code less branchy and [[unroll]] the loops. In the final code,
  I see no branches inside the main loop (only predicated stores) when
  needs_bounds_check is false.
- Always sort ascending, then apply the ascending vs. descending option when
  doing the final stores to memory.
- Copy the values into shared memory, which makes them slightly cheaper to access.
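
A minimal CPU-side sketch of two of these ideas, the power-of-two invocation count and the sort-ascending-then-flip-at-store trick (names here are hypothetical; the actual change is in the Vulkan GLSL argsort shader):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Round the row length up to the next power of two, mirroring how the shader
// would pick its invocation count; padded slots would get sentinel values so
// they sort past the real data and are never stored.
static uint32_t next_pow2(uint32_t n) {
    uint32_t p = 1;
    while (p < n) p <<= 1;
    return p;
}

// "Always sort ascending, apply the direction at the final store":
// the sort itself has a single code path; only the write-out differs.
static void argsort_row(const float * vals, int32_t * dst, uint32_t n, bool ascending) {
    std::vector<uint32_t> idx(n);
    for (uint32_t i = 0; i < n; ++i) idx[i] = i;
    std::stable_sort(idx.begin(), idx.end(),
        [&](uint32_t a, uint32_t b) { return vals[a] < vals[b]; }); // always ascending
    for (uint32_t i = 0; i < n; ++i) {
        dst[i] = (int32_t)(ascending ? idx[i] : idx[n - 1 - i]);    // flip at store time
    }
}
```
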
* fix hang in windows-latest-cmake-hip

* apply fix to release as well
ggml-org#15367)

* force patch_embd weights to f32

* use MmprojModel base tensor_force_quant instead
…rg#15355)
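
The actual change lives in the Python conversion script (MmprojModel's tensor_force_quant hook); purely as a conceptual C++ sketch of the idea, with hypothetical names:

```cpp
#include <string>

// Conceptual sketch only (hypothetical types/names): precision-sensitive
// tensors such as patch_embd are pinned to F32 regardless of the
// quantization type requested for the rest of the model.
enum class tensor_type { F32, F16, QUANTIZED };

static tensor_type choose_type(const std::string & name, tensor_type requested) {
    if (name.find("patch_embd") != std::string::npos) {
        return tensor_type::F32; // force full precision for patch embeddings
    }
    return requested;
}
```
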

* vulkan: Use larger workgroups for mul_mat_vec when M is small

Also use subgroup instructions for (part of) the reduction when supported.
Without this, the more expensive reductions would eat into the benefits of
the larger workgroups.

* update heuristic for amd/intel
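
For intuition, a CPU-side model of the two-stage reduction described above (hypothetical names; the real code uses Vulkan subgroup intrinsics inside the shader):

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Model of a workgroup reduction: each subgroup first reduces its own slice
// (cheap on the GPU, no barrier needed), then a single pass combines the
// per-subgroup partials through shared memory. Keeping that second stage
// small is what lets the larger workgroups pay off.
float workgroup_reduce(const std::vector<float> & partials, size_t subgroup_size) {
    std::vector<float> per_subgroup;
    for (size_t base = 0; base < partials.size(); base += subgroup_size) {
        const size_t end = std::min(base + subgroup_size, partials.size());
        // On the GPU this inner sum would be a single subgroup instruction.
        per_subgroup.push_back(std::accumulate(partials.begin() + base,
                                               partials.begin() + end, 0.0f));
    }
    // Final combine across subgroups (shared memory plus one barrier on the GPU).
    return std::accumulate(per_subgroup.begin(), per_subgroup.end(), 0.0f);
}
```
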

---------

Co-authored-by: 0cc4m <[email protected]>

Add tracking for high watermark cache usage and make it available in the /metrics endpoint.

Use case: tracking the largest cache usage needed under a realistic workload,
to better understand memory requirements and to be able to adjust
cache size/quantization for the model/cache accordingly.
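
A minimal sketch of such a high-watermark counter (hypothetical names; the actual server code differs), updated wherever cache occupancy changes and exported as a gauge from /metrics:

```cpp
#include <atomic>
#include <cstddef>

// Track current usage plus a monotonic high watermark; on_usage() is called
// whenever cache occupancy changes, and /metrics reports both values.
struct cache_stats {
    std::atomic<size_t> used{0};
    std::atomic<size_t> high_watermark{0};

    void on_usage(size_t n_used) {
        used.store(n_used, std::memory_order_relaxed);
        size_t hw = high_watermark.load(std::memory_order_relaxed);
        // Lock-free max update: retry while another thread hasn't already
        // published an even larger value.
        while (n_used > hw &&
               !high_watermark.compare_exchange_weak(hw, n_used,
                                                     std::memory_order_relaxed)) {
        }
    }
};
```
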
@jan-service-account merged commit 4b8975c into dev on Aug 18, 2025
17 checks passed
@jan-service-account deleted the update-dev-from-master-2025-08-18-00-13 branch on August 18, 2025 00:26