@jan-service-account
Updates dev branch with latest release (b6337) from ggml-org/llama.cpp

jeffbolznv and others added 10 commits August 31, 2025 08:27
…#15652)

* vulkan: clamp matmul and FA results to the max finite value

* only clamp for fp16
…#15649)

* vulkan: Allow fallback to sysmem memory when vidmem is full

* vulkan: Add env var GGML_VK_ALLOW_SYSMEM_FALLBACK
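A usage sketch for the new environment variable (shell; the actual llama.cpp binary invocation is omitted, and the assumption here is that the variable acts as a boolean opt-in, per the commit title):

```shell
# Opt in to falling back to system memory when device-local video
# memory is exhausted (GGML_VK_ALLOW_SYSMEM_FALLBACK is the env var
# added by this commit):
export GGML_VK_ALLOW_SYSMEM_FALLBACK=1
echo "$GGML_VK_ALLOW_SYSMEM_FALLBACK"
```

Any llama.cpp binary launched from this shell then sees the flag; without it, the previous behavior (failing when vidmem is full) applies.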
…#15679)

This commit removes the portability_enumeration_ext variable from the
ggml_vk_instance_portability_enumeration_ext_available function, as it
is initialized to false and never modified, making it redundant.
* vulkan: mul_mat_id coopmat2 optimizations

Add a path for when the tile fits in BN/2, similar to what we have for mul_mat.

Only call fetch_scales/store_scales once per QUANT_K block, and once at the
beginning in case start_k is not aligned.

* Also add a path for BN/4 - worth a couple more percent

Exposes ggml_backend_sched_split_graph() to allow splitting the graph without allocating compute buffers, and uses it to split the graph for the automatic Flash Attention check.
* metal : fix checks for available FA kernels

ggml-ci

* cont : fix comment [no ci]
* server : enable /slots by default and make it secure

ggml-ci

* server : fix tests to pass `--no-slots` when necessary

* server : extend /props with info about enabled endpoints
jan-service-account merged commit 3832f54 into dev on Sep 1, 2025
13 checks passed
jan-service-account deleted the update-dev-from-master-2025-09-01-00-42 branch on September 1, 2025 at 00:56

7 participants