
Conversation

@jan-service-account

Updates dev branch with latest release (b6706) from ggml-org/llama.cpp

ggerganov and others added 6 commits October 7, 2025 08:21
* metal : ssm_scan minor opts

* metal : get_rows optimize

* metal : cpy optimize

* metal : ssm_conv opt

* metal : ssm_scan simplify

* metal : ssm_scan opt
* tests : add -INF blocks to the KQ mask in the FA tests

* cont : bump -INF block size to 64

Co-authored-by: Jeff Bolz <[email protected]>

* ggml : prevent division by zero in FA CPU op

---------

Co-authored-by: Jeff Bolz <[email protected]>
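
A minimal, hedged sketch of what the two bullets above amount to: the FA tests mark whole 64-wide column blocks of the KQ mask as -INF, and a row whose positions are all masked drives the softmax normalizer to zero, which is the case the "prevent division by zero in FA CPU op" change guards against. This is not the actual llama.cpp test or kernel code; all names, layouts, and sizes below are illustrative.

```cpp
// Illustrative sketch only -- not the actual llama.cpp test or FA kernel code.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Mark one 64-column block of a KQ-mask row as fully masked, the way the FA
// tests insert -INF blocks (block size and mask layout here are assumptions).
void mask_inf_block(std::vector<float> & mask, int n_kv, int row,
                    int block_start, int block_size = 64) {
    const int end = std::min(block_start + block_size, n_kv);
    for (int j = block_start; j < end; ++j) {
        mask[(size_t) row * n_kv + j] = -std::numeric_limits<float>::infinity();
    }
}

// Softmax over one masked row of KQ scores. When every position in the row is
// masked, the normalizer is zero; guarding that division is the same idea as
// the "prevent division by zero in FA CPU op" fix.
void softmax_masked_row(const float * scores, const float * mask, float * out, int n_kv) {
    float vmax = -std::numeric_limits<float>::infinity();
    for (int j = 0; j < n_kv; ++j) {
        vmax = std::max(vmax, scores[j] + mask[j]);
    }
    float sum = 0.0f;
    for (int j = 0; j < n_kv; ++j) {
        // a fully masked row has vmax == -INF; emit 0 instead of exp(NaN)
        const float e = std::isinf(vmax) ? 0.0f : std::exp(scores[j] + mask[j] - vmax);
        out[j] = e;
        sum   += e;
    }
    const float inv = sum > 0.0f ? 1.0f / sum : 0.0f; // avoid 0/0 on fully masked rows
    for (int j = 0; j < n_kv; ++j) {
        out[j] *= inv;
    }
}
```

With the guard in place, a fully masked row simply produces all-zero attention weights instead of NaNs from a 0/0.
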
* metal : pad K, V and Mask when needed

* cont : simplify

* cuda : add TODO about KV padding requirement

* metal : add comments

* metal : remove mask padding requirement
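
A hedged sketch of the padding idea behind the "pad K, V and Mask when needed" bullets above: the KV dimension (and the mask width) is rounded up to a multiple of the kernel's tile size so the Metal kernel never has to handle a partial block. The constant and helper below are assumptions for illustration, not llama.cpp identifiers.

```cpp
// Illustrative sketch of the padding arithmetic; FA_BLOCK and pad_to_block
// are assumed names, not llama.cpp code.
#include <cstdint>

constexpr int64_t FA_BLOCK = 32; // assumed kernel tile size

constexpr int64_t pad_to_block(int64_t n, int64_t block = FA_BLOCK) {
    return ((n + block - 1) / block) * block; // round up, same spirit as ggml's GGML_PAD macro
}

// e.g. a KV length of 100 is padded to 128; the extra K/V columns can hold
// anything as long as the corresponding mask entries are -INF, since those
// positions then receive zero attention weight.
static_assert(pad_to_block(100, 32) == 128, "illustrative check");
```
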
Update the README file to document the newly added functionality of
exposing multiple devices from a single server.

Co-authored-by: Diego Devesa <[email protected]>
@Minh141120 closed this on Oct 7, 2025