forked from ggml-org/llama.cpp
Sync master with upstream release b6724 #284
Merged: jan-service-account merged 9 commits into `dev` from `update-dev-from-master-2025-10-10-00-33` on Oct 10, 2025
Conversation
…odules (ggml-org#16367)

* model: EmbeddingGemma sentence-transformers dense linear projections support
* model: add support for EmbeddingGemma SentenceTransformers dense linear projections

  Adding support for the Dense modules used in EmbeddingGemma models. EmbeddingGemma is a SentenceTransformers model with additional modules beyond the base Transformer backbone. See: https://developers.googleblog.com/en/gemma-explained-embeddinggemma-architecture-and-recipe/

* model: add support for EmbeddingGemma SentenceTransformers dense linear projections
  - converting a model with dense layers is optional
  - introduced dense config params
* Update convert_hf_to_gguf.py
  Co-authored-by: Daniel Bevenius <[email protected]>
* fixed formatting issues
* Update src/llama-graph.cpp
  Co-authored-by: Georgi Gerganov <[email protected]>
* removed pooling_type_opt, always allow overriding pooling_type; asserts checking dense feature dims
* fix python lint
* fix ubuntu gcc build warning
* fixed thread-safety test; moved asserts to load_hparams
* tidying up code; simplifying graph-context, expecting both dense weights
* minor : add TODO

Co-authored-by: Daniel Bevenius <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
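The Dense modules described above sit after pooling in a SentenceTransformers pipeline: the pooled embedding is passed through linear projections and then L2-normalized. A minimal sketch of that flow, with toy dimensions and plain-Python matrix-vector products (the up/down projection layout and normalization step are assumptions based on the commit description, not the actual llama.cpp graph code):

```python
import math

def matvec(w, x):
    # w: list of rows, x: input vector -> w @ x
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def l2_normalize(v):
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def apply_dense_modules(pooled, w_up, w_down):
    # SentenceTransformers-style Dense modules applied after pooling:
    # project up, project back down, then normalize (a sketch).
    hidden = matvec(w_up, pooled)
    out = matvec(w_down, hidden)
    return l2_normalize(out)

# toy 2 -> 3 -> 2 projection
w_up = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_down = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
emb = apply_dense_modules([3.0, 4.0], w_up, w_down)  # -> [0.6, 0.8]
```

In the real model the projections are much larger; the point is only that the dense weights are extra tensors the converter must now carry into the GGUF file.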
* refactor to support soft_max_ext
* fix error and support soft_max_back
* rm unused functions
* fix format issue

Co-authored-by: Zhang Jianyu <[email protected]>
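For context, ggml's extended softmax fuses a scale factor and an optional additive mask into one numerically stable softmax. A scalar Python sketch of that semantics (the exact ggml signature and mask handling are assumptions here):

```python
import math

def soft_max_ext(x, scale=1.0, mask=None):
    # Fused scale + mask + numerically stable softmax:
    # softmax(x * scale + mask), with max-subtraction for stability.
    if mask is None:
        mask = [0.0] * len(x)
    z = [xi * scale + mi for xi, mi in zip(x, mask)]
    m = max(z)
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]

probs = soft_max_ext([1.0, 2.0, 3.0], scale=1.0)
```

Refactoring a backend onto this fused op avoids materializing the scaled/masked intermediate tensor.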
* CANN: improve ACL graph matching

  Record `ne` and `nb` information for src tensors and include them in the graph matching check. This enhances the robustness of ACL graph matching by preventing incorrect matches when src tensors share the same data address but differ in shape or stride.

* CANN: add op_params match
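The failure mode being fixed is two tensor views of the same buffer comparing equal when only the data pointer is checked. A sketch of a match key that also folds in shape (`ne`), stride (`nb`), and op params, as the commit describes (field names here are illustrative, not the CANN backend's actual structs):

```python
def tensor_key(t):
    # Match on address AND shape/stride/op_params, so two views that
    # share a data pointer but differ in layout do not collide.
    return (t["data"], tuple(t["ne"]), tuple(t["nb"]),
            tuple(t.get("op_params", ())))

a = {"data": 0x1000, "ne": (4, 4), "nb": (4, 16)}
b = {"data": 0x1000, "ne": (16, 1), "nb": (4, 64)}  # same address, different view
collides = tensor_key(a) == tensor_key(b)  # False with the richer key
```

With only `t["data"]` as the key, `a` and `b` would wrongly match and the cached graph could be replayed with the wrong layout.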
* model-conversion : add support for SentenceTransformers

  This commit adds support for models that use SentenceTransformer layers. The motivation is that if the converted model includes any of the numbered layers specified in the original model's repository, these changes enable those models to be used and verified. Currently the model-conversion example only supports the base model output, without any of the additional transformation layers.

  Usage:

  Convert a model that also includes the SentenceTransformer layers:
  ```console
  (venv) $ export EMBEDDING_MODEL_PATH="~/google/embeddinggemma-300M"
  (venv) $ make embedding-convert-model
  ```

  Verify the produced embeddings from the converted model against the original model embeddings:
  ```console
  (venv) $ make embedding-verify-logits-st
  ```

  The original model can be run using SentenceTransformer:
  ```console
  (venv) $ make embedding-run-original-model-st
  ```

  Run the converted model using "SentenceTransformer" layers, which enables pooling and normalization:
  ```console
  (venv) $ make embedding-run-converted-model-st
  ```

* add model-conversion example requirements
* add support for -st flag in embedding model conversion

  This commit adds support for the -st flag in the embedding model conversion script, enabling models to be converted using SentenceTransformers dense layers.
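The verification step above compares embeddings from the original and converted models. A common way to do that (a sketch, not the repo's actual verification script) is cosine similarity between the two embedding vectors, with a tolerance close to 1.0:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# toy stand-ins for original vs. converted model embeddings
orig = [0.1, 0.9, 0.2]
conv = [0.1001, 0.8999, 0.2002]
ok = cosine_similarity(orig, conv) > 0.999
```

Small quantization/conversion noise leaves the similarity near 1.0; a structural conversion bug drops it sharply.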
* fix: let the model think in plaintext
* chore: npm run format + npm run build
* minor : code style
* server : fix prompt similarity calculation
* server : initial host-memory prompt caching
* cont
* server : refactor
* cont
* cont : make the server task of the slot const
* cont : minor [no ci]
* server : cache prompts and checkpoints only for completion tasks
* server : improve prompt caching logic
* cont : fix check for number of cached prompts [no ci]
* server : improve caching logic, add -cram CLI arg
* server : print prompt mismatch info
* cont : better naming [no ci]
* server : improve prompt cache loading logic
* server : add option to debug the slot contents (ggml-org#16482)
* Update tools/server/server.cpp
* server : add option to disable prompt cache

Co-authored-by: Xuan-Son Nguyen <[email protected]>
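The prompt-similarity fix and cache-selection logic above amount to picking the cached prompt whose token prefix covers the most of the incoming prompt. A minimal sketch of that idea (function names and the exact scoring are assumptions, not the server's implementation):

```python
def prompt_similarity(cached, incoming):
    # Fraction of the incoming prompt covered by the common token prefix.
    n = 0
    for a, b in zip(cached, incoming):
        if a != b:
            break
        n += 1
    return n / max(len(incoming), 1)

def best_cached_prompt(cache, incoming):
    # Pick the cached prompt that lets us reuse the longest prefix of KV state.
    return max(cache, key=lambda c: prompt_similarity(c, incoming), default=None)

cache = [[1, 2, 3, 4], [1, 2, 9]]
best = best_cached_prompt(cache, [1, 2, 3, 5, 6])  # -> [1, 2, 3, 4]
```

Reusing the matched prefix means only the divergent suffix must be re-evaluated, which is the payoff of host-memory prompt caching.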
* ggml-cpu: optimize norm operation to use intrinsics or Accelerate
rename function
add endif macro comment
Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: Aaron Teo <[email protected]>
* implement s390x SIMD suggested by @taronaeo
* add TODO comment
* tidy up spaces
---------
Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: Aaron Teo <[email protected]>
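A SIMD rewrite like the norm optimization above is typically validated against a scalar reference. A sketch of the standard normalization the vectorized paths must reproduce (mean subtraction, division by sqrt of variance plus epsilon; the epsilon value is an assumption):

```python
import math

def norm_ref(x, eps=1e-5):
    # Scalar reference: y = (x - mean(x)) / sqrt(var(x) + eps).
    # Intrinsics/Accelerate paths must match this within float tolerance.
    n = len(x)
    mean = sum(x) / n
    var = sum((xi - mean) ** 2 for xi in x) / n
    inv = 1.0 / math.sqrt(var + eps)
    return [(xi - mean) * inv for xi in x]

y = norm_ref([1.0, 2.0, 3.0, 4.0])
```

The SIMD versions (SSE/NEON/VSX, or s390x vector intrinsics as suggested here) compute the same mean/variance reduction in wider lanes, then broadcast the scale.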
Updates dev branch with latest release (b6724) from ggml-org/llama.cpp