@jan-service-account

Updates dev branch with latest release (b6276) from ggml-org/llama.cpp

danbev and others added 13 commits August 26, 2025 08:44
This commit removes the content from the Makefile and updates the
current deprecation message to state that `make` has been replaced by
CMake.

The message when `make` is invoked will now be the following:
```console
$ make
Makefile:6: *** Build system changed:
 The Makefile build has been replaced by CMake.

 For build instructions see:
 https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md

.  Stop.
```

The motivation for this is that many, if not all, targets now fail to
build after changes to the system, and `make` has also been deprecated
for some time.
* support interns1-mini

* fix comment

* update
…5562)

* batched-bench : fix unified KV cache handling + pp timing

* cont : run dummy token only with split KV cache
…ml-org#15557)

* model-conversion: add model card template for embeddings [no ci]

This commit adds a separate model card template (model repository
README.md template) for embedding models.

The motivation for this is that the server command for an embedding
model is slightly different, and some additional information that is
useful in the model card for embedding models may not be directly
relevant for causal models.

* squash! model-conversion: add model card template for embeddings [no ci]

Fix pyright lint error.

* remove --pooling override and clarify embd_normalize usage
…5564)

This commit explicitly sets the pooling type to 'none' in logits.cpp
to support models that have a pooling type specified.

The motivation for this is that some models may have a pooling type set
in the model file (.gguf file). In this specific case, where we only
want to extract logits, we need to ensure that no pooling is used so
that we are comparing raw logits and not pooled embeddings.
* CUDA: MoE helper in device code, better tile sizes

* reduce superfluous CUDA blocks
This avoids backend-dependent behavior for argmax that leads to intermittent failures.
@Minh141120 force-pushed the update-dev-from-master-2025-08-26-00-11 branch from f7207b0 to 4bd0e50 on August 26, 2025 01:46
@Minh141120 closed this on Aug 27, 2025