@jan-service-account

Updates dev branch with latest release (b6039) from ggml-org/llama.cpp

bachelor-dou and others added 12 commits July 30, 2025 08:39
* CANN: add ops docs

* CANN: update ops docs
* embeddings: fix extraction of CLS pooling results

* merge RANK pooling into CLS case for inputs
* test-thread-safety : each context uses a single sequence

* embedding : handle --parallel argument

ggml-ci

* save-load : handle -np 1

ggml-ci

* thread-safety : avoid overriding threads, reduce test case arg

ggml-ci

The pipeline member can be cast to VkPipeline.
This is a VkPipeline_T* on 64-bit platforms but a uint64_t on 32-bit platforms.
Cf. the VK_DEFINE_NON_DISPATCHABLE_HANDLE documentation.

ggml-ci

This commit adds support for the `embd_normalize` parameter in the
server code.

Currently, when the server is started with a pooling type other than
`none`, embeddings are always normalized with the Euclidean (L2) norm.
That is not always the desired behavior: users may want a different
normalization, or none at all, and this commit allows that.

Example usage:
```console
curl --request POST \
    --url http://localhost:8080/embedding \
    --header "Content-Type: application/json" \
    --data '{"input": "Hello world today", "embd_normalize": -1}'
```
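
For readers unfamiliar with the parameter, the sketch below illustrates what the different `embd_normalize` values are understood to mean. It assumes the value follows the same convention as the `--embd-normalize` option of the llama.cpp embedding example (`-1` = none, `0` = max absolute scaled to the int16 range, `1` = taxicab, `2` = Euclidean, `>2` = p-norm); the commit message itself only confirms Euclidean as the default and that other modes or none can be selected. The function `embd_normalize` here is a hypothetical Python helper, not the server's implementation.

```python
import math


def embd_normalize(vec, mode=2):
    """Illustrative normalization of one embedding vector.

    Assumed mode convention (not confirmed by the commit message):
      -1 -> no normalization
       0 -> scale so the largest |x| maps to roughly the int16 range
       1 -> taxicab (L1) norm
       2 -> Euclidean (L2) norm
      >2 -> p-norm with p = mode
    """
    if mode < 0:
        # No normalization: return the values unchanged.
        return list(vec)

    if mode == 0:
        norm = max((abs(x) for x in vec), default=0.0) / 32760.0
    elif mode == 1:
        norm = sum(abs(x) for x in vec)
    elif mode == 2:
        norm = math.sqrt(sum(x * x for x in vec))
    else:
        norm = sum(abs(x) ** mode for x in vec) ** (1.0 / mode)

    if norm <= 0.0:
        # An all-zero vector cannot be scaled; return zeros.
        return [0.0] * len(vec)
    return [x / norm for x in vec]


if __name__ == "__main__":
    sample = [3.0, 4.0]
    for mode in (-1, 1, 2, 3):
        print(mode, embd_normalize(sample, mode))
```

Under that assumption, the request above with `"embd_normalize": -1` would return the pooled embedding without any scaling, while omitting the field would keep the existing L2 behavior.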
@jan-service-account jan-service-account merged commit 7c3fd44 into dev Jul 31, 2025
11 checks passed
@jan-service-account jan-service-account deleted the update-dev-from-master-2025-07-31-00-12 branch July 31, 2025 00:25