Eval bug: Wrong embeddings calculated #16538

### Name and Version

llama-server --version
version: 6690 (86df2c9ae)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu

llama-server -m embeddinggemma-300M-qat-Q4_0.gguf \
    --embeddings -ub 2048 -np 32 -c 100000

./llembstr.sh "apple"|head -c 80
[{"index":0,"embedding":[[0.04573212191462517,0.06408306211233139,0.010944360867 ...
... ,-0.0604519359767437,0.005052336025983095,0.0017678646836429834]]}]
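
To compare this output with other libraries, the vector first has to be pulled out of the nested response. A minimal Python sketch, assuming the response keeps the shape shown above (a list with one object whose `embedding` holds a single inner list); `extract_embedding` and the sample values are made up for illustration and are not part of llama.cpp:

```python
import json


def extract_embedding(response_text):
    """Return the embedding vector from a response shaped like
    [{"index": 0, "embedding": [[...]]}] (hypothetical helper)."""
    items = json.loads(response_text)
    rows = items[0]["embedding"]
    # The output above nests the vector one level deep; accept a flat list too.
    if rows and isinstance(rows[0], list):
        if len(rows) != 1:
            raise ValueError(f"expected one vector, got {len(rows)} rows")
        return rows[0]
    return rows


# Toy response with made-up numbers, same shape as the server output above.
sample = '[{"index":0,"embedding":[[0.5,-0.25,0.125]]}]'
print(extract_embedding(sample))
```

If the inner list ever holds more than one row, the sketch raises instead of guessing which row to use.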

With sentence-transformers and TEI, the same model gives the expected embedding:

[-0.18476315  0.00167681  0.03773482 ... -0.07996223 -0.02348067  0.00976739]

Test Results:

  - Test strings: "apple", "banana", "car"
  - Cosine similarities: 0.022652, -0.016087, 0.026008
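
The issue does not say how these similarities were computed; presumably each value compares the llama-server vector with the reference vector for one test string. A minimal sketch of such a check in plain Python, where `cosine_similarity` is a hypothetical helper and the vectors are toy values rather than real embeddings:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors: the first pair points the same way, the second is orthogonal.
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```

Two implementations of the same model would be expected to score close to 1 on the same input, so values near 0 like the ones above suggest the two vectors are unrelated.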

One odd thing: the server logs that it is loading a default chat template, which is unexpected for an embedding server. I am not sure whether that affects the result or is a red herring.

### Operating systems

Linux

### GGML backends

CPU

### Hardware

5800X

### Models

embeddinggemma-300M-qat-Q4_0.gguf

### Problem description & steps to reproduce

The embeddings produced by llama-server bear no similarity to the reference embeddings from sentence-transformers.

### First Bad Commit

_No response_

### Relevant log output

```shell
main: model loaded
main: chat template, chat_template: {%- for message in messages -%}
  {{- '<|im_start|>' + message.role + '
' + message.content + '<|im_end|>
' -}}
{%- endfor -%}
{%- if add_generation_prompt -%}
  {{- '<|im_start|>assistant
' -}}
{%- endif -%}, example_format: '<|im_start|>system
You are a helpful assistant<|im_end|>
<|im_start|>user
Hello<|im_end|>
<|im_start|>assistant
Hi there<|im_end|>
<|im_start|>user
How are you?<|im_end|>
<|im_start|>assistant
'
main: server is listening on http://127.0.0.1:8080 - starting the main loop
srv  update_slots: all slots are idle
slot get_availabl: id 31 | task -1 | selected slot by LRU, t_last = -1
slot launch_slot_: id 31 | task 0 | processing task
slot update_slots: id 31 | task 0 | new prompt, n_ctx_slot = 3328, n_keep = 0, n_prompt_tokens = 3
slot update_slots: id 31 | task 0 | n_past = 0, memory_seq_rm [0, end)
slot update_slots: id 31 | task 0 | prompt processing progress, n_past = 3, n_tokens = 3, progress = 1.000000
slot update_slots: id 31 | task 0 | prompt done, n_past = 3, n_tokens = 3
slot      release: id 31 | task 0 | stop processing: n_past = 3, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: request: POST /embeddings 127.0.0.1 200
slot get_availabl: id 31 | task 0 | selected slot by lcs similarity, lcs_len = 3, similarity = 1.000 (> 0.100 thold)
slot launch_slot_: id 31 | task 2 | processing task
slot update_slots: id 31 | task 2 | new prompt, n_ctx_slot = 3328, n_keep = 0, n_prompt_tokens = 3
slot update_slots: id 31 | task 2 | n_past = 0, memory_seq_rm [0, end)
slot update_slots: id 31 | task 2 | prompt processing progress, n_past = 3, n_tokens = 3, progress = 1.000000
slot update_slots: id 31 | task 2 | prompt done, n_past = 3, n_tokens = 3
slot      release: id 31 | task 2 | stop processing: n_past = 3, truncated = 0
srv  update_slots: all slots are idle
```
