Eval bug: Wrong embeddings calculated #16538

@cduk

Name and Version

llama-server --version
version: 6690 (86df2c9)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu

llama-server -m embeddinggemma-300M-qat-Q4_0.gguf \
  --embeddings -ub 2048 -np 32 -c 100000

./llembstr.sh "apple"|head -c 80
[{"index":0,"embedding":[[0.04573212191462517,0.06408306211233139,0.010944360867 ...
... ,-0.0604519359767437,0.005052336025983095,0.0017678646836429834]]}]

With sentence-transformers and TEI, the same input gives the correct embedding:

[-0.18476315 0.00167681 0.03773482 ... -0.07996223 -0.02348067 0.00976739]

Test Results:

  • Test strings: "apple", "banana", "car"
  • Cosine similarities: 0.022652, -0.016087, 0.026008
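A comparison like this can be reproduced with a short script. The following is a minimal sketch, not the script used for the numbers above: `extract_vector` and `cosine_similarity` are hypothetical helper names, and it assumes each similarity compares the llama-server vector against the reference vector for the same string. The response shape it parses is the one shown in the output above, where the vector is nested one level deep.

```python
import json
import math


def extract_vector(response_text):
    """Return the first embedding vector from a response shaped like
    [{"index": 0, "embedding": [[...]]}] (hypothetical helper)."""
    items = json.loads(response_text)
    emb = items[0]["embedding"]
    # The output above wraps the vector in an extra list ([[...]]); unwrap it if present.
    if emb and isinstance(emb[0], list):
        emb = emb[0]
    return [float(x) for x in emb]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy values for illustration only; these are not real embeddings.
    server_vec = extract_vector('[{"index":0,"embedding":[[0.5,-0.25,0.1]]}]')
    reference_vec = [0.5, -0.25, 0.1]
    print(f"{cosine_similarity(server_vec, reference_vec):.6f}")
```

Two implementations that agree should score close to 1.0 for the same input, so values near zero, as listed above, indicate essentially unrelated vectors.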

One odd thing: the server reports loading a default chat template, which seems unusual for an embedding server. I'm not sure whether that affects the result or is a red herring.

Operating systems

Linux

GGML backends

CPU

Hardware

5800X

Models

embeddinggemma-300M-qat-Q4_0.gguf

Problem description & steps to reproduce

The embeddings produced by llama-server bear no similarity to the correct embeddings from sentence-transformers.

First Bad Commit

No response

Relevant log output

main: model loaded
main: chat template, chat_template: {%- for message in messages -%}
  {{- '<|im_start|>' + message.role + '
' + message.content + '<|im_end|>
' -}}
{%- endfor -%}
{%- if add_generation_prompt -%}
  {{- '<|im_start|>assistant
' -}}
{%- endif -%}, example_format: '<|im_start|>system
You are a helpful assistant<|im_end|>
<|im_start|>user
Hello<|im_end|>
<|im_start|>assistant
Hi there<|im_end|>
<|im_start|>user
How are you?<|im_end|>
<|im_start|>assistant
'
main: server is listening on http://127.0.0.1:8080 - starting the main loop
srv  update_slots: all slots are idle
slot get_availabl: id 31 | task -1 | selected slot by LRU, t_last = -1
slot launch_slot_: id 31 | task 0 | processing task
slot update_slots: id 31 | task 0 | new prompt, n_ctx_slot = 3328, n_keep = 0, n_prompt_tokens = 3
slot update_slots: id 31 | task 0 | n_past = 0, memory_seq_rm [0, end)
slot update_slots: id 31 | task 0 | prompt processing progress, n_past = 3, n_tokens = 3, progress = 1.000000
slot update_slots: id 31 | task 0 | prompt done, n_past = 3, n_tokens = 3
slot      release: id 31 | task 0 | stop processing: n_past = 3, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: request: POST /embeddings 127.0.0.1 200
slot get_availabl: id 31 | task 0 | selected slot by lcs similarity, lcs_len = 3, similarity = 1.000 (> 0.100 thold)
slot launch_slot_: id 31 | task 2 | processing task
slot update_slots: id 31 | task 2 | new prompt, n_ctx_slot = 3328, n_keep = 0, n_prompt_tokens = 3
slot update_slots: id 31 | task 2 | n_past = 0, memory_seq_rm [0, end)
slot update_slots: id 31 | task 2 | prompt processing progress, n_past = 3, n_tokens = 3, progress = 1.000000
slot update_slots: id 31 | task 2 | prompt done, n_past = 3, n_tokens = 3
slot      release: id 31 | task 2 | stop processing: n_past = 3, truncated = 0
srv  update_slots: all slots are idle
