Eval bug: Embedding output differs significantly between b4712 and b4713 #14848

### Name and Version

version: 5713 (4c9fdfbe)
built with clang version 18.1.8 for x86_64-pc-windows-msvc

### Operating systems

Windows

### GGML backends

CUDA

### Hardware

CPU: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
GPU: NVIDIA Quadro RTX 5000 with Max-Q Design

### Models

bge-m3

### Problem description & steps to reproduce

The embeddings returned by the server for the same model and the same input differ significantly between builds b4712 and b4713.

Server command used:

```powershell
.\llama-server.exe --hf-repo gpustack/bge-m3-GGUF --hf-file bge-m3-Q4_K_M.gguf --embedding -ngl 99
```

POST request:
```powershell
curl.exe -d "{\"input\": \"Hello\"}" http://127.0.0.1:8080/v1/embeddings
```
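To put a number on "very different", the responses from the two builds can be compared offline. The following is a minimal sketch, not part of llama.cpp: it assumes the endpoint returns an OpenAI-style body with the vector at `data[0].embedding`, and the helper names `extract_embedding`, `cosine_similarity` and `compare_responses` are hypothetical.

```python
import json
import math


def extract_embedding(response_text):
    """Pull the first embedding vector out of a response body.

    Assumes an OpenAI-style layout: {"data": [{"embedding": [...]}]}.
    """
    payload = json.loads(response_text)
    return payload["data"][0]["embedding"]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors have different lengths")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def compare_responses(old_text, new_text):
    """Return (cosine similarity, largest per-component absolute difference)."""
    old_vec = extract_embedding(old_text)
    new_vec = extract_embedding(new_text)
    similarity = cosine_similarity(old_vec, new_vec)
    max_abs_diff = max(abs(x - y) for x, y in zip(old_vec, new_vec))
    return similarity, max_abs_diff
```

One way to use it would be to save the body of the request above from each build (for example with curl's `-o` option, into hypothetical files such as `old.json` and `new.json`), read both files, and pass their contents to `compare_responses`. A cosine similarity close to 1.0 would point to ordinary numerical noise, while a much lower value would suggest the input is being processed differently.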

Please let me know if this behavior is expected or if there was a change in the embedding logic between these versions.

### First Bad Commit

https://github.com/ggml-org/llama.cpp/pull/14217


### Relevant log output

```shell
no relevant log
```
