Name and Version
version: 5713 (4c9fdfb)
built with clang version 18.1.8 for x86_64-pc-windows-msvc
Operating systems
Windows
GGML backends
CUDA
Hardware
CPU Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
GPU NVIDIA Quadro RTX 5000 with Max-Q Design
Models
bge-m3
Problem description & steps to reproduce
The embeddings returned for the same input differ significantly between builds b5712 and b5713.
Server command used:
.\llama-server.exe --hf-repo gpustack/bge-m3-GGUF --hf-file bge-m3-Q4_K_M.gguf --embedding -ngl 99
POST request:
curl.exe -d "{\"input\": \"Hello\"}" http://127.0.0.1:8080/v1/embeddings
Please let me know if this behavior is expected or if there was a change in the embedding logic between these versions.
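To put a number on the difference, the two responses can be compared by cosine similarity. The following is a minimal sketch, not part of the original report: it assumes each build's curl output was saved to a file (the names `old.json` and `new.json` are hypothetical) and that the response follows the OpenAI-style layout with the vector under `data[0].embedding`.

```python
import json
import math
import sys


def extract_embedding(response_text):
    """Return the first embedding vector from an OpenAI-style response body.

    Assumes the layout {"data": [{"embedding": [...]}]}.
    """
    payload = json.loads(response_text)
    return payload["data"][0]["embedding"]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embedding dimensions differ")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def compare_responses(old_text, new_text):
    """Cosine similarity between the embeddings in two saved responses."""
    return cosine_similarity(extract_embedding(old_text),
                             extract_embedding(new_text))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare.py old.json new.json  (hypothetical file names)
    with open(sys.argv[1], encoding="utf-8") as f_old:
        old_text = f_old.read()
    with open(sys.argv[2], encoding="utf-8") as f_new:
        new_text = f_new.read()
    print(f"cosine similarity: {compare_responses(old_text, new_text):.6f}")
```

A value close to 1.0 would suggest only numerical noise between the builds, while a much lower value would point to a real change in how the embedding is computed.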
First Bad Commit
#14217
Relevant log output