Description
Name and Version
llama-server --version
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.022 sec
ggml_metal_device_init: GPU name: Apple M1
ggml_metal_device_init: GPU family: MTLGPUFamilyApple7 (1007)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal3 (5001)
ggml_metal_device_init: simdgroup reduction = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory = true
ggml_metal_device_init: has bfloat = true
ggml_metal_device_init: use residency sets = true
ggml_metal_device_init: use shared buffers = true
ggml_metal_device_init: recommendedMaxWorkingSetSize = 11453.25 MB
version: 6690 (86df2c9)
built with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
Operating systems
Linux, Mac
GGML backends
CPU
Hardware
M1
Models
bge-small-en-v1.5.gguf
Problem description & steps to reproduce
Failing command:
llama-server -m ./models/bge-small-en-v1.5-f16.gguf --host 0.0.0.0 --port 8081 --embedding --embd-bge-small-en-default
Working command (same setup, with the pooling type set explicitly):
llama-server -m ./models/bge-small-en-v1.5-f16.gguf --host 0.0.0.0 --port 8081 --embedding -t 8 --embd-bge-small-en-default --pooling cls
(cls is the correct pooling type for this model, but pooling should never be left at none for any embedding model.)
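Until the preset is fixed, one way to stop depending on its pooling default is a small wrapper that adds an explicit pooling type whenever the caller did not pass one. This is a hypothetical sketch, not part of llama.cpp: the function name run_embedding_server and the LLAMA_SERVER override variable are made up for illustration. It assumes, as in the working command above, that an explicit --pooling cls given after the preset takes effect.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of llama.cpp): start llama-server, adding
# "--pooling cls" unless the caller already passed a --pooling option.
# Assumes, like the working command in this issue, that an explicit
# --pooling given after --embd-bge-small-en-default overrides the preset.
run_embedding_server() {
  # LLAMA_SERVER is a made-up override for the binary (handy for testing).
  server="${LLAMA_SERVER:-llama-server}"
  for arg in "$@"; do
    if [ "$arg" = "--pooling" ]; then
      # Caller chose a pooling type explicitly; pass arguments through as-is.
      "$server" "$@"
      return
    fi
  done
  # No pooling type given: append cls, which works for bge-small-en-v1.5.
  "$server" "$@" --pooling cls
}
```

For example, run_embedding_server -m ./models/bge-small-en-v1.5-f16.gguf --host 0.0.0.0 --port 8081 --embedding --embd-bge-small-en-default starts the server with --pooling cls appended. The cls default only suits models like bge-small-en-v1.5; other embedding models may need a different type such as mean.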
Confusingly, --embd-bge-small-en-default sets pooling to none. As a result, calling the /v1/embeddings endpoint returns this error:
{"error":{"code":400,"message":"Pooling type 'none' is not OAI compatible. Please use a different pooling type","type":"invalid_request_error"}}
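To tell from the client side whether a request is hitting this particular error, the response body can be checked for the message quoted above. A minimal sketch, assuming the server from the commands above is reachable on localhost:8081 and accepts an OpenAI-style request body with an input field; is_pooling_none_error and request_embedding are hypothetical helper names.

```shell
#!/bin/sh
# Sketch for spotting the error quoted in this issue in a /v1/embeddings
# response. Host/port match the commands above; the function names are
# hypothetical helpers, not part of llama.cpp.

# Succeeds (exit status 0) if the response body is the "pooling none" error.
is_pooling_none_error() {
  printf '%s' "$1" | grep -q "Pooling type 'none' is not OAI compatible"
}

# Sends one plain-text input to the OpenAI-style embeddings endpoint.
# The text is inserted into the JSON unescaped, so avoid quotes/backslashes.
request_embedding() {
  curl -s "http://localhost:8081/v1/embeddings" \
    -H "Content-Type: application/json" \
    -d "{\"input\": \"$1\"}"
}
```

Usage: body=$(request_embedding "hello world"), then is_pooling_none_error "$body" && echo "restart llama-server with an explicit --pooling value".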
I spent a few hours today trying to understand why a convenience preset would set the pooling type incorrectly.
The same is true of all such convenience presets for embedding models: they all leave pooling set to none.
First Bad Commit
No response
Relevant log output
{"error":{"code":400,"message":"Pooling type 'none' is not OAI compatible. Please use a different pooling type","type":"invalid_request_error"}}