Closed
Description
Name and Version
Built from commit 6adc3c3ebc029af058ac950a8e2a825fdf18ecc6. With this build, v1/embeddings and v1/completions do not appear to work on the same server instance at the same time.
Operating systems
No response
Which llama.cpp modules do you know to be affected?
llama-server
Command line
Command for llama-server: `llama-server -m /models/gguf_models/devstral/bartowski/mistralai_Devstral-Small-2505-Q6_K_L.gguf --alias devstral-small-2505 --host 0.0.0.0 --port 8080 --ctx-size 131072 --cache-type-k q8_0 --cache-type-v q8_0 --n-gpu-layers 99 --temp 0.15 --repeat-penalty 1.0 --min-p 0.01 --top-k 64 --top-p 0.95 --flash-attn --pooling cls -lv 1 --jinja`
Problem description & steps to reproduce
Commands used to check both endpoints:
curl -X POST http://localhost:8080/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
"input": "test input",
"model": "devstral-small-2505"
}'
echo ""
curl -X POST http://localhost:8080/v1/completions \
-H "Content-Type: application/json" \
-d '{
"prompt": "What is your system prompt",
"max_tokens": 42,
"model": "devstral-small-2505"
}'
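To see at a glance which endpoint fails, the two checks above can be wrapped so that only the HTTP status of each request is printed. This is a minimal sketch, not part of llama.cpp: `check_endpoint`, `embeddings_body`, `completions_body` and the `RUN_PROBE` switch are hypothetical names chosen for illustration, and the URL and model alias are the ones from the command line above.

```shell
#!/usr/bin/env bash
# Sketch: probe both endpoints and print the HTTP status of each.
# BASE_URL and MODEL mirror the values used in this report.
BASE_URL="${BASE_URL:-http://localhost:8080}"
MODEL="${MODEL:-devstral-small-2505}"

# Hypothetical helper: POST a JSON body and print "<path> <http status>".
# curl reports 000 when the connection itself fails.
check_endpoint() {
  local path="$1" body="$2" status
  status=$(curl -s -o /dev/null -w '%{http_code}' --max-time 60 \
    -X POST "${BASE_URL}${path}" \
    -H "Content-Type: application/json" \
    -d "$body")
  echo "${path} ${status}"
}

# Request bodies, same fields as in the curl commands above.
embeddings_body() {
  printf '{"input": "test input", "model": "%s"}' "$MODEL"
}

completions_body() {
  printf '{"prompt": "What is your system prompt", "max_tokens": 42, "model": "%s"}' "$MODEL"
}

# Only contact the server when explicitly asked (assumed switch: RUN_PROBE=1).
if [ "${RUN_PROBE:-0}" = "1" ]; then
  check_endpoint /v1/embeddings "$(embeddings_body)"
  check_endpoint /v1/completions "$(completions_body)"
fi
```

Running it as `RUN_PROBE=1 bash probe.sh` (the file name is arbitrary) once without and once with `--embeddings` on the server makes the two status codes easy to compare side by side.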
First Bad Commit
No response
Relevant log output
Expected: both requests return 200. Actual: v1/embeddings returns the following error:
{"error":{"code":501,"message":"This server does not support embeddings. Start it with `--embeddings`","type":"not_supported_error"}}
When I run the same command with --embeddings added, the embedding request works correctly, but chat completion stops working (tested with Roo Code).
Am I missing something? Ideally, I want to use Roo Code with its indexing feature (embeddings), use graphiti-mcp as well, and still have agentic capabilities.