Khoj is reverting to downloading a model even if I add my own llama-cpp hosted server #1253
Description
Server
- Cloud (https://app.khoj.dev)
- Self-Hosted Docker
- Self-Hosted Python package
- Self-Hosted source code
Clients
- Web browser
- Desktop/mobile app
- Obsidian
- Emacs
OS
- Windows
- macOS
- Linux
- Android
- iOS
Khoj version
latest
Describe the bug
Khoj tries to download a model from Hugging Face with this local configuration instead of using the local API.
Current Behavior
This is the Bielik configuration:
I used the admin API key because I had no clue what to put here, since it is a local self-hosted llama-cpp server.
I confirmed it works by doing:
curl -X POST http://localhost:8080/completion \
-H "Content-Type: application/json" \
-d '{
"prompt": "Napisz puuuuu",
"max_tokens": 32,
"temperature": 0.2
}'
{"index":0,"content":"uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu","tokens":[],"id_slot":3,"stop":true,"model":"Bielik-4.5B-v3.0-Instruct.Q8_0.gguf","tokens_predicted":32,"tokens_evaluated":8,"generation_settings":{"seed":4294967295,"temperature":0.20000000298023224,"dynatemp_range":0.0,"dynatemp_exponent":1.0,"top_k":40,"top_p":0.949999988079071,"min_p":0.05000000074505806,"top_n_sigma":-1.0,"xtc_probability":0.0,"xtc_threshold":0.10000000149011612,"typical_p":1.0,"repeat_last_n":64,"repeat_penalty":1.0,"presence_penalty":0.0,"frequency_penalty":0.0,"dry_multiplier":0.0,"dry_base":1.75,"dry_allowed_length":2,"dry_penalty_last_n":4096,"dry_sequence_breakers":["\n",":","\"","*"],"mirostat":0,"mirostat_tau":5.0,"mirostat_eta":0.10000000149011612,"stop":[],"max_tokens":32,"n_predict":32,"n_keep":0,"n_discard":0,"ignore_eos":false,"stream":false,"logit_bias":[],"n_probs":0,"min_keep":0,"grammar":"","grammar_lazy":false,"grammar_triggers":[],"preserved_tokens":[],"chat_format":"Content-only","reasoning_format":"deepseek","reasoning_in_content":false,"thinking_forced_open":false,"samplers":["penalties","dry","top_n_sigma","top_k","typ_p","top_p","min_p","xtc","temperature"],"speculative.n_max":16,"speculative.n_min":0,"speculative.p_min":0.75,"timings_per_token":false,"post_sampling_probs":false,"lora":[]},"prompt":"<s>Napisz puuuuu","has_new_line":false,"truncated":false,"stop_type":"limit","stopping_word":"","tokens_cached":39,"timings":{"cache_n":1,"prompt_n":7,"prompt_ms":275.595,"prompt_per_token_ms":39.37071428571429,"prompt_per_second":25.399589978047494,"predicted_n":32,"predicted_ms":5506.741,"predicted_per_token_ms":172.08565625,"predicted_per_second":5.8110595722588005}}%
Yet Khoj tries to download the online model from HF, as evident from the logs:
[server] ValueError: No file found in speakleash/Bielik-4.5B-v3.0-Instruct-GGUF that match *Q4_K_M.gguf
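The glob in that error suggests Khoj is filtering the repo's files with a *Q4_K_M.gguf pattern, which no file in this repo satisfies. A hedged illustration of why the match comes up empty (the glob-matching shown here is my assumption about the underlying behavior, not Khoj's actual code):

```python
from fnmatch import fnmatch

# The quantization actually served locally (per the curl output above)
# uses the Q8_0 suffix; it cannot match a *Q4_K_M.gguf glob.
filename = "Bielik-4.5B-v3.0-Instruct.Q8_0.gguf"
print(fnmatch(filename, "*Q4_K_M.gguf"))  # False
print(fnmatch(filename, "*.gguf"))        # True
```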
If I save the same local model under a different name, without the slash (/), it also fails, with a different error:
[server] ValueError: not enough values to unpack (expected 2, got 1)
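That unpack error is consistent with the model name being split on "/" into a (repo, filename) pair; a name without a slash yields only one value. A hypothetical sketch of the failure mode (split_model_id is my illustrative name, not Khoj's actual function):

```python
def split_model_id(model_id: str):
    # A HuggingFace-style id is assumed to be "<repo>/<filename>";
    # unpacking the split fails when the name contains no "/".
    repo, filename = model_id.split("/", 1)
    return repo, filename

print(split_model_id("speakleash/Bielik-4.5B-v3.0-Instruct.Q8_0.gguf"))
# ('speakleash', 'Bielik-4.5B-v3.0-Instruct.Q8_0.gguf')

try:
    split_model_id("Bielik-4.5B-v3.0-Instruct.Q8_0.gguf")
except ValueError as e:
    print(e)  # not enough values to unpack (expected 2, got 1)
```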
I tried again with this name, since it is a perfect match: speakleash/Bielik-4.5B-v3.0-Instruct.Q8_0.gguf
[server] FileNotFoundError: speakleash/Bielik-4.5B-v3.0-Instruct.Q8_0.gguf (repository not found)
Expected Behavior
Khoj should use http://localhost:8080 as the API endpoint instead of downloading the model from Hugging Face.
I followed the manual very closely:
1. Install any OpenAI API compatible local AI model server, like llama-cpp-server, Ollama, vLLM, etc.
2. Add an AI model API on the admin panel.
3. Set the API URL field to the URL of your local AI model provider, e.g. http://localhost:11434/v1/ for Ollama.
4. Restart the Khoj server to load the models available on your local AI model provider. If that doesn't work, you'll need to manually add the available chat models in the admin panel.
5. Set the newly added chat model as your preferred model in your user chat settings.
6. Start chatting with your local AI!
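For comparison, this is roughly the admin-panel entry I would expect to work, assuming the field names match Khoj's AI model API form (the API key value is an arbitrary placeholder, since llama-cpp-server does not validate it; the /v1 suffix mirrors the Ollama example in the manual):

```
Name:    llama-cpp (local)
Api Key: placeholder                 # not checked by llama-cpp-server
Api Url: http://localhost:8080/v1
```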
Reproduction Steps
Install on Arch using podman compose up. Run a separate llama-cpp-server instance (installed from the AUR) with the model https://huggingface.co/speakleash/Bielik-4.5B-v3.0-Instruct-GGUF: llama-cpp-server -m local_path_model.gguf. Copy the open port and add the model to the Khoj configuration.
Possible Workaround
No response
Additional Information
No response
Link to Discord or Github discussion
No response