
Khoj is reverting to downloading a model even if I add my own llama-cpp hosted server #1253

@DeutscheGabanna

Description


Server

  • Cloud (https://app.khoj.dev)
  • Self-Hosted Docker
  • Self-Hosted Python package
  • Self-Hosted source code

Clients

  • Web browser
  • Desktop/mobile app
  • Obsidian
  • Emacs
  • WhatsApp

OS

  • Windows
  • macOS
  • Linux
  • Android
  • iOS

Khoj version

latest

Describe the bug

Khoj tries to download a model from Hugging Face with this local configuration, instead of using the local API.

Current Behavior

(screenshot attached)

This is the Bielik configuration:

(screenshot attached)

I used the admin API key because I had no clue what else to put in the API key field, since it is a local, self-hosted llama-cpp-server.

I confirmed it works by doing:

curl -X POST http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Napisz puuuuu",
    "max_tokens": 32,
    "temperature": 0.2
  }'

{"index":0,"content":"uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu","tokens":[],"id_slot":3,"stop":true,"model":"Bielik-4.5B-v3.0-Instruct.Q8_0.gguf","tokens_predicted":32,"tokens_evaluated":8,"generation_settings":{"seed":4294967295,"temperature":0.20000000298023224,"dynatemp_range":0.0,"dynatemp_exponent":1.0,"top_k":40,"top_p":0.949999988079071,"min_p":0.05000000074505806,"top_n_sigma":-1.0,"xtc_probability":0.0,"xtc_threshold":0.10000000149011612,"typical_p":1.0,"repeat_last_n":64,"repeat_penalty":1.0,"presence_penalty":0.0,"frequency_penalty":0.0,"dry_multiplier":0.0,"dry_base":1.75,"dry_allowed_length":2,"dry_penalty_last_n":4096,"dry_sequence_breakers":["\n",":","\"","*"],"mirostat":0,"mirostat_tau":5.0,"mirostat_eta":0.10000000149011612,"stop":[],"max_tokens":32,"n_predict":32,"n_keep":0,"n_discard":0,"ignore_eos":false,"stream":false,"logit_bias":[],"n_probs":0,"min_keep":0,"grammar":"","grammar_lazy":false,"grammar_triggers":[],"preserved_tokens":[],"chat_format":"Content-only","reasoning_format":"deepseek","reasoning_in_content":false,"thinking_forced_open":false,"samplers":["penalties","dry","top_n_sigma","top_k","typ_p","top_p","min_p","xtc","temperature"],"speculative.n_max":16,"speculative.n_min":0,"speculative.p_min":0.75,"timings_per_token":false,"post_sampling_probs":false,"lora":[]},"prompt":"<s>Napisz puuuuu","has_new_line":false,"truncated":false,"stop_type":"limit","stopping_word":"","tokens_cached":39,"timings":{"cache_n":1,"prompt_n":7,"prompt_ms":275.595,"prompt_per_token_ms":39.37071428571429,"prompt_per_second":25.399589978047494,"predicted_n":32,"predicted_ms":5506.741,"predicted_per_token_ms":172.08565625,"predicted_per_second":5.8110595722588005}}
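Note that the `model` field in that response (`Bielik-4.5B-v3.0-Instruct.Q8_0.gguf`) is the name the server itself reports for the loaded GGUF, which is presumably the name an OpenAI-compatible client should be configured with. A quick check of extracting it (response trimmed to the relevant fields):

```python
import json

# Trimmed copy of the llama-cpp-server /completion response above;
# only the fields relevant here are kept.
response = json.loads(
    '{"index": 0, "content": "uuuu", "stop": true,'
    ' "model": "Bielik-4.5B-v3.0-Instruct.Q8_0.gguf"}'
)
print(response["model"])  # the name the server reports for the loaded model
```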

Yet Khoj tries to download the model from Hugging Face, as is evident from the logs:

ValueError: No file found in speakleash/Bielik-4.5B-v3.0-Instruct-GGUF that match *Q4_K_M.gguf

If I save the same local model under a different name, without the slash (/), it also doesn't work, with a different error:

ValueError: not enough values to unpack (expected 2, got 1)
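For what it's worth, that second traceback looks like ordinary tuple unpacking applied to a model name expected to be in Hugging Face `org/repo` form. A minimal reproduction (an assumption about Khoj's internals, not its actual code):

```python
# Sketch (hypothetical): a loader that assumes chat model names look like
# "org/repo" fails exactly this way when given a bare GGUF filename.
name = "Bielik-4.5B-v3.0-Instruct.Q8_0.gguf"  # no slash, so split yields 1 part
try:
    org, repo = name.split("/")  # expects exactly two parts
except ValueError as err:
    print(err)  # not enough values to unpack (expected 2, got 1)
```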

I tried again with this name, since it is a perfect match: speakleash/Bielik-4.5B-v3.0-Instruct.Q8_0.gguf

FileNotFoundError: speakleash/Bielik-4.5B-v3.0-Instruct.Q8_0.gguf (repository not found)

Expected Behavior

Khoj should use http://localhost:8080 as the API endpoint instead of downloading the model from Hugging Face.

I followed the manual very closely:

1. Install any OpenAI API compatible local AI model server like llama-cpp-server, Ollama, vLLM etc.
2. Add an AI model API on the admin panel.
3. Set the API URL field to the URL of your local AI model provider, like http://localhost:11434/v1/ for Ollama.
4. Restart the Khoj server to load models available on your local AI model provider.
5. If that doesn't work, you'll need to manually add the available chat model in the admin panel.
6. Set the newly added chat model as your preferred model in your user chat settings.
7. Start chatting with your local AI!
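For reference, a client configured this way would talk to the server through the OpenAI-compatible `/v1/chat/completions` route, not llama.cpp's native `/completion` endpoint I tested above. A sketch of the request shape (URL and model name taken from this setup; the field names are the standard OpenAI chat format, not anything Khoj-specific):

```python
import json

# Assumed local endpoint: llama-cpp-server's OpenAI-compatible route.
url = "http://localhost:8080/v1/chat/completions"

# Standard OpenAI-style chat payload; the "model" value should match the
# name the server reports for the loaded GGUF (see the /completion output).
payload = {
    "model": "Bielik-4.5B-v3.0-Instruct.Q8_0.gguf",
    "messages": [{"role": "user", "content": "Napisz puuuuu"}],
    "max_tokens": 32,
    "temperature": 0.2,
}
body = json.dumps(payload)
print(body)
```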

Reproduction Steps

Install on Arch using podman compose up. Run a separate llama-cpp-server instance (installed through the AUR) with the model https://huggingface.co/speakleash/Bielik-4.5B-v3.0-Instruct-GGUF: llama-cpp-server -m local_path_model.gguf. Copy the open port and add the model to the Khoj configuration.

Possible Workaround

No response

Additional Information

No response

Link to Discord or Github discussion

No response

Labels

fix