A method for code indexing using the local model qwen3-embedding-8b #1411
planb788 started this conversation in 4. Show and tell
Because Kilo Code does not officially support the Ollama version of qwen3-embedding-8b, I used AI to write code that serves a GGUF model behind a simulated OpenAI-compatible API. The attached compressed package contains the code file and its dependency files. Download the package to your local machine and extract it, then edit `openai_embedding_api.py`: set `GGUF_MODEL_PATH` to the absolute path of your local `Qwen3-Embedding-8B-Q8_0.gguf` file (you need to download the model yourself from the official Qwen model repository), change `"your-secret-key"` to your own key, and save. Next, open a terminal in the code directory and install the required dependencies:
```bash
pip install -r requirements.txt
```

Then start the service:

```bash
python openai_embedding_api.py
```
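For reference, the simulated API only needs to implement the `/v1/embeddings` route that Kilo Code calls. Below is a minimal sketch of what such a server can look like, assuming FastAPI, uvicorn, and llama-cpp-python; the actual code in the archive may differ in details such as pooling configuration and batching.

```python
# Minimal sketch of an OpenAI-compatible embedding server (hypothetical;
# the archive's actual implementation may differ).
# Assumed dependencies: pip install fastapi uvicorn llama-cpp-python
from fastapi import Depends, FastAPI, Header, HTTPException
from llama_cpp import Llama
from pydantic import BaseModel

GGUF_MODEL_PATH = "/absolute/path/to/Qwen3-Embedding-8B-Q8_0.gguf"  # edit me
API_KEY = "your-secret-key"  # edit me

# embedding=True tells llama.cpp to compute embeddings; depending on the
# GGUF metadata you may also need to set pooling_type explicitly.
llm = Llama(model_path=GGUF_MODEL_PATH, embedding=True, verbose=False)

app = FastAPI()

class EmbeddingRequest(BaseModel):
    model: str
    input: str | list[str]

def check_key(authorization: str = Header(default="")) -> None:
    # OpenAI-style clients send "Authorization: Bearer <key>".
    if authorization != f"Bearer {API_KEY}":
        raise HTTPException(status_code=401, detail="invalid api key")

@app.post("/v1/embeddings")
def embeddings(req: EmbeddingRequest, _: None = Depends(check_key)) -> dict:
    texts = [req.input] if isinstance(req.input, str) else req.input
    data = [
        # llm.embed returns one vector per input text (4096 floats here).
        {"object": "embedding", "index": i, "embedding": llm.embed(t)}
        for i, t in enumerate(texts)
    ]
    return {"object": "list", "data": data, "model": req.model,
            "usage": {"prompt_tokens": 0, "total_tokens": 0}}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="127.0.0.1", port=8000)
```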
You can then use the embedding model for code indexing. In Kilo Code, click the cylindrical icon (shaped like three stacked disks) at the bottom, set the base_url to http://127.0.0.1:8000/v1, set the model dimension to 4096, enter qwen3-embedding-8b as the model name, and fill in the key you just set. Click save, then click start indexing. I am not responsible for guiding the usage of Qdrant.

Attachment: qwensimulateopenai.zip
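Before starting indexing, you can sanity-check the running service with the official openai Python client (`pip install openai`); the base_url, key, and model name below are the values configured above:

```python
from openai import OpenAI

# Point the client at the local service instead of api.openai.com.
client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="your-secret-key")
resp = client.embeddings.create(model="qwen3-embedding-8b", input="def hello(): pass")
print(len(resp.data[0].embedding))  # expect 4096 for Qwen3-Embedding-8B
```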