Windows

Set up llama.cpp servers on Windows

  1. Download the model file qwen2.5-coder-1.5b-q8_0.gguf from https://huggingface.co/ggml-org/Qwen2.5-Coder-1.5B-Q8_0-GGUF/blob/main/qwen2.5-coder-1.5b-q8_0.gguf (a command-line alternative is sketched after this list).
  2. Download the release files for Windows from https://github.com/ggerganov/llama.cpp/releases and extract them. For the CUDA case in 3.2, pick a CUDA-enabled build.
  3. Run the llama.cpp server (a quick way to verify it responds is shown below).
    3.1 Without a GPU
    Put the model qwen2.5-coder-1.5b-q8_0.gguf in the folder with the extracted files and start the llama.cpp server from a command window:

`llama-server.exe --fim-qwen-1.5b-default`

    3.2 With an NVIDIA GPU and a current CUDA installation
    Put the model qwen2.5-coder-1.5b-q8_0.gguf in the folder with the extracted files and start the llama.cpp server from a command window, offloading all model layers to the GPU:

`llama-server.exe --fim-qwen-1.5b-default -ngl 99`
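If you prefer to do the download from step 1 on the command line, the curl.exe that ships with Windows 10 and later works. The URL below is the direct-download (/resolve/) form of the /blob/ page linked above; `-L` follows the redirect to Hugging Face's CDN:

```
curl.exe -L -o qwen2.5-coder-1.5b-q8_0.gguf https://huggingface.co/ggml-org/Qwen2.5-Coder-1.5B-Q8_0-GGUF/resolve/main/qwen2.5-coder-1.5b-q8_0.gguf
```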

Now you can start using the llama-vscode extension.
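Before connecting the extension, you can confirm the server answers. This is a minimal sketch, assuming the FIM preset listens on port 8012 (the endpoint llama-vscode expects by default); if llama-server prints a different port at startup, use that one instead:

```
rem Health check: returns {"status":"ok"} once the model has finished loading
curl http://127.0.0.1:8012/health

rem Minimal completion request to confirm inference works
curl -X POST http://127.0.0.1:8012/completion -H "Content-Type: application/json" -d "{\"prompt\": \"// hello\", \"n_predict\": 16}"
```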

More details about the llama.cpp server
