Windows

Set up llama.cpp servers on Windows

  1. Download the model file qwen2.5-coder-1.5b-q8_0.gguf from https://huggingface.co/ggml-org/Qwen2.5-Coder-1.5B-Q8_0-GGUF/blob/main/qwen2.5-coder-1.5b-q8_0.gguf (a command-line alternative is sketched after this list).
  2. Download the release files for Windows from https://github.com/ggerganov/llama.cpp/releases and extract them. For the CUDA case in 3.2, pick a CUDA-enabled build.
  3. Run the llama.cpp server (a quick way to verify it responds is shown below).
    3.1 Without a GPU
    Put the model qwen2.5-coder-1.5b-q8_0.gguf in the folder with the extracted files and start the llama.cpp server from a command window:

`llama-server.exe --fim-qwen-1.5b-default`

    3.2 With an NVIDIA GPU and a current CUDA installation
    Put the model qwen2.5-coder-1.5b-q8_0.gguf in the folder with the extracted files and start the llama.cpp server from a command window, offloading all model layers to the GPU:

`llama-server.exe --fim-qwen-1.5b-default -ngl 99`
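If you prefer to do the download from step 1 on the command line, the curl.exe that ships with Windows 10 and later works. The URL below is the direct-download (/resolve/) form of the /blob/ page linked above; `-L` follows the redirect to Hugging Face's CDN:

```
curl.exe -L -o qwen2.5-coder-1.5b-q8_0.gguf https://huggingface.co/ggml-org/Qwen2.5-Coder-1.5B-Q8_0-GGUF/resolve/main/qwen2.5-coder-1.5b-q8_0.gguf
```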

Now you can start using the llama-vscode extension.
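Before connecting the extension, you can confirm the server answers. This is a minimal sketch, assuming the FIM preset listens on port 8012 (the endpoint llama-vscode expects by default); if llama-server prints a different port at startup, use that one instead:

```
rem Health check: returns {"status":"ok"} once the model has finished loading
curl http://127.0.0.1:8012/health

rem Minimal completion request to confirm inference works
curl -X POST http://127.0.0.1:8012/completion -H "Content-Type: application/json" -d "{\"prompt\": \"// hello\", \"n_predict\": 16}"
```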

More details about the llama.cpp server
