Windows
1. Download the file qwen2.5-coder-1.5b-q8_0.gguf from https://huggingface.co/ggml-org/Qwen2.5-Coder-1.5B-Q8_0-GGUF/blob/main/qwen2.5-coder-1.5b-q8_0.gguf
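   If you prefer the command line, the same file can be fetched with curl, which ships with Windows 10 and later. Note that the direct-download link uses `/resolve/` in place of the `/blob/` segment of the page URL above:

   ```
   curl.exe -L -o qwen2.5-coder-1.5b-q8_0.gguf https://huggingface.co/ggml-org/Qwen2.5-Coder-1.5B-Q8_0-GGUF/resolve/main/qwen2.5-coder-1.5b-q8_0.gguf
   ```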
2. Download the release files for Windows from https://github.com/ggerganov/llama.cpp/releases and extract them.
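   For example, the archive can be extracted with PowerShell. The archive name below is a placeholder; use the actual name of the file you downloaded, which changes with each release:

   ```
   powershell -Command "Expand-Archive -Path llama-bXXXX-bin-win-x64.zip -DestinationPath llama.cpp"
   ```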
3. Run the llama.cpp server.

   3.1 No GPUs

   Put the model qwen2.5-coder-1.5b-q8_0.gguf in the folder with the extracted files and start the llama.cpp server from a command window:

   `llama-server.exe --fim-qwen-1.5b-default`

   3.2 With Nvidia GPUs and the latest CUDA installed
   Put the model qwen2.5-coder-1.5b-q8_0.gguf in the folder with the extracted files and start the llama.cpp server from a command window:

   `llama-server.exe --fim-qwen-1.5b-default -ngl 99`

   Here `-ngl 99` offloads all of the model's layers to the GPU.

Now you can start using the llama-vscode extension.
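To verify the server is up before relying on the extension, you can query its health endpoint. This is a quick check assuming the `--fim-qwen-1.5b-default` preset serves on port 8012, the default endpoint the llama-vscode extension expects; adjust the port if your setup differs:

```
curl.exe http://127.0.0.1:8012/health
```

Once the model has finished loading, this should return a small JSON status object such as `{"status":"ok"}`.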