Windows
Used for
- code completion
LLM type
- FIM (fill in the middle)
Instructions
`winget install llama.cpp`
OR
Download the llama.cpp release files for Windows from the releases page. For CPU, use llama-&lt;version&gt;-bin-win-cpu-&lt;arch&gt;.zip. For Nvidia GPUs: llama-&lt;version&gt;-bin-win-cuda-x64.zip and, if you don't have CUDA drivers installed, also cudart-llama-bin-win-cuda*-x64.zip.
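If you prefer scripting the download, here is a minimal sketch using curl and tar (both ship with recent Windows). The release tag and asset name below are placeholders; substitute the actual file names from the releases page:

```
curl -L -o llama.zip https://github.com/ggml-org/llama.cpp/releases/download/<tag>/llama-<version>-bin-win-cpu-x64.zip
:: tar on Windows 10+ can extract zip archives
tar -xf llama.zip
```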
No GPUs:
`llama-server.exe --fim-qwen-1.5b-default --port 8012`
With Nvidia GPUs and the latest CUDA installed:
`llama-server.exe --fim-qwen-1.5b-default --port 8012 -ngl 99`
If you've installed llama.cpp with winget, you can skip the .exe suffix and use just `llama-server` in the commands.
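To confirm the server is up before connecting the extension, you can query llama-server's built-in health endpoint:

```
curl http://127.0.0.1:8012/health
```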
Now you can start using the llama-vscode extension for code completion.
More details about the llama.cpp server are available in the llama.cpp documentation.
Used for
- Chat with AI
- Chat with AI with project context
- Edit with AI
- Generate commit message
LLM type
- Chat Models
Instructions
Same as for the code completion server, but use a chat model and slightly different parameters.
CPU-only:
`llama-server.exe -hf qwen2.5-coder-1.5b-instruct-q8_0.gguf --port 8011`
With Nvidia GPUs and CUDA drivers installed:
- more than 16GB VRAM:
`llama-server.exe -hf qwen2.5-coder-7b-instruct-q8_0.gguf --port 8011 -np 2 -ngl 99`
- less than 16GB VRAM:
`llama-server.exe -hf qwen2.5-coder-3b-instruct-q8_0.gguf --port 8011 -np 2 -ngl 99`
- less than 8GB VRAM:
`llama-server.exe -hf qwen2.5-coder-1.5b-instruct-q8_0.gguf --port 8011 -np 2 -ngl 99`
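Once the chat server is running, a quick sanity check against its OpenAI-compatible chat endpoint (adjust the port if you changed it):

```
curl http://127.0.0.1:8011/v1/chat/completions -H "Content-Type: application/json" -d "{\"messages\": [{\"role\": \"user\", \"content\": \"Hello\"}]}"
```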
Used for
- Chat with AI with project context
LLM type
- Embedding
Instructions
Same as for the code completion server, but use an embeddings model and slightly different parameters.
`llama-server.exe -hf nomic-embed-text-v2-moe-q8_0.gguf --port 8010 -ub 2048 -b 2048 --ctx-size 2048 --embeddings`
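To verify the embeddings server responds, you can call the OpenAI-compatible embeddings route, which llama-server exposes when started with `--embeddings` (a minimal check; adjust the port if you changed it):

```
curl http://127.0.0.1:8010/v1/embeddings -H "Content-Type: application/json" -d "{\"input\": \"hello world\"}"
```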