Georgi Gerganov edited this page May 14, 2025 · 7 revisions

Setup llama.cpp servers for Mac

Prerequisites - Homebrew
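If Homebrew is not installed yet, it can be set up first with the official installer one-liner from brew.sh (reproduced here as a convenience; check brew.sh for the current command):

```shell
# Official Homebrew installer (from https://brew.sh)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```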

  1. Install llama.cpp with the command
    brew install llama.cpp
  2. Download the LLM model and run llama.cpp server (combined in one command)
  • If you have more than 16GB VRAM:
llama-server \
  --hf-repo ggml-org/Qwen2.5-Coder-7B-Q8_0-GGUF \
  --hf-file qwen2.5-coder-7b-q8_0.gguf \
  --port 8012 -ngl 99 -fa -ub 1024 -b 1024 -dt 0.1 --ctx-size 0 --cache-reuse 256
  • If you have less than 16GB VRAM:
llama-server \
  --hf-repo ggml-org/Qwen2.5-Coder-1.5B-Q8_0-GGUF \
  --hf-file qwen2.5-coder-1.5b-q8_0.gguf \
  --port 8012 -ngl 99 -fa -ub 1024 -b 1024 -dt 0.1 --ctx-size 0 --cache-reuse 256
    If the model file is not present locally (on the first run), it will be downloaded first, which can take some time; after that, the llama.cpp server starts.
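Once the server is running, it can be sanity-checked from another terminal. The sketch below assumes the default settings from the commands above (port 8012); the prompt is just an example. llama-server exposes a /health endpoint and an OpenAI-compatible chat completions API:

```shell
# Check that the server is up and the model is loaded
curl -s http://127.0.0.1:8012/health

# Send a chat completion request via the OpenAI-compatible endpoint
curl -s http://127.0.0.1:8012/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Write a hello world in Python"}],
    "max_tokens": 128
  }'
```

If the health check does not respond, make sure the download from step 2 has finished and that nothing else is bound to port 8012.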
