Georgi Gerganov edited this page May 14, 2025 · 7 revisions

Setup llama.cpp servers for Mac

Prerequisites - Homebrew
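If Homebrew is not installed yet, it can be set up first with the official installer one-liner from brew.sh (reproduced here as a convenience; check brew.sh for the current command):

```shell
# Official Homebrew installer (from https://brew.sh)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```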

  1. Install llama.cpp with the command
    brew install llama.cpp
  2. Download the LLM model and run llama.cpp server (combined in one command)
  • If you have more than 16GB VRAM:
llama-server \
  --hf-repo ggml-org/Qwen2.5-Coder-7B-Q8_0-GGUF \
  --hf-file qwen2.5-coder-7b-q8_0.gguf \
  --port 8012 -ngl 99 -fa -ub 1024 -b 1024 -dt 0.1 --ctx-size 0 --cache-reuse 256
  • If you have less than 16GB VRAM:
llama-server \
  --hf-repo ggml-org/Qwen2.5-Coder-1.5B-Q8_0-GGUF \
  --hf-file qwen2.5-coder-1.5b-q8_0.gguf \
  --port 8012 -ngl 99 -fa -ub 1024 -b 1024 -dt 0.1 --ctx-size 0 --cache-reuse 256
    If the model file is not present locally (on the first run), it will be downloaded first, which can take some time; after that, the llama.cpp server starts.
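Once the server is running, it can be sanity-checked from another terminal. The sketch below assumes the default settings from the commands above (port 8012); the prompt is just an example. llama-server exposes a /health endpoint and an OpenAI-compatible chat completions API:

```shell
# Check that the server is up and the model is loaded
curl -s http://127.0.0.1:8012/health

# Send a chat completion request via the OpenAI-compatible endpoint
curl -s http://127.0.0.1:8012/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Write a hello world in Python"}],
    "max_tokens": 128
  }'
```

If the health check does not respond, make sure the download from step 2 has finished and that nothing else is bound to port 8012.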
