# Mac

Georgi Gerganov edited this page May 14, 2025 · 7 revisions
## Prerequisites

- Homebrew
- Install llama.cpp:

  ```sh
  brew install llama.cpp
  ```

- Download the LLM model and run the llama.cpp server (combined in one command):

  - If you have more than 16GB VRAM:

    ```sh
    llama-server \
        --hf-repo ggml-org/Qwen2.5-Coder-7B-Q8_0-GGUF \
        --hf-file qwen2.5-coder-7b-q8_0.gguf \
        --port 8012 -ngl 99 -fa -ub 1024 -b 1024 -dt 0.1 --ctx-size 0 --cache-reuse 256
    ```

  - If you have less than 16GB VRAM:

    ```sh
    llama-server \
        --hf-repo ggml-org/Qwen2.5-Coder-1.5B-Q8_0-GGUF \
        --hf-file qwen2.5-coder-1.5b-q8_0.gguf \
        --port 8012 -ngl 99 -fa -ub 1024 -b 1024 -dt 0.1 --ctx-size 0 --cache-reuse 256
    ```
If the model file is not available locally (first run), it will be downloaded first (this can take some time); after that, the llama.cpp server starts.
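Once the server is running, you can talk to it over HTTP. As a rough sketch, here is how a client might build a request against the server's `/completion` endpoint on the port used above (`8012`); the payload fields shown (`prompt`, `n_predict`) are the common ones, but check the llama.cpp server documentation for the full set:

```python
import json
import urllib.request

SERVER = "http://127.0.0.1:8012"  # port passed to llama-server above

def build_completion_request(prompt, n_predict=64):
    """Build a POST request for the llama.cpp server's /completion endpoint."""
    payload = json.dumps({"prompt": prompt, "n_predict": n_predict}).encode()
    return urllib.request.Request(
        f"{SERVER}/completion",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

# Send only while the server from the previous step is running:
# with urllib.request.urlopen(build_completion_request("def fib(n):")) as r:
#     print(json.loads(r.read())["content"])
```

The actual network call is left commented out so the snippet can be read without a server running; uncomment it after `llama-server` has started.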