Commit af409d6

readme : update llama-server command with presets
This commit updates the llama-server commands to use the new presets available in the latest version of llama.cpp.

Refs: ggml-org/llama.cpp#11945
Parent: 72c2ee2
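One quick way to check whether a local `llama-server` build ships these presets is to grep its help text; this assumes a llama.cpp build recent enough to include the referenced PR:

```bash
# List the FIM presets exposed by the local llama-server binary;
# prints nothing if the build predates the presets.
llama-server --help | grep -- '--fim-qwen'
```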

README.md

Lines changed: 3 additions & 12 deletions
@@ -100,28 +100,19 @@ Here are recommended settings, depending on the amount of VRAM that you have:
 - More than 16GB VRAM:
 
   ```bash
-  llama-server \
-      -hf ggml-org/Qwen2.5-Coder-7B-Q8_0-GGUF \
-      --port 8012 -ngl 99 -fa -ub 1024 -b 1024 \
-      --ctx-size 0 --cache-reuse 256
+  llama-server --fim-qwen-7b-default
   ```
 
 - Less than 16GB VRAM:
 
   ```bash
-  llama-server \
-      -hf ggml-org/Qwen2.5-Coder-3B-Q8_0-GGUF \
-      --port 8012 -ngl 99 -fa -ub 1024 -b 1024 \
-      --ctx-size 0 --cache-reuse 256
+  llama-server --fim-qwen-3b-default
   ```
 
 - Less than 8GB VRAM:
 
   ```bash
-  llama-server \
-      -hf ggml-org/Qwen2.5-Coder-1.5B-Q8_0-GGUF \
-      --port 8012 -ngl 99 -fa -ub 1024 -b 1024 \
-      --ctx-size 0 --cache-reuse 256
+  llama-server --fim-qwen-1.5b-default
   ```
 
 Use `:help llama` for more details.
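For reference, each `--fim-qwen-*-default` preset stands in for roughly the explicit invocation it replaces. Below is a sketch of the 7B case, reconstructed from the flags removed in this diff; the authoritative expansion is defined in llama.cpp's argument parser, not here:

```bash
# Approximate expansion of `llama-server --fim-qwen-7b-default`,
# based on the flags this commit removes from the README:
#   -hf             pull the model from Hugging Face
#   -ngl 99         offload all layers to the GPU
#   -fa             enable flash attention
#   -ub / -b 1024   micro-batch / batch size
#   --ctx-size 0    use the model's full training context
#   --cache-reuse   reuse matching KV cache chunks of >= 256 tokens
llama-server \
    -hf ggml-org/Qwen2.5-Coder-7B-Q8_0-GGUF \
    --port 8012 -ngl 99 -fa -ub 1024 -b 1024 \
    --ctx-size 0 --cache-reuse 256
```

The 3B and 1.5B presets presumably differ only in the model passed to `-hf`, mirroring the other two commands removed above.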
