--log-append Don't truncate the old log file.
```

Available environment variables (if specified, these variables will override parameters specified in arguments):

- `LLAMA_CACHE`: cache directory, used by `--hf-repo`
- `HF_TOKEN`: Hugging Face access token, used when accessing a gated model with `--hf-repo`
- `LLAMA_ARG_MODEL`: equivalent to `-m`
- `LLAMA_ARG_MODEL_URL`: equivalent to `-mu`
- `LLAMA_ARG_MODEL_ALIAS`: equivalent to `-a`
- `LLAMA_ARG_HF_REPO`: equivalent to `--hf-repo`
- `LLAMA_ARG_HF_FILE`: equivalent to `--hf-file`
- `LLAMA_ARG_THREADS`: equivalent to `-t`
- `LLAMA_ARG_CTX_SIZE`: equivalent to `-c`
- `LLAMA_ARG_N_PARALLEL`: equivalent to `-np`
- `LLAMA_ARG_BATCH`: equivalent to `-b`
- `LLAMA_ARG_UBATCH`: equivalent to `-ub`
- `LLAMA_ARG_N_GPU_LAYERS`: equivalent to `-ngl`
- `LLAMA_ARG_THREADS_HTTP`: equivalent to `--threads-http`
- `LLAMA_ARG_CHAT_TEMPLATE`: equivalent to `--chat-template`
- `LLAMA_ARG_N_PREDICT`: equivalent to `-n`
- `LLAMA_ARG_ENDPOINT_METRICS`: if set to `1`, enables the metrics endpoint (equivalent to `--metrics`)
- `LLAMA_ARG_ENDPOINT_SLOTS`: if set to `0`, **disables** the slots endpoint (equivalent to `--no-slots`); this feature is enabled by default
- `LLAMA_ARG_EMBEDDINGS`: if set to `1`, enables the embeddings endpoint (equivalent to `--embeddings`)
- `LLAMA_ARG_FLASH_ATTN`: if set to `1`, enables flash attention (equivalent to `-fa`)
- `LLAMA_ARG_CONT_BATCHING`: if set to `0`, **disables** continuous batching (equivalent to `--no-cont-batching`); this feature is enabled by default
- `LLAMA_ARG_DEFRAG_THOLD`: equivalent to `-dt`
- `LLAMA_ARG_HOST`: equivalent to `--host`
- `LLAMA_ARG_PORT`: equivalent to `--port`
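The override rule can be illustrated with a minimal shell sketch. This is only an illustration of the precedence described above (the server implements it internally in C++); `cli_ctx` stands in for a value passed via `-c`:

```shell
# If the environment variable is set, it takes precedence over the
# equivalent command-line flag.
cli_ctx=2048                          # e.g. from "-c 2048"
LLAMA_ARG_CTX_SIZE=4096               # environment override
ctx=${LLAMA_ARG_CTX_SIZE:-$cli_ctx}   # env var wins when set
echo "ctx_size=$ctx"                  # prints: ctx_size=4096
```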
Example usage of docker compose with environment variables:

```yml
services:
  llamacpp-server:
    image: ghcr.io/ggerganov/llama.cpp:server
    ports:
      - 8080:8080
    volumes:
      - ./models:/models
    environment:
      # alternatively, you can use "LLAMA_ARG_MODEL_URL" to download the model
      LLAMA_ARG_MODEL: /models/my_model.gguf
      LLAMA_ARG_CTX_SIZE: 4096
      LLAMA_ARG_N_PARALLEL: 2
      LLAMA_ARG_ENDPOINT_METRICS: 1  # to disable, either remove or set to 0
```
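A usage sketch, assuming the compose file above with its default `8080:8080` port mapping: start the stack, then probe the server's `/health` endpoint and the `/metrics` endpoint enabled by `LLAMA_ARG_ENDPOINT_METRICS`.

```shell
# start the service defined in the compose file above
docker compose up -d

# /health reports server readiness; /metrics is available because
# LLAMA_ARG_ENDPOINT_METRICS is set to 1 in the compose file
curl http://localhost:8080/health
curl http://localhost:8080/metrics
```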