The purpose of this example is to demonstrate a minimal usage of llama.cpp for running models.

```bash
llama-run granite-code
...
```

The full set of options can be listed with `-h`:

```bash
llama-run -h
Description:
  Runs a llm

Usage:
  llama-run [options] model [prompt]

Options:
  -c, --context-size <value>
      Context size (default: 2048)
  -n, --ngl <value>
      Number of GPU layers (default: 0)
  -h, --help
      Show help message

Commands:
  model
      Model is a string with an optional prefix of
      huggingface:// (hf://), ollama://, https:// or file://.
      If no protocol is specified and a file exists in the specified
      path, file:// is assumed, otherwise if a file does not exist in
      the specified path, ollama:// is assumed. Models that are being
      pulled are downloaded with .partial extension while being
      downloaded and then renamed as the file without the .partial
      extension when complete.

Examples:
  llama-run llama3
  llama-run ollama://granite-code
  llama-run ollama://smollm:135m
  llama-run hf://QuantFactory/SmolLM-135M-GGUF/SmolLM-135M.Q2_K.gguf
  llama-run huggingface://bartowski/SmolLM-1.7B-Instruct-v0.2-GGUF/SmolLM-1.7B-Instruct-v0.2-IQ3_M.gguf
  llama-run https://example.com/some-file1.gguf
  llama-run some-file2.gguf
  llama-run file://some-file3.gguf
  llama-run --ngl 99 some-file4.gguf
  llama-run --ngl 99 some-file5.gguf Hello World
...
```
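The model-string resolution described under `Commands:` can be pictured roughly as follows. This is only a sketch of the rules quoted in the help text above, not llama-run's actual code, and `resolve_model` is a hypothetical helper used purely for illustration:

```bash
# Sketch only: mirrors the resolution rules quoted in the help text above.
resolve_model() {
  case "$1" in
    # An explicit prefix is used as-is.
    huggingface://*|hf://*|ollama://*|https://*|file://*) echo "$1" ;;
    # No prefix: an existing local path is treated as file://,
    # otherwise the name falls back to ollama://.
    *) if [ -e "$1" ]; then echo "file://$1"; else echo "ollama://$1"; fi ;;
  esac
}

resolve_model some-file2.gguf   # file://some-file2.gguf if it exists, otherwise ollama://some-file2.gguf
resolve_model granite-code      # ollama://granite-code (assuming no such local file)
```

Whichever source is resolved, remote models are written with a `.partial` extension while downloading and renamed once the download completes, as noted in the help text.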