 The purpose of this example is to demonstrate a minimal usage of llama.cpp for running models.
 
 ```bash
-./llama-run Meta-Llama-3.1-8B-Instruct.gguf
+llama-run granite-code
 ...
+
+```bash
+llama-run -h
+Description:
+  Runs a llm
+
+Usage:
+  llama-run [options] model [prompt]
+
+Options:
+  -c, --context-size <value>
+      Context size (default: 2048)
+  -n, --ngl <value>
+      Number of GPU layers (default: 0)
+  -h, --help
+      Show help message
+
+Commands:
+  model
+      Model is a string with an optional prefix of
+      huggingface:// (hf://), ollama://, https:// or file://.
+      If no protocol is specified and a file exists in the specified
+      path, file:// is assumed, otherwise if a file does not exist in
+      the specified path, ollama:// is assumed. Models that are being
+      pulled are downloaded with .partial extension while being
+      downloaded and then renamed as the file without the .partial
+      extension when complete.
+
+Examples:
+  llama-run llama3
+  llama-run ollama://granite-code
+  llama-run ollama://smollm:135m
+  llama-run hf://QuantFactory/SmolLM-135M-GGUF/SmolLM-135M.Q2_K.gguf
+  llama-run huggingface://bartowski/SmolLM-1.7B-Instruct-v0.2-GGUF/SmolLM-1.7B-Instruct-v0.2-IQ3_M.gguf
+  llama-run https://example.com/some-file1.gguf
+  llama-run some-file2.gguf
+  llama-run file://some-file3.gguf
+  llama-run --ngl 99 some-file4.gguf
+  llama-run --ngl 99 some-file5.gguf Hello World
+...
+
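For context, here is a minimal sketch of what the resolution rules quoted in the help text above mean in practice. The flags and the granite-code name come from the help output and examples in the diff; the local GGUF path and the prompt are hypothetical.

```bash
# No protocol prefix and no such file on disk, so this resolves to
# ollama://granite-code; the pull is written to a .partial file and
# renamed once the download completes.
llama-run granite-code

# An existing path is treated as file:// (hypothetical local GGUF here);
# -c sets the context size, --ngl offloads 99 layers to the GPU, and the
# trailing argument is passed as the prompt.
llama-run -c 4096 --ngl 99 ./granite-code.Q4_K_M.gguf "Write a hello world program in C"
```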