# Use Ollama with any GGUF Model on Hugging Face Hub

Ollama is an application, built on top of llama.cpp, for interacting with LLMs directly on your computer. You can use any GGUF quants created by the community ([bartowski](https://huggingface.co/bartowski), [MaziyarPanahi](https://huggingface.co/MaziyarPanahi) and many more) on Hugging Face directly with Ollama, without creating a new `Modelfile`. At the time of writing there are 45K public GGUF checkpoints on the Hub, and you can run any of them with a single `ollama run` command. We also provide customizations such as choosing the quantization type, setting a system prompt, and more to improve your overall experience.
Getting started is as simple as:

```sh
ollama run hf.co/{username}/{repository}
```

Please note that you can use both `hf.co` and `huggingface.co` as the domain name.

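For example, both of the following commands should resolve to the same repository (shown here with one of the models from the list below):

```sh
# hf.co and huggingface.co are interchangeable
ollama run hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF
ollama run huggingface.co/bartowski/Llama-3.2-1B-Instruct-GGUF
```
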
Here are some other models that you can try:

```sh
ollama run hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF
ollama run hf.co/mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated-GGUF
ollama run hf.co/arcee-ai/SuperNova-Medius-GGUF
ollama run hf.co/bartowski/Humanish-LLama3-8B-Instruct-GGUF
```

## Custom Quantization

By default, the `Q4_K_M` quantization scheme is used. To select a different scheme, simply add a tag:

```sh
ollama run hf.co/{username}/{repository}:{quantization}
```

For example:

```sh
ollama run hf.co/bartowski/Llama-3.2-3B-Instruct-GGUF:IQ3_M
ollama run hf.co/bartowski/Llama-3.2-3B-Instruct-GGUF:Q8_0

# the quantization name is case-insensitive; this will also work
ollama run hf.co/bartowski/Llama-3.2-3B-Instruct-GGUF:iq3_m

# you can also select a specific file
ollama run hf.co/bartowski/Llama-3.2-3B-Instruct-GGUF:Llama-3.2-3B-Instruct-IQ3_M.gguf
```

## Custom Chat Template and Parameters

By default, a template will be selected automatically from a list of commonly used templates. It is selected based on the built-in `tokenizer.chat_template` metadata stored inside the GGUF file.

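If you want to double-check which template was picked, you can inspect what Ollama generated for the model after pulling it. A quick sanity check; the flags shown assume a reasonably recent Ollama CLI:

```sh
# pull the model, then print the template Ollama derived from the GGUF metadata
ollama pull hf.co/bartowski/Llama-3.2-3B-Instruct-GGUF
ollama show hf.co/bartowski/Llama-3.2-3B-Instruct-GGUF --template

# or print the whole generated Modelfile
ollama show hf.co/bartowski/Llama-3.2-3B-Instruct-GGUF --modelfile
```
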
If your GGUF file doesn't have a built-in template or uses a custom chat template, you can create a new file called `template` in the repository. The template must be a Go template, not a Jinja template. Here's an example:

```
{{ if .System }}<|system|>
{{ .System }}<|end|>
{{ end }}{{ if .Prompt }}<|user|>
{{ .Prompt }}<|end|>
{{ end }}<|assistant|>
{{ .Response }}<|end|>
```

To learn more about the Go template format, please refer to [this documentation](https://github.com/ollama/ollama/blob/main/docs/template.md).

You can optionally configure a system prompt by putting it into a new file named `system` in the repository.

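The `system` file is plain text. For example (an illustrative prompt, not taken from any existing repository):

```
You are a helpful, concise assistant. Answer in at most three sentences.
```
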
To change sampling parameters, create a file named `params` in the repository. The file must be in JSON format. For the list of all available parameters, please refer to [this documentation](https://github.com/ollama/ollama/blob/main/docs/modelfile.md#parameter).
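For instance, a `params` file might look like the sketch below; the values are illustrative, and the keys correspond to standard Modelfile parameters such as `temperature`, `num_ctx` and `stop`:

```json
{
  "temperature": 0.7,
  "num_ctx": 4096,
  "stop": ["<|end|>"]
}
```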

## References

- https://github.com/ollama/ollama/blob/main/docs/README.md
- https://huggingface.co/docs/hub/en/gguf