# Using Local Models in torchchat
Torchchat provides powerful capabilities for running large language models (LLMs) locally. This guide focuses on utilizing local copies of
model checkpoints or models in GGUF format to create a chat application. It also highlights relevant options for advanced users.

## Prerequisites
To work with local models, you need:
1. **Model Weights**: A checkpoint file (e.g., `.pth`, `.pt`) or a GGUF file (e.g., `.gguf`).
2. **Tokenizer**: A tokenizer model file. This can be in either SentencePiece or TikToken format, depending on the tokenizer used with the model.
3. **Parameter File**: (a) A custom parameter file in JSON format, or (b) a pre-existing parameter file specified with `--params-path`
   or `--params-table`, or (c) a pathname that's matched against known models by longest substring in configuration name, using the same algorithm as GPT-fast.

Ensure the tokenizer and parameter files are in the same directory as the checkpoint or GGUF file for automatic detection.
Let's use a local download of the stories15M tinyllama model as an example:

```
mkdir stories15M
cd stories15M
wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories15M.pt
wget https://github.com/karpathy/llama2.c/raw/refs/heads/master/tokenizer.model
cp ../torchchat/model_params/stories15M.json model.json
cd ..
```

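If the downloads succeeded, the checkpoint, tokenizer, and parameter file now sit side by side, which is what enables the automatic detection mentioned above. A quick sanity check (the comment shows the expected listing):

```
ls stories15M
# expected: model.json  stories15M.pt  tokenizer.model
```
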
## Using Local Checkpoints
Torchchat provides the CLI flag `--checkpoint-path` for specifying local model weights. The tokenizer is
loaded from the same directory as the checkpoint with the name `tokenizer.model` unless separately specified.
This example obtains the model parameters by name matching against known models, because `stories15M` is one of the
models known to torchchat, with a configuration stored in `torchchat/model_params`:

### Example 1: Basic Text Generation

```
python3 torchchat.py generate \
  --checkpoint-path stories15M/stories15M.pt \
  --prompt "Hello, my name is"
```

### Example 2: Providing Additional Artifacts
The following is an example of how to specify a local model checkpoint, the model architecture, and a tokenizer file:
```
python3 torchchat.py generate \
  --prompt "Once upon a time" \
  --checkpoint-path stories15M/stories15M.pt \
  --params-path stories15M/model.json \
  --tokenizer-path stories15M/tokenizer.model
```

Alternatively, we can specify the architecture configuration for known models using `--params-table`, which
selects a particular configuration in `torchchat/model_params`:

```
python3 torchchat.py generate \
  --prompt "Once upon a time" \
  --checkpoint-path stories15M/stories15M.pt \
  --params-table stories15M \
  --tokenizer-path stories15M/tokenizer.model
```

## Using GGUF Models
Torchchat supports loading models in GGUF format with the `--gguf-path` flag. Refer to GGUF.md for additional
documentation about using GGUF files in torchchat.

The GGUF format is compatible with several quantization levels such as F16, F32, Q4_0, and Q6_K. Model
configuration information is obtained directly from the GGUF file, simplifying setup and obviating the
need for a separate `model.json` model architecture specification.

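As a minimal sketch, generation from a local GGUF file mirrors the checkpoint examples above; the directory and file names here are illustrative, and the tokenizer is assumed to sit next to the GGUF file:

```
python3 torchchat.py generate \
  --gguf-path gguf_files/llama-2-7b.Q4_0.gguf \
  --tokenizer-path gguf_files/tokenizer.model \
  --prompt "Hello, my name is"
```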

## Using local models
Torchchat supports all commands, such as chat, browser, server, and export, with local models. (In fact,
known models simply download their files and populate the same parameters used for local models.)
Here is an example setup for running a server with a local model:

[skip default]: begin
```
python3 torchchat.py server --checkpoint-path stories15M/stories15M.pt
```
[skip default]: end

[shell default]: python3 torchchat.py server --checkpoint-path stories15M/stories15M.pt & server_pid=$! ; sleep 90 # wait for server to be ready to accept requests

In another terminal, query the server using `curl`. Depending on the model configuration, this query might take a few minutes to respond.

> [!NOTE]
> Since this feature is under active development, not every parameter is consumed. See `api/api.pyi` for details on
> which request parameters are implemented. If you encounter any issues, please comment on the [tracking Github issue](https://github.com/pytorch/torchchat/issues/973).

<details>

<summary>Example Query</summary>

Setting `stream` to "true" in the request emits a response in chunks. If `stream` is unset or not "true", then the client will
await the full response from the server.

**Example: using the server**

A model server used with a local model works like any other torchchat server. You can test it by sending a request with `curl`:
```
curl http://127.0.0.1:5000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1",
    "stream": "true",
    "max_tokens": 200,
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'
```
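
For comparison, a sketch of the same request with the `stream` field omitted, so the client waits for the complete response in a single JSON reply:

```
curl http://127.0.0.1:5000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1",
    "max_tokens": 200,
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```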

[shell default]: kill ${server_pid}

</details>
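
The other commands work the same way with a local checkpoint. As a sketch (the export output name is illustrative, and `--output-dso-path` assumes an AOT Inductor export as covered in the root README.md):

```
# Interactive chat with the local checkpoint
python3 torchchat.py chat --checkpoint-path stories15M/stories15M.pt

# Export the model for faster inference (output file name is illustrative)
python3 torchchat.py export \
  --checkpoint-path stories15M/stories15M.pt \
  --output-dso-path stories15M.so
```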

For more information about using different commands, see the root README.md and refer to the Advanced Users Guide for further details on advanced configurations and parameter tuning.

[end default]: end