You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository was archived by the owner on Sep 10, 2025. It is now read-only.
Copy file name to clipboardExpand all lines: README.md
+12-10Lines changed: 12 additions & 10 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -45,16 +45,16 @@ aliases.
45
45
46
46
| Model | Mobile Friendly | Notes |
47
47
|------------------|---|---------------------|
48
-
|[meta-llama/Meta-Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)|✅|Tuned for `chat`. Alias to `llama3.2-3b`.|
48
+
|[meta-llama/Meta-Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)|✅|Tuned for `chat`. Alias to `llama3.2-3b`.|
49
49
|[meta-llama/Meta-Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B)|✅|Best for `generate`. Alias to `llama3.2-3b-base`.|
50
-
|[meta-llama/Llama-Guard-3-1B](https://huggingface.co/meta-llama/Llama-Guard-3-1B)|✅|Tuned for classification. Alias to `llama3-1b-guard`.|
51
-
|[meta-llama/Meta-Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)|✅|Tuned for `chat`. Alias to `llama3.2-1b`.|
50
+
|[meta-llama/Llama-Guard-3-1B](https://huggingface.co/meta-llama/Llama-Guard-3-1B)|✅|Tuned for classification. Alias to `llama3-1b-guard`.|
51
+
|[meta-llama/Meta-Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)|✅|Tuned for `chat`. Alias to `llama3.2-1b`.|
52
52
|[meta-llama/Meta-Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B)|✅|Best for `generate`. Alias to `llama3.2-1b-base`.|
53
-
|[meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct)||Multimodal (Image + Text). Tuned for `chat`. Alias to `llama3.2-11B`.|
54
-
|[meta-llama/Llama-3.2-11B-Vision](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision)||Multimodal (Image + Text). Tuned for `generate`. Alias to `llama3.2-11B-base`.|
55
-
|[meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)|✅|Tuned for `chat`. Alias to `llama3.1`.|
53
+
|[meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct)||Multimodal (Image + Text). Tuned for `chat`. Alias to `llama3.2-11B`.|
54
+
|[meta-llama/Llama-3.2-11B-Vision](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision)||Multimodal (Image + Text). Tuned for `generate`. Alias to `llama3.2-11B-base`.|
55
+
|[meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)|✅|Tuned for `chat`. Alias to `llama3.1`.|
56
56
|[meta-llama/Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B)|✅|Best for `generate`. Alias to `llama3.1-base`.|
57
-
|[meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)|✅|Tuned for `chat`. Alias to `llama3`.|
57
+
|[meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)|✅|Tuned for `chat`. Alias to `llama3`.|
58
58
|[meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B)|✅|Best for `generate`. Alias to `llama3-base`.|
59
59
|[meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)|✅|Tuned for `chat`. Alias to `llama2`.|
60
60
|[meta-llama/Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf)||Tuned for `chat`. Alias to `llama2-13b-chat`.|
@@ -231,6 +231,8 @@ python3 torchchat.py server llama3.1
231
231
```
232
232
[skip default]: end
233
233
234
+
[shell default]: python3 torchchat.py server llama3.1 & server_pid=$!
235
+
234
236
In another terminal, query the server using `curl`. Depending on the model configuration, this query might take a few minutes to respond.
235
237
236
238
> [!NOTE]
@@ -244,8 +246,6 @@ Setting `stream` to "true" in the request emits a response in chunks. If `stream
{"response":" I'm a software developer with a passion for building innovative and user-friendly applications. I have experience in developing web and mobile applications using various technologies such as Java, Python, and JavaScript. I'm always looking for new challenges and opportunities to learn and grow as a developer.\n\nIn my free time, I enjoy reading books on computer science and programming, as well as experimenting with new technologies and techniques. I'm also interested in machine learning and artificial intelligence, and I'm always looking for ways to apply these concepts to real-world problems.\n\nI'm excited to be a part of the developer community and to have the opportunity to share my knowledge and experience with others. I'm always happy to help with any questions or problems you may have, and I'm looking forward to learning from you as well.\n\nThank you for visiting my profile! I hope you find my information helpful and interesting. If you have any questions or would like to discuss any topics, please feel free to reach out to me. I"}
270
271
```
271
272
272
273
[skip default]: end
273
274
275
+
[shell default]: kill ${server_pid}
274
276
275
277
</details>
276
278
@@ -664,6 +666,6 @@ awesome libraries and tools you've built around local LLM inference.
664
666
665
667
torchchat is released under the [BSD 3 license](LICENSE). (Additional
666
668
code in this distribution is covered by the MIT and Apache Open Source
667
-
licenses.) However you may have other legal obligations that govern
669
+
licenses.) However, you may have other legal obligations that govern
668
670
your use of content, such as the terms of service for third-party
Copy file name to clipboardExpand all lines: docs/ADVANCED-USERS.md
+6-6Lines changed: 6 additions & 6 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -1,22 +1,20 @@
1
1
> [!WARNING]
2
2
> Files in this directory may be outdated, incomplete, scratch notes, or a WIP. torchchat provides no guarantees on these files as references. Please refer to the root README for stable features and documentation.
3
3
4
-
# Torchchat is still in pre-release!
5
-
6
-
7
-
Torchchat is currently in a pre-release state and under extensive development.
This is the advanced users' guide, if you're looking to get started
22
20
with LLMs, please refer to the README at the root directory of the
@@ -251,6 +249,8 @@ To improve performance, you can compile the model with `--compile`
251
249
trading off the time to first token processed with time per token. To
252
250
improve performance further, you may also compile the prefill with
253
251
`--compile-prefill`. This will increase further compilation times though.
252
+
For CPU, you can use `--max-autotune` to further improve the performance
253
+
with `--compile` and `compile-prefill`. See [`max-autotune on CPU tutorial`](https://pytorch.org/tutorials/prototype/max_autotune_on_CPU_tutorial.html).
254
254
255
255
Parallel prefill is not yet supported by exported models, and may be
256
256
supported in a future release.
@@ -463,7 +463,7 @@ significant impact on accuracy.
463
463
464
464
## Native (Stand-Alone) Execution of Exported Models
465
465
466
-
Refer to the [README](README.md] for an introduction to native
466
+
Refer to the [README](README.md) for an introduction to native
467
467
execution on servers, desktops, and laptops. Mobile and Edge execution for Android and iOS are
468
468
described under [torchchat/edge/docs/Android.md] and [torchchat/edge/docs/iOS.md], respectively.
Torchchat provides evaluation functionality for your language model on
11
16
a variety of tasks using the
@@ -14,26 +19,53 @@ library.
14
19
15
20
## Usage
16
21
17
-
The evaluation mode of `torchchat.py` script can be used to evaluate your language model on various tasks available in the `lm_eval` library such as "wikitext". You can specify the task(s) you want to evaluate using the `--tasks` option, and limit the evaluation using the `--limit` option. If no task is specified, it will default to evaluating on "wikitext".
22
+
The evaluation mode of `torchchat.py` script can be used to evaluate your language model on various tasks available in the `lm_eval` library such as "wikitext". You can specify the task(s) you want to evaluate using the `--tasks` option, and limit the evaluation using the `--limit` option. If no task is specified, the task will default to evaluating on "wikitext".
0 commit comments