This repository was archived by the owner on Sep 10, 2025. It is now read-only.

Commit 5e3c3ce

Merge branch 'main' into pin_torch
2 parents bfaad20 + 24d00ea commit 5e3c3ce

15 files changed: +170 -207 lines changed

.github/workflows/pull.yml

Lines changed: 1 addition & 1 deletion
@@ -1103,7 +1103,7 @@ jobs:
 with:
 path: |
 ./et-build
-./torchchat/utils/scripts
+./torchchat/utils/scripts/install_et.sh
 key: et-build-${{runner.os}}-${{runner.arch}}-${{env.et-git-hash}}-${{ hashFiles('**/install_et.sh') }}
 - if: ${{ steps.install-et.outputs.cache-hit != 'true' }}
 continue-on-error: true

README.md

Lines changed: 39 additions & 47 deletions
@@ -25,6 +25,7 @@ torchchat is a small codebase showcasing the ability to run large language model

 ## Highlights

+- [[New!!] Multimodal Support for Llama 3.2 11B](docs/multimodal.md)
 - Command line interaction with popular LLMs such as Llama 3, Llama 2, Stories, Mistral and more
 - PyTorch-native execution with performance
 - Supports popular hardware and OS
@@ -37,6 +38,38 @@ torchchat is a small codebase showcasing the ability to run large language model
 - Multiple execution modes including: Python (Eager, Compile) or Native (AOT Inductor (AOTI), ExecuTorch)


+## Models
+
+The following models are supported by torchchat and have associated
+aliases.
+
+| Model | Mobile Friendly | Notes |
+|------------------|---|---------------------|
+|[meta-llama/Meta-Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)||Tuned for `chat` . Alias to `llama3.2-3b`.|
+|[meta-llama/Meta-Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B)||Best for `generate`. Alias to `llama3.2-3b-base`.|
+|[meta-llama/Llama-Guard-3-1B](https://huggingface.co/meta-llama/Llama-Guard-3-1B)||Tuned for classification . Alias to `llama3-1b-guard`.|
+|[meta-llama/Meta-Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)||Tuned for `chat` . Alias to `llama3.2-1b`.|
+|[meta-llama/Meta-Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B)||Best for `generate`. Alias to `llama3.2-1b-base`.|
+|[meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct)||Multimodal (Image + Text). Tuned for `chat` . Alias to `llama3.2-11B`.|
+|[meta-llama/Llama-3.2-11B-Vision](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision)||Multimodal (Image + Text). Tuned for `generate` . Alias to `llama3.2-11B-base`.|
+|[meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)||Tuned for `chat` . Alias to `llama3.1`.|
+|[meta-llama/Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B)||Best for `generate`. Alias to `llama3.1-base`.|
+|[meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)||Tuned for `chat` . Alias to `llama3`.|
+|[meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B)||Best for `generate`. Alias to `llama3-base`.|
+|[meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)||Tuned for `chat`. Alias to `llama2`.|
+|[meta-llama/Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf)||Tuned for `chat`. Alias to `llama2-13b-chat`.|
+|[meta-llama/Llama-2-70b-chat-hf](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf)||Tuned for `chat`. Alias to `llama2-70b-chat`.|
+|[meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf)||Best for `generate`. Alias to `llama2-base`.|
+|[meta-llama/CodeLlama-7b-Python-hf](https://huggingface.co/meta-llama/CodeLlama-7b-Python-hf)||Tuned for Python and `generate`. Alias to `codellama`.|
+|[meta-llama/CodeLlama-34b-Python-hf](https://huggingface.co/meta-llama/CodeLlama-34b-Python-hf)||Tuned for Python and `generate`. Alias to `codellama-34b`.|
+|[mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)||Best for `generate`. Alias to `mistral-7b-v01-base`.|
+|[mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1)||Tuned for `chat`. Alias to `mistral-7b-v01-instruct`.|
+|[mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)||Tuned for `chat`. Alias to `mistral`.|
+|[tinyllamas/stories15M](https://huggingface.co/karpathy/tinyllamas/tree/main)||Toy model for `generate`. Alias to `stories15M`.|
+|[tinyllamas/stories42M](https://huggingface.co/karpathy/tinyllamas/tree/main)||Toy model for `generate`. Alias to `stories42M`.|
+|[tinyllamas/stories110M](https://huggingface.co/karpathy/tinyllamas/tree/main)||Toy model for `generate`. Alias to `stories110M`.|
+|[openlm-research/open_llama_7b](https://huggingface.co/openlm-research/open_llama_7b)||Best for `generate`. Alias to `open-llama`.|
+
 ## Installation
 The following steps require that you have [Python 3.10](https://www.python.org/downloads/release/python-3100/) installed.

@@ -105,7 +138,6 @@ __Evaluation__ (eval)
 * This command test model fidelity via EleutherAI's [lm_evaluation_harness](https://github.com/EleutherAI/lm-evaluation-harness).
 * More information is provided in the [Evaluation](https://github.com/pytorch/torchchat?tab=readme-ov-file#eval) section.

-
 ## Download Weights
 Most models use Hugging Face as the distribution channel, so you will need to create a Hugging Face account.
 Create a Hugging Face user access token [as documented here](https://huggingface.co/docs/hub/en/security-tokens) with the `write` role.
@@ -118,9 +150,13 @@ Log into Hugging Face:
 huggingface-cli login
 ```

-Once this is done, torchchat will be able to download model artifacts from
-Hugging Face.
+Take a look at the available models:

+```bash
+python3 torchchat.py list
+```
+
+Then download one for testing (this README uses llama3.1)
 ```
 python3 torchchat.py download llama3.1
 ```
@@ -134,12 +170,6 @@ python3 torchchat.py download llama3.1
 <details>
 <summary>Additional Model Inventory Management Commands</summary>

-### List
-This subcommand shows the available models
-```bash
-python3 torchchat.py list
-```
-
 ### Where
 This subcommand shows location of a particular model.
 ```bash
@@ -511,44 +541,6 @@ the same way you would to generate:
 python3 torchchat.py eval llama3.1 --pte-path llama3.1.pte --limit 5
 ```

-
-
-## Models
-
-The following models are supported by torchchat and have associated
-aliases.
-
-| Model | Mobile Friendly | Notes |
-|------------------|---|---------------------|
-|[meta-llama/Meta-Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)||Tuned for `chat` . Alias to `llama3.2-3b`.|
-|[meta-llama/Meta-Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B)||Best for `generate`. Alias to `llama3.2-3b-base`.|
-|[meta-llama/Llama-Guard-3-1B](https://huggingface.co/meta-llama/Llama-Guard-3-1B)||Tuned for classification . Alias to `llama3-1b-guard`.|
-|[meta-llama/Meta-Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)||Tuned for `chat` . Alias to `llama3.2-1b`.|
-|[meta-llama/Meta-Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B)||Best for `generate`. Alias to `llama3.2-1b-base`.|
-|[meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct)||Multimodal (Image + Text). Tuned for `chat` . Alias to `llama3.2-11B`.|
-|[meta-llama/Llama-3.2-11B-Vision](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision)||Multimodal (Image + Text). Tuned for `generate` . Alias to `llama3.2-11B-base`.|
-|[meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)||Tuned for `chat` . Alias to `llama3.1`.|
-|[meta-llama/Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B)||Best for `generate`. Alias to `llama3.1-base`.|
-|[meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)||Tuned for `chat` . Alias to `llama3`.|
-|[meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B)||Best for `generate`. Alias to `llama3-base`.|
-|[meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)||Tuned for `chat`. Alias to `llama2`.|
-|[meta-llama/Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf)||Tuned for `chat`. Alias to `llama2-13b-chat`.|
-|[meta-llama/Llama-2-70b-chat-hf](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf)||Tuned for `chat`. Alias to `llama2-70b-chat`.|
-|[meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf)||Best for `generate`. Alias to `llama2-base`.|
-|[meta-llama/CodeLlama-7b-Python-hf](https://huggingface.co/meta-llama/CodeLlama-7b-Python-hf)||Tuned for Python and `generate`. Alias to `codellama`.|
-|[meta-llama/CodeLlama-34b-Python-hf](https://huggingface.co/meta-llama/CodeLlama-34b-Python-hf)||Tuned for Python and `generate`. Alias to `codellama-34b`.|
-|[mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)||Best for `generate`. Alias to `mistral-7b-v01-base`.|
-|[mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1)||Tuned for `chat`. Alias to `mistral-7b-v01-instruct`.|
-|[mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)||Tuned for `chat`. Alias to `mistral`.|
-|[tinyllamas/stories15M](https://huggingface.co/karpathy/tinyllamas/tree/main)||Toy model for `generate`. Alias to `stories15M`.|
-|[tinyllamas/stories42M](https://huggingface.co/karpathy/tinyllamas/tree/main)||Toy model for `generate`. Alias to `stories42M`.|
-|[tinyllamas/stories110M](https://huggingface.co/karpathy/tinyllamas/tree/main)||Toy model for `generate`. Alias to `stories110M`.|
-|[openlm-research/open_llama_7b](https://huggingface.co/openlm-research/open_llama_7b)||Best for `generate`. Alias to `open-llama`.|
-
-While we describe how to use torchchat using the popular llama3 model,
-you can perform the example commands with any of these models.
-
-
 ## Design Principles

 torchchat embodies PyTorch’s design philosophy [details](https://pytorch.org/docs/stable/community/design.html), especially "usability over everything else".

docs/multimodal.md

Lines changed: 37 additions & 4 deletions
@@ -2,7 +2,7 @@

 Released on September 25th, 2024, **Llama3.2 11B Vision** is torchchat's first multimodal model.

-This page goes over the different commands you can run with LLama 3.2 11B Vision.
+This page goes over the different commands you can run with LLama 3.2 11B Vision.

 ## Model Access

@@ -44,7 +44,42 @@ python3 torchchat.py server llama3.2-11B

 In another terminal, query the server using `curl`. This query might take a few minutes to respond.

-**We are currently debugging the server integration and will have updated examples shortly.**
+<details>
+<summary>Example Query</summary>
+
+Setting `stream` to "true" in the request emits a response in chunks. If `stream` is unset or not "true", then the client will await the full response from the server.
+
+**Example Input + Output**
+
+```
+curl http://127.0.0.1:5000/v1/chat/completions \
+-H "Content-Type: application/json" \
+-d '{
+"model": "llama3.2",
+"messages": [
+{
+"role": "user",
+"content": [
+{
+"type": "text",
+"text": "What'\''s in this image?"
+},
+{
+"type": "image_url",
+"image_url": ""
+}
+]
+}
+],
+"max_tokens": 300
+}'
+```
+
+```
+{"id": "chatcmpl-cb7b39af-a22e-4f71-94a8-17753fa0d00c", "choices": [{"message": {"role": "assistant", "content": "The image depicts a simple black and white cartoon-style drawing of an animal face. It features a profile view, complete with two ears, expressive eyes, and a partial snout. The animal looks to the left, with its eye and mouth implied, suggesting that the drawn face might belong to a rabbit, dog, or pig. The graphic face has a bold black outline and a smaller, solid black nose. A small circle, forming part of the face, has a white background with two black quirkly short and long curved lines forming an outline of what was likely a mouth, complete with two teeth. The presence of the curve lines give the impression that the animal is smiling or speaking. Grey and black shadows behind the right ear and mouth suggest that this face is looking left and upwards. Given the prominent outline of the head and the outline of the nose, it appears that the depicted face is most likely from the side profile of a pig, although the ears make it seem like a dog and the shape of the nose makes it seem like a rabbit. Overall, it seems that this image, possibly part of a character illustration, is conveying a playful or expressive mood through its design and positioning."}, "finish_reason": "stop"}], "created": 1727487574, "model": "llama3.2", "system_fingerprint": "cpu_torch.float16", "object": "chat.completion"}%
+```
+
+</details>

 ## Browser

@@ -58,8 +93,6 @@ First, follow the steps in the Server section above to start a local server. The
 streamlit run torchchat/usages/browser.py
 ```

-**We are currently debugging the browser integration and will have updated examples shortly.**
-
 ---

 # Future Work
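
For readers who prefer scripting the server query added in this diff, the `curl` call could be approximated in Python roughly as follows. This is a minimal sketch using only the standard library, not code from the repository: the helper names `build_chat_request`, `send_chat_request`, and `extract_reply` are hypothetical, the endpoint and field names are copied from the example above, and passing `stream` as the string "true" is an assumption based on the wording of the added documentation. Only the non-streaming path is handled.

```python
import json
import urllib.request

# Endpoint taken from the curl example in docs/multimodal.md.
DEFAULT_URL = "http://127.0.0.1:5000/v1/chat/completions"


def build_chat_request(prompt, image_url, model="llama3.2", max_tokens=300, stream=False):
    """Build the JSON payload used in the documented curl example (hypothetical helper)."""
    payload = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": image_url},
                ],
            }
        ],
        "max_tokens": max_tokens,
    }
    if stream:
        # Assumption: the docs describe setting `stream` to "true" (a string).
        payload["stream"] = "true"
    return payload


def send_chat_request(payload, url=DEFAULT_URL):
    """POST the payload and parse a single (non-streamed) JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_reply(response):
    """Pull the assistant text out of a response shaped like the example output."""
    return response["choices"][0]["message"]["content"]
```

With a server running locally, `extract_reply(send_chat_request(build_chat_request("What's in this image?", image_url)))` would return the assistant text, where `image_url` is whatever image reference the server accepts (the example above leaves it empty).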

docs/quantization.md

Lines changed: 4 additions & 4 deletions
@@ -122,11 +122,11 @@ python3 torchchat.py generate llama3 --pte-path llama3.pte --prompt "Hello my n

 ### Use
 The quantization scheme a8wxdq dynamically quantizes activations to 8 bits, and quantizes the weights in a groupwise manner with a specified bitwidth and groupsize.
-It takes arguments bitwidth (2, 3, 4, 5, 6, 7), groupsize, and has_weight_zeros (true, false).
+It takes arguments bitwidth (1, 2, 3, 4, 5, 6, 7), groupsize, and has_weight_zeros (true, false).
 The argument has_weight_zeros indicates whether the weights are quantized with scales only (has_weight_zeros: false) or with both scales and zeros (has_weight_zeros: true).
-Roughly speaking, {bitwidth: 4, groupsize: 256, has_weight_zeros: false} is similar to GGML's Q40 quantization scheme.
+Roughly speaking, {bitwidth: 4, groupsize: 256, has_weight_zeros: false} is similar to GGML's Q4_0 quantization scheme.

-You should expect high performance on ARM CPU if bitwidth is 2, 3, 4, or 5 and groupsize is divisible by 16. With other platforms and argument choices, a slow fallback kernel will be used. You will see warnings about this during quantization.
+You should expect high performance on ARM CPU if bitwidth is 1, 2, 3, 4, or 5 and groupsize is divisible by 16. With other platforms and argument choices, a slow fallback kernel will be used. You will see warnings about this during quantization.

 ### Setup
 To use a8wxdq, you must set up the torchao experimental kernels. These will only work on devices with ARM CPUs, for example on Mac computers with Apple Silicon.
@@ -138,7 +138,7 @@ sh torchchat/utils/scripts/build_torchao_ops.sh

 This should take about 10 seconds to complete. Once finished, you can use a8wxdq in torchchat.

-Note: if you want to use the new kernels in the AOTI and C++ runners, you must pass the flag link_torchao when running the scripts the build the runners.
+Note: if you want to use the new kernels in the AOTI and C++ runners, you must pass the flag link_torchao_ops when running the scripts the build the runners.

 ```
 sh torchchat/utils/scripts/build_native.sh aoti link_torchao_ops
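
To make the `bitwidth`, `groupsize`, and `has_weight_zeros` arguments described in this hunk more concrete, the following pure-Python sketch shows one generic way groupwise weight quantization can work. It is an illustration under stated assumptions, not the torchao kernel: the function names are hypothetical, the signed integer range and rounding rules are choices made for this example, and activation quantization is not shown.

```python
def quantize_group(values, bitwidth, has_weight_zeros):
    """Quantize one group of weights to signed `bitwidth`-bit integers.

    Returns (quantized_values, scale, zero). Illustrative scheme only.
    """
    qmin = -(2 ** (bitwidth - 1))
    qmax = 2 ** (bitwidth - 1) - 1
    if has_weight_zeros:
        # Scales and zeros: map [min, max] of the group onto [qmin, qmax].
        lo, hi = min(values), max(values)
        scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
        zero = qmin - round(lo / scale)
    else:
        # Scales only: symmetric around zero, no offset stored.
        max_abs = max(abs(v) for v in values)
        scale = max_abs / (-qmin) if max_abs > 0 else 1.0
        zero = 0
    quantized = [min(qmax, max(qmin, round(v / scale) + zero)) for v in values]
    return quantized, scale, zero


def quantize_groupwise(weights, bitwidth, groupsize, has_weight_zeros):
    """Split a flat weight list into groups and quantize each independently."""
    if len(weights) % groupsize != 0:
        raise ValueError("len(weights) must be divisible by groupsize")
    return [
        quantize_group(weights[i:i + groupsize], bitwidth, has_weight_zeros)
        for i in range(0, len(weights), groupsize)
    ]


def dequantize_groupwise(groups):
    """Reconstruct approximate float weights from (quantized, scale, zero) groups."""
    restored = []
    for quantized, scale, zero in groups:
        restored.extend((q - zero) * scale for q in quantized)
    return restored
```

Each group stores its own scale (and, when `has_weight_zeros` is true, its own zero point), so a smaller `groupsize` tracks local weight ranges more closely at the cost of storing more per-group metadata.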

install/.pins/torchao-pin.txt

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-63cb7a9857654784f726fec75c0dc36167094d8a
+ae3e7c68eae7085e13241cb3d6b39481868dd162
