
Commit 1015de7: Update quantization.md

Remove `-l 3` from the aoti_run command, and write `-l3` for the et_run command.

Parent: bd594fb


docs/quantization.md (2 additions, 2 deletions)
@@ -182,7 +182,7 @@ OMP_NUM_THREADS=6 python3 torchchat.py generate llama3.1 --dso-path llama3_1.so
 If you built the AOTI runner with link_torchao_ops as discussed in the setup section, you can also use the C++ runner:
 
 ```
-OMP_NUM_THREADS=6 ./cmake-out/aoti_run llama3_1.so -z $HOME/.torchchat/model-cache/meta-llama/Meta-Llama-3.1-8B-Instruct/tokenizer.model -l 3 -i "Once upon a time,"
+OMP_NUM_THREADS=6 ./cmake-out/aoti_run llama3_1.so -z $HOME/.torchchat/model-cache/meta-llama/Meta-Llama-3.1-8B-Instruct/tokenizer.model -i "Once upon a time," # -l 3
 ```
 
 #### ExecuTorch
#### ExecuTorch
@@ -193,7 +193,7 @@ python torchchat.py export llama3.1 --device cpu --dtype float32 --quantize '{"e
 Note: only the ExecuTorch C++ runner in torchchat when built using the instructions in the setup can run the exported *.pte file. It will not work with the `python torchchat.py generate` command.
 
 ```
-./cmake-out/et_run llama3_1.pte -z $HOME/.torchchat/model-cache/meta-llama/Meta-Llama-3.1-8B-Instruct/tokenizer.model -l 3 -i "Once upon a time,"
+./cmake-out/et_run llama3_1.pte -z $HOME/.torchchat/model-cache/meta-llama/Meta-Llama-3.1-8B-Instruct/tokenizer.model -l3 -i "Once upon a time,"
 ```
 
 ## Experimental TorchAO MPS lowbit kernels
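For quick reference, below are the two runner invocations as they read after this commit, taken directly from the diff above. Note the asymmetry the commit message describes: aoti_run drops `-l 3` entirely (it survives only as a trailing shell comment in the doc), while et_run keeps the flag in the compact `-l3` spelling. The diff does not say what `-l` means; reading it as a Llama/tokenizer version selector is an assumption.

```sh
# Invocations after commit 1015de7, copied from the updated docs/quantization.md.
# Assumption (not stated in this diff): -l selects the Llama/tokenizer version.

# AOTI runner: no -l flag (the doc keeps "-l 3" only as a trailing comment).
OMP_NUM_THREADS=6 ./cmake-out/aoti_run llama3_1.so \
  -z $HOME/.torchchat/model-cache/meta-llama/Meta-Llama-3.1-8B-Instruct/tokenizer.model \
  -i "Once upon a time,"

# ExecuTorch runner: the flag is written without a space, as -l3.
./cmake-out/et_run llama3_1.pte \
  -z $HOME/.torchchat/model-cache/meta-llama/Meta-Llama-3.1-8B-Instruct/tokenizer.model \
  -l3 -i "Once upon a time,"
```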
