Commit d2a38cc

lucylq authored and facebook-github-bot committed

Update stories cmd to use kv cache (#5460)

Summary:
Pull Request resolved: #5460
Reviewed By: dvorjackz
Differential Revision: D62925331
Pulled By: lucylq
fbshipit-source-id: a5c977055fe208cd8f1db20f147247a5a0f6fdbf

1 parent: 0648a8a

File tree

1 file changed: +2 −2 lines


examples/models/llama2/README.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -66,7 +66,7 @@ Note that since Llama3's vocabulary size is 4x that of Llama2, we had to quantiz
 |OnePlus 12 | 10.85 tokens/second | 11.02 tokens/second |

 ### Llama3.1
-> :warning: **use the main branch**: Llama3.1 is supported on the ExecuTorch main branch (not release 0.3).
+Llama3.1 is supported on the ExecuTorch main branch and release/0.4

 # Instructions
```

````diff
@@ -117,7 +117,7 @@ If you want to deploy and run a smaller model for educational purposes. From `ex
 ```
 3. Export model and generate `.pte` file.
 ```
-python -m examples.models.llama2.export_llama -c stories110M.pt -p params.json -X
+python -m examples.models.llama2.export_llama -c stories110M.pt -p params.json -X -kv
 ```
 4. Create tokenizer.bin.
````
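The change above adds the `-kv` flag so the stories export uses a key/value cache. As background, here is a minimal NumPy sketch of the idea behind a KV cache in autoregressive decoding; this is an illustrative toy, not ExecuTorch's implementation, and all names and shapes are made up for the example:

```python
import numpy as np

def attend(q, k_cache, v_cache):
    # Attend the single new query over every cached key/value pair.
    keys = np.stack(k_cache)          # (t, d)
    vals = np.stack(v_cache)          # (t, d)
    scores = keys @ q / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()          # softmax over past positions
    return weights @ vals             # (d,)

def decode_step(x, k_cache, v_cache, wq, wk, wv):
    # Project only the NEW token; keys/values for earlier tokens are
    # reused from the cache, so each step costs O(t) instead of
    # recomputing the whole prefix's projections every step.
    k_cache.append(x @ wk)
    v_cache.append(x @ wv)
    return attend(x @ wq, k_cache, v_cache)

d = 8
rng = np.random.default_rng(0)
wq, wk, wv = (rng.standard_normal((d, d)) for _ in range(3))
k_cache, v_cache = [], []
for _ in range(4):                    # four autoregressive decode steps
    out = decode_step(rng.standard_normal(d), k_cache, v_cache, wq, wk, wv)

assert len(k_cache) == 4
assert out.shape == (d,)
```

The cache grows by one entry per generated token, which is why enabling it speeds up token-by-token generation in the stories example.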

Comments (0)