
Commit 0704ed3

pytorchbot and lucylq authored

[executorch][docs] Update stories cmd to use kv cache (#5466)

Update stories cmd to use kv cache (#5460)

Summary: Pull Request resolved: #5460
Reviewed By: dvorjackz
Differential Revision: D62925331
Pulled By: lucylq
fbshipit-source-id: a5c977055fe208cd8f1db20f147247a5a0f6fdbf
(cherry picked from commit d2a38cc)
Co-authored-by: lucylq <[email protected]>

1 parent 30f59d2 commit 0704ed3

File tree: 1 file changed (+2, -2 lines)


examples/models/llama2/README.md (2 additions, 2 deletions)
```diff
@@ -60,7 +60,7 @@ Note that since Llama3's vocabulary size is 4x that of Llama2, we had to quantiz
 |OnePlus 12 | 10.85 tokens/second | 11.02 tokens/second |

 ### Llama3.1
-> :warning: **use the main branch**: Llama3.1 is supported on the ExecuTorch main branch (not release 0.3).
+Llama3.1 is supported on the ExecuTorch main branch and release/0.4

 # Instructions

```
````diff
@@ -111,7 +111,7 @@ If you want to deploy and run a smaller model for educational purposes. From `ex
    ```
 3. Export model and generate `.pte` file.
    ```
-   python -m examples.models.llama2.export_llama -c stories110M.pt -p params.json -X
+   python -m examples.models.llama2.export_llama -c stories110M.pt -p params.json -X -kv
    ```
 4. Create tokenizer.bin.
````
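The `-kv` flag added by this commit enables the KV cache in the exported model. As a rough illustration of why that matters (plain Python, not ExecuTorch internals; `KVCache` and `decode_step` are hypothetical names for this sketch): without a cache, each decoding step recomputes keys and values for every previous token, while with a cache each step computes only the new token's key/value pair and appends it, so attention at step t reads t cached entries instead of rebuilding them.

```python
# Hypothetical sketch of KV caching in autoregressive decoding.
# Not ExecuTorch code: KVCache and decode_step are illustrative names.

class KVCache:
    """Toy per-layer cache of past keys and values."""

    def __init__(self):
        self.keys = []    # one entry per generated position
        self.values = []

    def append(self, k, v):
        # Each step stores only the NEW token's key/value;
        # earlier entries are reused rather than recomputed.
        self.keys.append(k)
        self.values.append(v)
        return self.keys, self.values


def decode_step(cache, new_key, new_value):
    # Only the newest token's K/V are computed this step;
    # attention then runs over all cached positions.
    keys, _values = cache.append(new_key, new_value)
    return len(keys)  # number of positions attention sees


cache = KVCache()
# Four decoding steps: attention window grows by one each step.
positions_attended = [decode_step(cache, t, t * 2) for t in range(4)]
# positions_attended == [1, 2, 3, 4]
```

The design point is that the cache turns per-step cost from quadratic re-encoding of the whole prefix into a single-token update, which is why the stories example runs noticeably faster with `-kv`.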
