Commit dc889b9

Lunwen pr comments
1 parent 1431167 commit dc889b9

2 files changed: +13 −13 lines


examples/models/llama/README.md

Lines changed: 11 additions & 11 deletions

````diff
@@ -237,17 +237,17 @@ You can export and run the original Llama 3 8B instruct model.
 
 2. Export model and generate `.pte` file
 ```
-python -m examples.models.llama.export_llama
---checkpoint <consolidated.00.pth>
--p <params.json>
--kv
---use_sdpa_with_kv_cache
--X
--qmode 8da4w
---group_size 128
--d fp32
---metadata '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}'
---embedding-quantize 4,32
+python -m examples.models.llama.export_llama \
+--checkpoint <consolidated.00.pth> \
+-p <params.json> \
+-kv \
+--use_sdpa_with_kv_cache \
+-X \
+-qmode 8da4w \
+--group_size 128 \
+-d fp32 \
+--metadata '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}' \
+--embedding-quantize 4,32 \
 --output_name="llama3_kv_sdpa_xnn_qe_4_32.pte"
 ```
 Due to the larger vocabulary size of Llama 3, we recommend quantizing the embeddings with `--embedding-quantize 4,32` as shown above to further reduce the model size.
````
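The vocabulary-size argument above can be sanity-checked with a back-of-the-envelope calculation. This is an illustrative sketch, not from the commit: it assumes Llama 3 8B shapes (vocab size 128256, embedding dim 4096) and assumes `--embedding-quantize 4,32` means 4-bit weights grouped in blocks of 32, with one fp16 scale per group.

```python
# Rough size estimate for the Llama 3 embedding table, fp32 vs 4-bit.
# Assumptions (not stated in the commit): Llama 3 8B uses a 128256-token
# vocabulary and 4096-dim embeddings; "4,32" = 4-bit weights, group size 32,
# one fp16 scale per group.

VOCAB, DIM = 128_256, 4_096
GROUP_SIZE = 32

params = VOCAB * DIM                       # number of embedding weights
fp32_bytes = params * 4                    # 4 bytes per fp32 weight
q4_bytes = params // 2                     # two 4-bit weights per byte
scale_bytes = (params // GROUP_SIZE) * 2   # one fp16 scale per group

print(f"fp32 embedding:            {fp32_bytes / 2**30:.2f} GiB")
print(f"4-bit embedding (+scales): {(q4_bytes + scale_bytes) / 2**30:.2f} GiB")
```

Under these assumptions the embedding table alone shrinks from roughly 2 GiB to under 0.3 GiB, which is why the flag matters more for Llama 3 than for the 32k-vocab Llama 2.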

examples/models/llama/export_llama_lib.py

Lines changed: 2 additions & 2 deletions

```diff
@@ -79,7 +79,7 @@
 verbosity_setting = None
 
 
-EXECUTORCH_DEFINED_MODELS = ["stories110m", "llama2", "llama3", "llama3.1", "llama3.2"]
+EXECUTORCH_DEFINED_MODELS = ["stories110m", "llama2", "llama3", "llama3_1", "llama3_2"]
 TORCHTUNE_DEFINED_MODELS = []
 
 
@@ -915,7 +915,7 @@ def _get_source_transforms(  # noqa
 ops that is not quantized.
 
 There are cases where this may be a no-op, namely, if all linears are
-quantizedpp in the checkpoint.
+quantized in the checkpoint.
 """
 modelname = f"{modelname}_q"
 transforms.append(
```
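The first hunk renames `llama3.1`/`llama3.2` to `llama3_1`/`llama3_2`, swapping dots for underscores so the names stay valid as Python identifiers and filename components. A hypothetical sketch (not ExecuTorch's actual CLI code) of how callers could stay compatible with either spelling:

```python
# Hypothetical helper, not from the commit: accept a dotted version like
# "llama3.1" and map it onto the underscore form defined in the module.
EXECUTORCH_DEFINED_MODELS = ["stories110m", "llama2", "llama3", "llama3_1", "llama3_2"]

def normalize_model_name(name: str) -> str:
    """Map 'llama3.1' -> 'llama3_1'; reject names not in the defined list."""
    candidate = name.replace(".", "_")
    if candidate not in EXECUTORCH_DEFINED_MODELS:
        raise ValueError(
            f"unknown model {name!r}; expected one of {EXECUTORCH_DEFINED_MODELS}"
        )
    return candidate
```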
