@@ -164,7 +164,7 @@ Llama 3 8B performance was measured on the Samsung Galaxy S22, S24, and OnePlus
 ```
 # No quantization
 # Set these paths to point to the downloaded files
-LLAMA_CHECKPOINT=path/to/checkpoint.pth
+LLAMA_CHECKPOINT=path/to/consolidated.00.pth
 LLAMA_PARAMS=path/to/params.json
 
 python -m examples.models.llama.export_llama \
@@ -186,7 +186,7 @@ For convenience, an [exported ExecuTorch bf16 model](https://huggingface.co/exec
 ```
 # SpinQuant
 # Set these paths to point to the exported files
-LLAMA_QUANTIZED_CHECKPOINT=path/to/spinquant/checkpoint.pth
+LLAMA_QUANTIZED_CHECKPOINT=path/to/spinquant/consolidated.00.pth
 LLAMA_PARAMS=path/to/spinquant/params.json
 
 python -m examples.models.llama.export_llama \
@@ -215,7 +215,7 @@ For convenience, an [exported ExecuTorch SpinQuant model](https://huggingface.co
 ```
 # QAT+LoRA
 # Set these paths to point to the exported files
-LLAMA_QUANTIZED_CHECKPOINT=path/to/qlora/checkpoint.pth
+LLAMA_QUANTIZED_CHECKPOINT=path/to/qlora/consolidated.00.pth
 LLAMA_PARAMS=path/to/qlora/params.json
 
 python -m examples.models.llama.export_llama \
@@ -248,7 +248,7 @@ You can export and run the original Llama 3 8B instruct model.
 2. Export model and generate `.pte` file
     ```
     python -m examples.models.llama.export_llama \
-      --checkpoint <checkpoint.pth> \
+      --checkpoint <consolidated.00.pth> \
       -p <params.json> \
       -kv \
       --use_sdpa_with_kv_cache \
@@ -396,7 +396,7 @@ First export your model for lowbit quantization (step 2 above):
 
 ```
 # Set these paths to point to the downloaded files
-LLAMA_CHECKPOINT=path/to/checkpoint.pth
+LLAMA_CHECKPOINT=path/to/consolidated.00.pth
 LLAMA_PARAMS=path/to/params.json
 
 # Set low-bit quantization parameters
@@ -476,7 +476,7 @@ We use [LM Eval](https://github.com/EleutherAI/lm-evaluation-harness) to evaluat
 For base models, use the following example command to calculate its perplexity based on WikiText.
 ```
 python -m examples.models.llama.eval_llama \
-  -c <checkpoint.pth> \
+  -c <consolidated.00.pth> \
   -p <params.json> \
   -t <tokenizer.model/bin> \
   -kv \
@@ -489,7 +489,7 @@ python -m examples.models.llama.eval_llama \
 For instruct models, use the following example command to calculate its MMLU score.
 ```
 python -m examples.models.llama.eval_llama \
-  -c <checkpoint.pth> \
+  -c <consolidated.00.pth> \
   -p <params.json> \
   -t <tokenizer.model/bin> \
   -kv \