
Commit 9b6d4b4

cmodi-meta authored and facebook-github-bot committed
Update iOS XNNPack demo app docs for Llama 3.2 (#5641)
Summary: Pull Request resolved: #5641
Update iOS XNNPack demo app docs for Llama 3.2
Reviewed By: kirklandsign
Differential Revision: D63264488
fbshipit-source-id: 47412ad5eaad77e52e85ef79945536e3ba72bbad
1 parent 82a505b commit 9b6d4b4

File tree

2 files changed (+31 lines, -0 lines)

examples/demo-apps/apple_ios/LLaMA/README.md

Lines changed: 1 addition & 0 deletions
@@ -17,6 +17,7 @@ The goal is for you to see the type of support ExecuTorch provides and feel comf
 ## Supported Models
 
 As a whole, the models that this app supports are (varies by delegate):
+* Llama 3.2 1B/3B
 * Llama 3.1 8B
 * Llama 3 8B
 * Llama 2 7B

examples/demo-apps/apple_ios/LLaMA/docs/delegates/xnnpack_README.md

Lines changed: 30 additions & 0 deletions
@@ -1,5 +1,7 @@
 # Building Llama iOS Demo for XNNPack Backend
 
+**[UPDATE - 09/25]** We have added support for running [Llama 3.2 models](#for-llama-3.2-1b-and-3b-models) on the XNNPack backend. We currently support inference in their original data type (BFloat16).
+
 This tutorial covers the end-to-end workflow for building an iOS demo app using the XNNPack backend on device.
 More specifically, it covers:
 1. Export and quantization of Llama models against the XNNPack backend.
@@ -45,11 +47,39 @@ Install the required packages to export the model
 sh examples/models/llama2/install_requirements.sh
 ```
 
+### For Llama 3.2 1B and 3B models
+BFloat16 is supported as a data type on the XNNPack backend for the Llama 3.2 1B/3B models.
+* You can download the original model weights for Llama through Meta's official [website](https://llama.meta.com/), or via Huggingface (Link to specific 3.2 1B repo).
+* For chat use cases, download the instruct models instead of the pretrained ones.
+* Run `examples/models/llama2/install_requirements.sh` to install dependencies.
+* The 1B model in BFloat16 format can run on mobile devices with 8GB of RAM (iPhone 15 Pro and later). The 3B model requires 12GB+ of RAM and hence will not fit on 8GB phones.
+* Export the Llama model and generate a .pte file as below:
+
+```
+python -m examples.models.llama2.export_llama --checkpoint <checkpoint.pth> --params <params.json> -kv -X -d bf16 --metadata '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}' --output_name="llama3_2.pte"
+```
+
+* Convert the tokenizer for Llama 3.2: rename `tokenizer.model` to `tokenizer.bin`.
+
+For more detail on using the Llama 3.2 lightweight models, including the prompt template, please go to our official [website](https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_2#-llama-3.2-lightweight-models-(1b/3b)-).
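The RAM guidance above follows directly from the data type: BFloat16 stores each parameter in 2 bytes. A back-of-envelope sketch, using approximate parameter counts for the two models (the counts are assumptions, and KV cache plus runtime overhead are ignored):

```python
BYTES_PER_PARAM = 2  # BFloat16: 2 bytes per parameter


def weight_gb(params: float) -> float:
    """Approximate weight footprint in GB; ignores KV cache and runtime overhead."""
    return params * BYTES_PER_PARAM / 1e9


# Approximate parameter counts for the Llama 3.2 lightweight models (assumed).
print(f"1B: ~{weight_gb(1.24e9):.1f} GB")  # ~2.5 GB
print(f"3B: ~{weight_gb(3.21e9):.1f} GB")  # ~6.4 GB
```

Weights alone leave headroom on an 8GB device only for the 1B model, which is why the 3B model needs a 12GB+ phone.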
+
+### For Llama 3.1 and Llama 2 models
+
 Export the model
 ```
 python -m examples.models.llama2.export_llama --checkpoint <consolidated.00.pth> -p <params.json> -kv --use_sdpa_with_kv_cache -X -qmode 8da4w --group_size 128 -d fp32 --metadata '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}' --embedding-quantize 4,32 --output_name="llama3_kv_sdpa_xnn_qe_4_32.pte"
 ```
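The `--metadata` argument in the export commands is a plain JSON string; a minimal stand-alone sketch of what it encodes (not part of the export tooling itself):

```python
import json

# The --metadata string from the export commands above. For Llama 3,
# 128000 is the beginning-of-sequence token id, and 128009/128001 are
# end-of-sequence ids the runtime uses to stop generation.
metadata = json.loads('{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}')

print(metadata["get_bos_id"])   # 128000
print(metadata["get_eos_ids"])  # [128009, 128001]
```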

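The `-qmode 8da4w` and `--group_size 128` flags in the command above quantize weights to 4 bits in groups of 128. A back-of-envelope sketch of why this lets an 8B model fit on a phone (the per-group overhead figure is an assumption, not the exact 8da4w layout):

```python
# Effective bits per weight for 4-bit group-wise quantization, assuming one
# 16-bit scale per group of 128 weights. This is a simplification: the real
# 8da4w scheme also stores zero-points and dynamically quantizes activations
# to 8 bits at runtime.
GROUP_SIZE = 128
WEIGHT_BITS = 4
SCALE_BITS = 16

bits_per_weight = WEIGHT_BITS + SCALE_BITS / GROUP_SIZE
print(bits_per_weight)  # 4.125

# Rough weight size for an 8B-parameter model under this scheme.
size_gb = 8e9 * bits_per_weight / 8 / 1e9
print(size_gb)  # 4.125 (GB), versus ~32 GB in fp32
```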
+### For LLaVA model
+* For the LLaVA 1.5 model, you can get it from Huggingface [here](https://huggingface.co/llava-hf/llava-1.5-7b-hf).
+* Run `examples/models/llava/install_requirements.sh` to install dependencies.
+* Run the following command to generate llava.pte, tokenizer.bin, and an image tensor (serialized in TorchScript), image.pt:
+
+```
+python -m executorch.examples.models.llava.export_llava --pte-name llava.pte --with-artifacts
+```
+* You can find more information [here](https://github.com/pytorch/executorch/tree/main/examples/models/llava).
+
 ## Pushing Model and Tokenizer
 
 ### Copy the model to Simulator
