
Commit 6a52f58

Riandy authored and pytorchbot committed
Readme docs update (#5695)
Summary: Pull Request resolved: #5695

- rename XNNPACK
- remove duplicate executorch setup instructions
- remove tokenizer conversion step (since iOS supports both .bin and .model)
- move model copying section to after xcode setup

Reviewed By: cmodi-meta, kirklandsign

Differential Revision: D63479985

fbshipit-source-id: bd1030588fa997f26c0c8da59f5850b06141aa43
(cherry picked from commit 7e9eaa8)
1 parent 8d16c52 commit 6a52f58


examples/demo-apps/apple_ios/LLaMA/docs/delegates/xnnpack_README.md

Lines changed: 46 additions & 14 deletions
@@ -1,15 +1,16 @@
-# Building Llama iOS Demo for XNNPack Backend
+# Building Llama iOS Demo for XNNPACK Backend

-This tutorial covers the end to end workflow for building an iOS demo app using XNNPack backend on device.
+**[UPDATE - 09/25]** We have added support for running [Llama 3.2 models](#for-llama-32-1b-and-3b-models) on the XNNPACK backend. We currently support inference on their original data type (BFloat16).
+
+This tutorial covers the end-to-end workflow for building an iOS demo app using the XNNPACK backend on device.
More specifically, it covers:
-1. Export and quantization of Llama models against the XNNPack backend.
-2. Building and linking libraries that are required to inference on-device for iOS platform using XNNPack.
+1. Export and quantization of Llama models against the XNNPACK backend.
+2. Building and linking the libraries required to run inference on-device for iOS using XNNPACK.
3. Building the iOS demo app itself.

## Prerequisites
* [Xcode 15](https://developer.apple.com/xcode)
* [iOS 17 SDK](https://developer.apple.com/ios)
-* Set up your ExecuTorch repo and environment if you haven’t done so by following the [Setting up ExecuTorch](https://pytorch.org/executorch/stable/getting-started-setup) to set up the repo and dev environment:

## Setup ExecuTorch
In this section, we will set up the ExecuTorch repo with Conda for environment management. Make sure Conda is available on your system (or follow the instructions to install it [here](https://conda.io/projects/conda/en/latest/user-guide/install/index.html)). The commands below were run on Linux (CentOS).
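
If you have not set up the repo yet, the overall flow looks roughly like the sketch below. This is a minimal sketch only: the environment name `executorch` is arbitrary, and the install step follows the standard flow described in the [Setting up ExecuTorch](https://pytorch.org/executorch/stable/getting-started-setup) guide, so the exact script name and flags may differ between releases.

```
# Create and activate a Conda environment (the name is just an example)
conda create -yn executorch python=3.10
conda activate executorch

# Clone ExecuTorch together with its submodules
git clone --recurse-submodules https://github.com/pytorch/executorch.git
cd executorch

# Install the ExecuTorch Python components
# (script name and flags may vary between releases; see the setup guide for the exact steps)
./install_requirements.sh
```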
@@ -45,21 +46,37 @@ Install the required packages to export the model
sh examples/models/llama2/install_requirements.sh
```

+### For Llama 3.2 1B and 3B models
+We have added support for BFloat16 as a data type on the XNNPACK backend for the Llama 3.2 1B/3B models.
+* You can download the original Llama model weights from Meta's official [website](https://llama.meta.com/).
+* For chat use-cases, download the instruct models instead of the pretrained ones.
+* Run `examples/models/llama2/install_requirements.sh` to install dependencies.
+* The 1B model in BFloat16 format can run on mobile devices with 8GB RAM (iPhone 15 Pro and later). The 3B model requires 12GB+ RAM and will therefore not fit on 8GB-RAM phones.
+* Export the Llama model and generate a .pte file as below:
+
+```
+python -m examples.models.llama2.export_llama --checkpoint <checkpoint.pth> --params <params.json> -kv -X -d bf16 --metadata '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}' --output_name="llama3_2.pte"
+```
+
+For more details on using the Llama 3.2 lightweight models, including the prompt template, please go to the official [website](https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_2#-llama-3.2-lightweight-models-(1b/3b)-).
+
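As a rough illustration of the prompt template mentioned above, the Llama 3.x instruct format wraps each turn in special tokens along these lines (the system and user messages are placeholders; treat the linked model card as the authoritative reference):

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>

What is ExecuTorch?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

```

The IDs passed via --metadata in the export command correspond to these special tokens: 128000 for <|begin_of_text|>, 128009 for <|eot_id|>, and 128001 for <|end_of_text|>.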
+### For Llama 3.1 and Llama 2 models
+
Export the model
```
python -m examples.models.llama2.export_llama --checkpoint <consolidated.00.pth> -p <params.json> -kv --use_sdpa_with_kv_cache -X -qmode 8da4w --group_size 128 -d fp32 --metadata '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}' --embedding-quantize 4,32 --output_name="llama3_kv_sdpa_xnn_qe_4_32.pte"
```
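
Before copying a .pte file to the phone, you may want to smoke-test it on the host. One option is the desktop Llama runner from the ExecuTorch examples; the sketch below assumes that runner has already been built as described in the main Llama example README, and the binary path and flags may differ between releases.

```
# Host-side sanity check (assumed binary path and flags; build the example runner first)
cmake-out/examples/models/llama2/llama_main \
  --model_path=<model.pte> \
  --tokenizer_path=<tokenizer file> \
  --prompt="Once upon a time"
```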

-## Pushing Model and Tokenizer
+### For LLaVA model
+* For the LLaVA 1.5 model, you can get it from Hugging Face [here](https://huggingface.co/llava-hf/llava-1.5-7b-hf).
+* Run `examples/models/llava/install_requirements.sh` to install dependencies.
+* Run the following command to generate llava.pte, tokenizer.bin, and an image tensor (serialized in TorchScript) named image.pt.

-### Copy the model to Simulator
-* Drag&drop the model and tokenizer files onto the Simulator window and save them somewhere inside the iLLaMA folder.
-* Pick the files in the app dialog, type a prompt and click the arrow-up button.
+```
+python -m executorch.examples.models.llava.export_llava --pte-name llava.pte --with-artifacts
+```
+* You can find more information [here](https://github.com/pytorch/executorch/tree/main/examples/models/llava).

-### Copy the model to Device
-* Wire-connect the device and open the contents in Finder.
-* Navigate to the Files tab and drag & drop the model and tokenizer files onto the iLLaMA folder.
-* Wait until the files are copied.

## Configure the Xcode Project

@@ -104,12 +121,27 @@ Then select which ExecuTorch framework should link against which target.
<img src="https://raw.githubusercontent.com/pytorch/executorch/refs/heads/main/docs/source/_static/img/ios_demo_app_choosing_package.png" alt="iOS LLaMA App Choosing package" style="width:600px">
</p>

-Click “Run” to build the app and run in on your iPhone. If the app successfully run on your device, you should see something like below:
+Click “Run” to build the app and run it on your iPhone.
+
+## Pushing Model and Tokenizer
+
+### Copy the model to Simulator
+* Drag & drop the model and tokenizer files onto the Simulator window and save them somewhere inside the iLLaMA folder (a command-line alternative is sketched below).
+* Pick the files in the app dialog, type a prompt and click the arrow-up button.
+
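If you prefer the command line to drag & drop, one possible approach is to locate the app's data container with `simctl` and copy the files into it. This is only a sketch: the bundle identifier is a placeholder, and the destination folder depends on where the app's file picker looks, so check the Xcode project for the actual values.

```
# Look up the booted Simulator's data container for the app (bundle id is a placeholder)
APP_DATA=$(xcrun simctl get_app_container booted <bundle identifier> data)

# Copy the model and tokenizer into the app's sandbox
cp llama3_2.pte tokenizer.model "$APP_DATA/Documents/"
```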
+### Copy the model to Device
+* Wire-connect the device and open the contents in Finder.
+* Navigate to the Files tab and drag & drop the model and tokenizer files onto the iLLaMA folder.
+* Wait until the files are copied.
+
+Open the iLLaMA app and click the settings button at the top left to select the model and tokenizer files. When the app runs successfully on your device, you should see something like below:

<p align="center">
<img src="https://raw.githubusercontent.com/pytorch/executorch/refs/heads/main/docs/source/_static/img/ios_demo_app.jpg" alt="iOS LLaMA App" style="width:300px">
</p>

+
+
For LLaVA 1.5 models, you can select an image (via the image/camera selector button) before typing a prompt and tapping the send button.

<p align="center">

0 commit comments
