examples/demo-apps/apple_ios/LLaMA/docs/delegates/xnnpack_README.md (49 changes: 30 additions & 19 deletions)
@@ -1,17 +1,15 @@
-# Building Llama iOS Demo for XNNPack Backend
+# Building Llama iOS Demo for XNNPACK Backend

-**[UPDATE - 09/25]** We have added support for running [Llama 3.2 models](#for-llama-32-1b-and-3b-models) on the XNNPack backend. We currently support inference on their original data type (BFloat16).
+**[UPDATE - 09/25]** We have added support for running [Llama 3.2 models](#for-llama-32-1b-and-3b-models) on the XNNPACK backend. We currently support inference on their original data type (BFloat16).

-This tutorial covers the end to end workflow for building an iOS demo app using XNNPack backend on device.
-More specifically, it covers:
-1. Export and quantization of Llama models against the XNNPack backend.
-2. Building and linking libraries that are required to inference on-device for iOS platform using XNNPack.
+This tutorial covers the end-to-end workflow for building an iOS demo app using the XNNPACK backend on device. More specifically, it covers:
+1. Export and quantization of Llama models against the XNNPACK backend.
+2. Building and linking the libraries required to run inference on-device for the iOS platform using XNNPACK.
3. Building the iOS demo app itself.

## Prerequisites
* [Xcode 15](https://developer.apple.com/xcode)
* [iOS 17 SDK](https://developer.apple.com/ios)
-* Set up your ExecuTorch repo and environment if you haven’t done so by following the [Setting up ExecuTorch](https://pytorch.org/executorch/stable/getting-started-setup) to set up the repo and dev environment:

## Setup ExecuTorch
In this section, we will set up the ExecuTorch repo using Conda for environment management. Make sure you have Conda available on your system (or follow the instructions to install it [here](https://conda.io/projects/conda/en/latest/user-guide/install/index.html)). The commands below were run on Linux (CentOS).
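A typical setup follows this rough sketch (the Python version and the root-level install script name are assumptions; see the Setting up ExecuTorch guide for the authoritative steps):

```
# Create and activate a Conda environment for ExecuTorch
conda create -n executorch python=3.10 -y
conda activate executorch

# Clone the ExecuTorch repo together with its submodules
git clone --recursive https://github.com/pytorch/executorch.git
cd executorch

# Install the ExecuTorch Python package and core dependencies
# (script name assumed from the standard setup instructions)
./install_requirements.sh
```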
@@ -48,7 +46,7 @@ sh examples/models/llama2/install_requirements.sh
```

### For Llama 3.2 1B and 3B models
-We have supported BFloat16 as a data type on the XNNPack backend for Llama 3.2 1B/3B models.
+We support BFloat16 as a data type on the XNNPACK backend for Llama 3.2 1B/3B models.
* You can download the original model weights for Llama from Meta's official [website](https://llama.meta.com/).
* For chat use-cases, download the instruct models instead of pretrained.
* Run “examples/models/llama2/install_requirements.sh” to install dependencies.
Expand All @@ -59,8 +57,6 @@ We have supported BFloat16 as a data type on the XNNPack backend for Llama 3.2 1
python -m examples.models.llama2.export_llama --checkpoint <checkpoint.pth> --params <params.json> -kv -X -d bf16 --metadata '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}' --output_name="llama3_2.pte"
```
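For example, with the Llama 3.2 1B Instruct download from Meta, the command could be filled in as follows (the local paths are placeholders; substitute wherever you saved `consolidated.00.pth` and `params.json`):

```
# Placeholder paths; adjust to your download location
python -m examples.models.llama2.export_llama \
  --checkpoint ~/llama/Llama-3.2-1B-Instruct/consolidated.00.pth \
  --params ~/llama/Llama-3.2-1B-Instruct/params.json \
  -kv -X -d bf16 \
  --metadata '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}' \
  --output_name="llama3_2.pte"
```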

-* Convert tokenizer for Llama 3.2 - Rename 'tokenizer.model' to 'tokenizer.bin'.

For more details on using the Llama 3.2 lightweight models, including the prompt template, please go to our official [website](https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_2#-llama-3.2-lightweight-models-(1b/3b)-).

### For Llama 3.1 and Llama 2 models
Expand All @@ -70,16 +66,16 @@ Export the model
python -m examples.models.llama2.export_llama --checkpoint <consolidated.00.pth> -p <params.json> -kv --use_sdpa_with_kv_cache -X -qmode 8da4w --group_size 128 -d fp32 --metadata '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}' --embedding-quantize 4,32 --output_name="llama3_kv_sdpa_xnn_qe_4_32.pte"
```
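Before moving to the app, you can optionally sanity-check the exported `.pte` with the example runner on your host machine. The binary location and flag names below are assumptions that depend on how you built the ExecuTorch examples, so treat this as a sketch:

```
# Sketch only: runner path and flags assumed from the ExecuTorch llama example build
cmake-out/examples/models/llama2/llama_main \
  --model_path=llama3_kv_sdpa_xnn_qe_4_32.pte \
  --tokenizer_path=<tokenizer file> \
  --prompt="Hello, how are you?"
```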

-## Pushing Model and Tokenizer
+### For LLaVA model
+* For the Llava 1.5 model, you can get it from Huggingface [here](https://huggingface.co/llava-hf/llava-1.5-7b-hf).
+* Run `examples/models/llava/install_requirements.sh` to install dependencies.
+* Run the following command to generate llava.pte, tokenizer.bin, and an image tensor image.pt (serialized in TorchScript).

-### Copy the model to Simulator
-* Drag&drop the model and tokenizer files onto the Simulator window and save them somewhere inside the iLLaMA folder.
-* Pick the files in the app dialog, type a prompt and click the arrow-up button.
+```
+python -m executorch.examples.models.llava.export_llava --pte-name llava.pte --with-artifacts
+```
+* You can find more information [here](https://github.com/pytorch/executorch/tree/main/examples/models/llava).

-### Copy the model to Device
-* Wire-connect the device and open the contents in Finder.
-* Navigate to the Files tab and drag & drop the model and tokenizer files onto the iLLaMA folder.
-* Wait until the files are copied.

## Configure the Xcode Project

@@ -124,12 +120,27 @@ Then select which ExecuTorch framework should link against which target.
<img src="https://raw.githubusercontent.com/pytorch/executorch/refs/heads/main/docs/source/_static/img/ios_demo_app_choosing_package.png" alt="iOS LLaMA App Choosing package" style="width:600px">
</p>

-Click “Run” to build the app and run in on your iPhone. If the app successfully run on your device, you should see something like below:
+Click “Run” to build the app and run it on your iPhone.
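If you prefer the command line, the same build can be driven with `xcodebuild`; the project path, scheme name, and destination below are assumptions, so check them against the demo app's Xcode project:

```
# Assumed project/scheme names; verify in Xcode before running
xcodebuild \
  -project examples/demo-apps/apple_ios/LLaMA/LLaMA.xcodeproj \
  -scheme LLaMA \
  -destination 'platform=iOS Simulator,name=iPhone 15' \
  build
```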

+## Pushing Model and Tokenizer
+
+### Copy the model to Simulator
+* Drag & drop the model and tokenizer files onto the Simulator window and save them somewhere inside the iLLaMA folder.
+* Pick the files in the app dialog, type a prompt and click the arrow-up button.
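As an alternative to drag-and-drop, you can copy the files into the Simulator app's sandbox with `simctl`; the bundle identifier below is a placeholder, so look up the real one in the Xcode target settings:

```
# Locate the app's data container on the booted Simulator
# (bundle identifier is hypothetical)
APP_DATA=$(xcrun simctl get_app_container booted org.example.iLLaMA data)

# Copy the exported model and tokenizer where the app's file picker can find them
cp llama3_2.pte tokenizer.model "$APP_DATA/Documents/"
```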

+### Copy the model to Device
+* Wire-connect the device and open the contents in Finder.
+* Navigate to the Files tab and drag & drop the model and tokenizer files onto the iLLaMA folder.
+* Wait until the files are copied.
+
+Open the iLLaMA app, click the settings button at the top left of the app to select the model and tokenizer files. When the app successfully runs on your device, you should see something like below:

<p align="center">
<img src="https://raw.githubusercontent.com/pytorch/executorch/refs/heads/main/docs/source/_static/img/ios_demo_app.jpg" alt="iOS LLaMA App" style="width:300px">
</p>



For Llava 1.5 models, you can select an image (via the image/camera selector button) before typing your prompt and tapping the send button.

<p align="center">