Summary:
Pull Request resolved: #5692
1. Use all caps for XNNPACK throughout.
2. Provide a command for renaming the tokenizer file.
3. Other formatting fixes.
Reviewed By: kirklandsign
Differential Revision: D63477936
fbshipit-source-id: 9dd63d132f0b811fa9bb6ca7b616aa56fb503830
(README line 135)
-Convert tokenizer for Llama 3 - Rename tokenizer.model to tokenizer.bin.
+Rename tokenizer for Llama 3 with command: `mv tokenizer.model tokenizer.bin`. We are updating the demo app to support tokenizer in original format directly.
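For reference, the rename the updated instructions describe is a single shell command, run from wherever the downloaded tokenizer lives (the working directory is an assumption here):

```bash
# Run in the directory containing the downloaded Llama 3 tokenizer.
# Only the file name changes; the demo app at this revision expects the
# .bin extension, and the file contents are left untouched.
mv tokenizer.model tokenizer.bin
```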
examples/demo-apps/android/LlamaDemo/docs/delegates/xnnpack_README.md
10 additions & 10 deletions
@@ -1,10 +1,10 @@
-# Building ExecuTorch Android Demo App for Llama/Llava running XNNPack
+# Building ExecuTorch Android Demo App for Llama/Llava running XNNPACK

-**[UPDATE - 09/25]** We have added support for running [Llama 3.2 models](#for-llama-32-1b-and-3b-models) on the XNNPack backend. We currently support inference on their original data type (BFloat16). We have also added instructions to run [Llama Guard 1B models](#for-llama-guard-1b-models) on-device.
+**[UPDATE - 09/25]** We have added support for running [Llama 3.2 models](#for-llama-32-1b-and-3b-models) on the XNNPACK backend. We currently support inference on their original data type (BFloat16). We have also added instructions to run [Llama Guard 1B models](#for-llama-guard-1b-models) on-device.

-This tutorial covers the end to end workflow for building an android demo app using CPU on device via XNNPack framework.
+This tutorial covers the end to end workflow for building an android demo app using CPU on device via XNNPACK framework.
More specifically, it covers:
-1. Export and quantization of Llama and Llava models against the XNNPack backend.
+1. Export and quantization of Llama and Llava models against the XNNPACK backend.
2. Building and linking libraries that are required to inference on-device for Android platform.
3. Building the Android demo app itself.
@@ -59,18 +59,18 @@ Optional: Use the --pybind flag to install with pybindings.
In this demo app, we support text-only inference with up-to-date Llama models and image reasoning inference with LLaVA 1.5.

### For Llama 3.2 1B and 3B models
-We have supported BFloat16 as a data type on the XNNPack backend for Llama 3.2 1B/3B models.
+We have supported BFloat16 as a data type on the XNNPACK backend for Llama 3.2 1B/3B models.
* You can request and download model weights for Llama through Meta official [website](https://llama.meta.com/).
* For chat use-cases, download the instruct models instead of pretrained.
-* Run “examples/models/llama2/install_requirements.sh” to install dependencies.
+* Run `examples/models/llama2/install_requirements.sh` to install dependencies.
* The 1B model in BFloat16 format can run on mobile devices with 8GB RAM. The 3B model will require 12GB+ RAM.
* Export Llama model and generate .pte file as below:
[lines 68–72, the export command block, unchanged; see the sketch after this hunk]
-*Convert tokenizer for Llama 3.2 - Rename 'tokenizer.model' to 'tokenizer.bin'.
+*Rename tokenizer for Llama 3.2 with command: `mv tokenizer.model tokenizer.bin`. We are updating the demo app to support tokenizer in original format directly.

For more detail using Llama 3.2 lightweight models including prompt template, please go to our official [website](https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_2#-llama-3.2-lightweight-models-(1b/3b)-).
[…]
-* You can download original model weights for Llama through Meta official [website](https://llama.meta.com/), or via Huggingface ([Llama 3.1 8B Instruction](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct))
+* You can download original model weights for Llama through Meta official [website](https://llama.meta.com/).
* For Llama 2 models, Edit params.json file. Replace "vocab_size": -1 with "vocab_size": 32000. This is a short-term workaround
* Run `examples/models/llama2/install_requirements.sh` to install dependencies.
* The Llama 3.1 and Llama 2 models (8B and 7B) can run on devices with 12GB+ RAM.
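The command block behind "Export Llama model and generate .pte file as below:" (README lines 68–72) did not survive extraction. As a hedged sketch only: a BFloat16 XNNPACK export for Llama 3.2 via ExecuTorch's export_llama entry point looked roughly like this around this revision; the checkpoint/params paths and output name are placeholders, and the exact flag set should be taken from the README itself.

```bash
# Sketch under assumptions -- the README's real command block is omitted
# from this diff. Flags: -kv enables the KV cache, -X delegates to the
# XNNPACK backend, -d bf16 keeps the original BFloat16 data type.
# Paths and the output name are placeholders.
python -m examples.models.llama2.export_llama \
    --checkpoint /path/to/consolidated.00.pth \
    --params /path/to/params.json \
    -kv -X -d bf16 \
    --output_name "llama3_2_bf16.pte"
```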
@@ -103,9 +103,9 @@ You may wonder what the ‘--metadata’ flag is doing. This flag helps export t
* Convert tokenizer for Llama 2 and Llava (skip this for Llama 3.x)
-*Convert tokenizer for Llama 3 - Rename `tokenizer.model` to `tokenizer.bin`.
+*Rename tokenizer for Llama 3.1 with command: `mv tokenizer.model tokenizer.bin`. We are updating the demo app to support tokenizer in original format directly.
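For the Llama 2 / Llava conversion step mentioned in the context line above, the converter is a small Python tool in the ExecuTorch tree; the module path below is an assumption for this revision and may differ in others.

```bash
# Assumed module path (extension.llm.tokenizer.tokenizer); treat it as a
# guess and check the README for the authoritative invocation. Llama 2 and
# Llava ship a SentencePiece tokenizer.model that the demo app consumes as
# a serialized tokenizer.bin; Llama 3.x skips this conversion and only
# needs the rename shown earlier.
python -m extension.llm.tokenizer.tokenizer \
    -t /path/to/tokenizer.model \
    -o tokenizer.bin
```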