Commit 2d0382e

Authored by pytorchbot, Chester Hu, and cmodi-meta

Document update (#5707)

Document update (#5692)

Summary: Pull Request resolved: #5692
1. All caps for xnnpack.
2. Provide command to rename tokenizer file.
3. Other format fixes.

Reviewed By: kirklandsign
Differential Revision: D63477936
fbshipit-source-id: 9dd63d132f0b811fa9bb6ca7b616aa56fb503830
(cherry picked from commit ff6607e)
Co-authored-by: Chester Hu <[email protected]>
Co-authored-by: Chirag Modi <[email protected]>

1 parent e18bf6f · commit 2d0382e

File tree

2 files changed: +15 −14 lines changed


examples/demo-apps/android/LlamaDemo/docs/delegates/qualcomm_README.md (4 additions, 4 deletions)

````diff
@@ -71,8 +71,8 @@ cmake --build cmake-out -j16 --target install --config Release
 
 
 
-### Setup Llama Runner
-Next we need to build and compile the Llama runner. This is similar to the requirements for running Llama with XNNPack.
+### Setup Llama Runner
+Next we need to build and compile the Llama runner. This is similar to the requirements for running Llama with XNNPACK.
 ```
 sh examples/models/llama2/install_requirements.sh
 
@@ -130,9 +130,9 @@ You may also wonder what the "--metadata" flag is doing. This flag helps export
 
 Convert tokenizer for Llama 2
 ```
-python -m extension.llm.tokenizer.tokenizer -t <tokenizer.model> -o tokenizer.bin
+python -m extension.llm.tokenizer.tokenizer -t tokenizer.model -o tokenizer.bin
 ```
-Convert tokenizer for Llama 3 - Rename tokenizer.model to tokenizer.bin.
+Rename tokenizer for Llama 3 with command: `mv tokenizer.model tokenizer.bin`. We are updating the demo app to support tokenizer in original format directly.
 
 
 ### Export with Spinquant (Llama 3 8B only)
````
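The two tokenizer paths changed in the hunk above differ only by model family: Llama 2 ships a SentencePiece `tokenizer.model` that must be converted, while Llama 3 tokenizers just need the `.bin` extension. A minimal shell sketch of both paths, assuming an ExecuTorch checkout on `PYTHONPATH` for the Llama 2 branch (`prepare_tokenizer` is a hypothetical helper name, not part of the repo):

```shell
# Hypothetical helper illustrating the two tokenizer paths from the docs.
prepare_tokenizer() {
  model_path="$1"   # path to tokenizer.model
  family="$2"       # "llama2" or "llama3"
  if [ "$family" = "llama2" ]; then
    # Llama 2: convert the SentencePiece model to the .bin format
    # (requires an ExecuTorch checkout on PYTHONPATH):
    python -m extension.llm.tokenizer.tokenizer -t "$model_path" -o tokenizer.bin
  else
    # Llama 3: the same file works once it carries the .bin extension.
    mv "$model_path" tokenizer.bin
  fi
}
```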

examples/demo-apps/android/LlamaDemo/docs/delegates/xnnpack_README.md (11 additions, 10 deletions)

````diff
@@ -1,10 +1,11 @@
-# Building ExecuTorch Android Demo App for Llama running XNNPack
+# Building ExecuTorch Android Demo App for Llama/Llava running XNNPACK
 
-**[UPDATE - 09/25]** We have added support for running [Llama 3.2 models](#for-llama-32-1b-and-3b-models) on the XNNPack backend. We currently support inference on their original data type (BFloat16). We have also added instructions to run [Llama Guard 1B models](#for-llama-guard-1b-models) on-device.
+**[UPDATE - 09/25]** We have added support for running [Llama 3.2 models](#for-llama-32-1b-and-3b-models) on the XNNPACK backend. We currently support inference on their original data type (BFloat16). We have also added instructions to run [Llama Guard 1B models](#for-llama-guard-1b-models) on-device.
+
+This tutorial covers the end to end workflow for building an android demo app using CPU on device via XNNPACK framework.
 
-This tutorial covers the end to end workflow for building an android demo app using CPU on device via XNNPack framework.
 More specifically, it covers:
-1. Export and quantization of Llama and Llava models against the XNNPack backend.
+1. Export and quantization of Llama and Llava models against the XNNPACK backend.
 2. Building and linking libraries that are required to inference on-device for Android platform.
 3. Building the Android demo app itself.
 
@@ -59,18 +60,18 @@ Optional: Use the --pybind flag to install with pybindings.
 In this demo app, we support text-only inference with up-to-date Llama models and image reasoning inference with LLaVA 1.5.
 
 ### For Llama 3.2 1B and 3B models
-We have supported BFloat16 as a data type on the XNNPack backend for Llama 3.2 1B/3B models.
+We have supported BFloat16 as a data type on the XNNPACK backend for Llama 3.2 1B/3B models.
 * You can request and download model weights for Llama through Meta official [website](https://llama.meta.com/).
 * For chat use-cases, download the instruct models instead of pretrained.
-* Run examples/models/llama2/install_requirements.sh to install dependencies.
+* Run `examples/models/llama2/install_requirements.sh` to install dependencies.
 * The 1B model in BFloat16 format can run on mobile devices with 8GB RAM. The 3B model will require 12GB+ RAM.
 * Export Llama model and generate .pte file as below:
 
 ```
 python -m examples.models.llama2.export_llama --checkpoint <checkpoint.pth> --params <params.json> -kv -X -d bf16 --metadata '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}' --output_name="llama3_2.pte"
 ```
 
-* Convert tokenizer for Llama 3.2 - Rename 'tokenizer.model' to 'tokenizer.bin'.
+* Rename tokenizer for Llama 3.2 with command: `mv tokenizer.model tokenizer.bin`. We are updating the demo app to support tokenizer in original format directly.
 
 For more detail using Llama 3.2 lightweight models including prompt template, please go to our official [website](https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_2#-llama-3.2-lightweight-models-(1b/3b)-).
 
@@ -89,7 +90,7 @@ python -m examples.models.llama2.export_llama --checkpoint <pruned llama guard 1
 
 
 ### For Llama 3.1 and Llama 2 models
-* You can download original model weights for Llama through Meta official [website](https://llama.meta.com/), or via Huggingface ([Llama 3.1 8B Instruction](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct))
+* You can download original model weights for Llama through Meta official [website](https://llama.meta.com/).
 * For Llama 2 models, Edit params.json file. Replace "vocab_size": -1 with "vocab_size": 32000. This is a short-term workaround
 * Run `examples/models/llama2/install_requirements.sh` to install dependencies.
 * The Llama 3.1 and Llama 2 models (8B and 7B) can run on devices with 12GB+ RAM.
@@ -103,9 +104,9 @@ You may wonder what the ‘--metadata’ flag is doing. This flag helps export t
 
 * Convert tokenizer for Llama 2
 ```
-python -m extension.llm.tokenizer.tokenizer -t <tokenizer.model> -o tokenizer.bin
+python -m extension.llm.tokenizer.tokenizer -t tokenizer.model -o tokenizer.bin
 ```
-* Convert tokenizer for Llama 3 - Rename `tokenizer.model` to `tokenizer.bin`.
+* Rename tokenizer for Llama 3.1 with command: `mv tokenizer.model tokenizer.bin`. We are updating the demo app to support tokenizer in original format directly.
 
 
 ### For LLaVA model
````
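The Llama 2 `params.json` workaround mentioned above (replace `"vocab_size": -1` with `"vocab_size": 32000`) can be scripted instead of edited by hand. A minimal sketch, assuming the file uses the exact `"vocab_size": -1` formatting quoted in the docs (`patch_vocab_size` is a hypothetical helper name):

```shell
# Replace the -1 vocab_size placeholder in a Llama 2 params.json with 32000,
# keeping a .bak copy of the original. Assumes the literal '"vocab_size": -1'
# formatting quoted in the docs; hand-edit the file if yours differs.
patch_vocab_size() {
  params_file="$1"
  sed -i.bak 's/"vocab_size": -1/"vocab_size": 32000/' "$params_file"
}
```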
