diff --git a/examples/demo-apps/android/LlamaDemo/docs/delegates/qualcomm_README.md b/examples/demo-apps/android/LlamaDemo/docs/delegates/qualcomm_README.md
index 41073848cda..c01d878fd65 100644
--- a/examples/demo-apps/android/LlamaDemo/docs/delegates/qualcomm_README.md
+++ b/examples/demo-apps/android/LlamaDemo/docs/delegates/qualcomm_README.md
@@ -71,8 +71,8 @@ cmake --build cmake-out -j16 --target install --config Release
-### Setup Llama Runner
-Next we need to build and compile the Llama runner. This is similar to the requirements for running Llama with XNNPack.
+### Setup Llama Runner
+Next we need to build and compile the Llama runner. This is similar to the requirements for running Llama with XNNPACK.
 ```
 sh examples/models/llama2/install_requirements.sh
@@ -130,9 +130,9 @@ You may also wonder what the "--metadata" flag is doing. This flag helps export
 Convert tokenizer for Llama 2
 ```
-python -m extension.llm.tokenizer.tokenizer -t -o tokenizer.bin
+python -m extension.llm.tokenizer.tokenizer -t tokenizer.model -o tokenizer.bin
 ```
-Convert tokenizer for Llama 3 - Rename tokenizer.model to tokenizer.bin.
+Rename tokenizer for Llama 3 with command: `mv tokenizer.model tokenizer.bin`. We are updating the demo app to support tokenizer in original format directly.
 ### Export with Spinquant (Llama 3 8B only)
diff --git a/examples/demo-apps/android/LlamaDemo/docs/delegates/xnnpack_README.md b/examples/demo-apps/android/LlamaDemo/docs/delegates/xnnpack_README.md
index 902f9724841..536e202d334 100644
--- a/examples/demo-apps/android/LlamaDemo/docs/delegates/xnnpack_README.md
+++ b/examples/demo-apps/android/LlamaDemo/docs/delegates/xnnpack_README.md
@@ -1,10 +1,11 @@
-# Building ExecuTorch Android Demo App for Llama running XNNPack
+# Building ExecuTorch Android Demo App for Llama/Llava running XNNPACK
-**[UPDATE - 09/25]** We have added support for running [Llama 3.2 models](#for-llama-32-1b-and-3b-models) on the XNNPack backend. We currently support inference on their original data type (BFloat16). We have also added instructions to run [Llama Guard 1B models](#for-llama-guard-1b-models) on-device.
+**[UPDATE - 09/25]** We have added support for running [Llama 3.2 models](#for-llama-32-1b-and-3b-models) on the XNNPACK backend. We currently support inference on their original data type (BFloat16). We have also added instructions to run [Llama Guard 1B models](#for-llama-guard-1b-models) on-device.
+
+This tutorial covers the end to end workflow for building an android demo app using CPU on device via XNNPACK framework.
-This tutorial covers the end to end workflow for building an android demo app using CPU on device via XNNPack framework.
 More specifically, it covers:
-1. Export and quantization of Llama and Llava models against the XNNPack backend.
+1. Export and quantization of Llama and Llava models against the XNNPACK backend.
 2. Building and linking libraries that are required to inference on-device for Android platform.
 3. Building the Android demo app itself.
@@ -59,10 +60,10 @@ Optional: Use the --pybind flag to install with pybindings.
 In this demo app, we support text-only inference with up-to-date Llama models and image reasoning inference with LLaVA 1.5.
 ### For Llama 3.2 1B and 3B models
-We have supported BFloat16 as a data type on the XNNPack backend for Llama 3.2 1B/3B models.
+We have supported BFloat16 as a data type on the XNNPACK backend for Llama 3.2 1B/3B models.
 * You can request and download model weights for Llama through Meta official [website](https://llama.meta.com/).
 * For chat use-cases, download the instruct models instead of pretrained.
-* Run “examples/models/llama2/install_requirements.sh” to install dependencies.
+* Run `examples/models/llama2/install_requirements.sh` to install dependencies.
 * The 1B model in BFloat16 format can run on mobile devices with 8GB RAM. The 3B model will require 12GB+ RAM.
 * Export Llama model and generate .pte file as below:
@@ -70,7 +71,7 @@ We have supported BFloat16 as a data type on the XNNPack backend for Llama 3.2 1
 python -m examples.models.llama2.export_llama --checkpoint --params -kv -X -d bf16 --metadata '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}' --output_name="llama3_2.pte"
 ```
-* Convert tokenizer for Llama 3.2 - Rename 'tokenizer.model' to 'tokenizer.bin'.
+* Rename tokenizer for Llama 3.2 with command: `mv tokenizer.model tokenizer.bin`. We are updating the demo app to support tokenizer in original format directly.
 For more detail using Llama 3.2 lightweight models including prompt template, please go to our official [website](https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_2#-llama-3.2-lightweight-models-(1b/3b)-).
@@ -89,7 +90,7 @@ python -m examples.models.llama2.export_llama --checkpoint
-python -m extension.llm.tokenizer.tokenizer -t -o tokenizer.bin
+python -m extension.llm.tokenizer.tokenizer -t tokenizer.model -o tokenizer.bin
 ```
-* Convert tokenizer for Llama 3 - Rename `tokenizer.model` to `tokenizer.bin`.
+* Rename tokenizer for Llama 3.1 with command: `mv tokenizer.model tokenizer.bin`. We are updating the demo app to support tokenizer in original format directly.
 ### For LLaVA model
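
For quick reference, the tokenizer-preparation commands standardized in both READMEs above can be run as follows. This is a minimal sketch, assuming the `tokenizer.model` file from the Meta download is in the current working directory:

```
# Llama 2: convert the SentencePiece tokenizer into the .bin format the demo app loads
python -m extension.llm.tokenizer.tokenizer -t tokenizer.model -o tokenizer.bin

# Llama 3 / 3.1 / 3.2: no conversion step yet, simply rename the file
mv tokenizer.model tokenizer.bin
```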