Commit bb86667 (parent 12dd6e1)
pareenaverma authored and committed
Updates to Executorch 0.7 Kleidi LP

3 files changed: +22 additions, -15 deletions
content/learning-paths/mobile-graphics-and-gaming/build-llama3-chat-android-app-using-executorch-and-xnnpack/2-executorch-setup.md

Lines changed: 5 additions & 5 deletions

@@ -15,8 +15,8 @@ The best practice is to generate an isolated Python environment in which to inst
 ### Option 1: Create a Python virtual environment
 
 ```bash
-python3.10 -m venv executorch
-source executorch/bin/activate
+python3.10 -m venv executorch-venv
+source executorch-venv/bin/activate
 ```
 
 The prompt of your terminal has `executorch` as a prefix to indicate the virtual environment is active.

@@ -28,8 +28,8 @@ Install Miniconda on your development machine by following the [Installing conda
 Once `conda` is installed, create the environment:
 
 ```bash
-conda create -yn executorch python=3.10.0
-conda activate executorch
+conda create -yn executorch-venv python=3.10.0
+conda activate executorch-venv
 ```
 
 ### Clone ExecuTorch and install the required dependencies

@@ -40,7 +40,7 @@ From within the conda environment, run the commands below to download the ExecuT
 git clone https://github.com/pytorch/executorch.git
 cd executorch
 git submodule sync
-git submodule update --init
+git submodule update --init --recursive
 ./install_executorch.sh
 ./examples/models/llama/install_requirements.sh
 ```
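For reference, the renamed environment from the first hunk can be exercised end to end. A minimal sketch, using plain `python3` as a stand-in for the `python3.10` the learning path pins:

```shell
# Create and activate the renamed virtual environment
# (the learning path pins python3.10; plain python3 is a stand-in here).
python3 -m venv executorch-venv
source executorch-venv/bin/activate

# Confirm the active interpreter now resolves inside the new environment.
python -c 'import sys; print(sys.prefix.endswith("executorch-venv"))'
# prints: True
```

After activation, the shell prompt carries `(executorch-venv)` as a prefix, which is how the page tells you the environment is active.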

content/learning-paths/mobile-graphics-and-gaming/build-llama3-chat-android-app-using-executorch-and-xnnpack/4-prepare-llama-models.md

Lines changed: 2 additions & 1 deletion

@@ -46,7 +46,8 @@ python3 -m examples.models.llama.export_llama \
 --metadata '{"get_bos_id":128000, "get_eos_ids":[128009, 128001, 128006, 128007]}' \
 --embedding-quantize 4,32 \
 --output_name="llama3_1B_kv_sdpa_xnn_qe_4_64_1024_embedding_4bit.pte" \
---max_seq_length 1024
+--max_seq_length 1024 \
+--max_context_length 1024
 ```
 
 Due to the larger vocabulary size of Llama 3, you should quantize the embeddings with `--embedding-quantize 4,32` to further reduce the model size.

content/learning-paths/mobile-graphics-and-gaming/build-llama3-chat-android-app-using-executorch-and-xnnpack/5-run-benchmark-on-android.md

Lines changed: 15 additions & 9 deletions
@@ -38,18 +38,23 @@ cmake -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
 -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
 -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
 -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON \
+-DEXECUTORCH_BUILD_EXTENSION_FLAT_TENSOR=ON \
 -DEXECUTORCH_BUILD_XNNPACK=ON \
 -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
 -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
 -DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \
+-DEXECUTORCH_BUILD_KERNELS_LLM=ON \
+-DEXECUTORCH_BUILD_EXTENSION_LLM_RUNNER=ON \
+-DEXECUTORCH_BUILD_EXTENSION_RUNNER_UTIL=ON \
 -DEXECUTORCH_XNNPACK_ENABLE_KLEIDI=ON \
 -DXNNPACK_ENABLE_ARM_BF16=OFF \
+-DBUILD_TESTING=OFF \
 -Bcmake-out-android .
 
 cmake --build cmake-out-android -j7 --target install --config Release
 ```
 {{% notice Note %}}
-Make sure you add -DEXECUTORCH_XNNPACK_ENABLE_KLEIDI=ON option to enable support for KleidiAI kernels in ExecuTorch with XNNPack.
+Starting with ExecuTorch version 0.7 beta, KleidiAI is enabled by default: the -DEXECUTORCH_XNNPACK_ENABLE_KLEIDI=ON option is set by default, so KleidiAI kernels are used in ExecuTorch with XNNPACK without further configuration.
 {{% /notice %}}
 
 ### 3. Build Llama runner for Android
@@ -67,7 +72,8 @@ cmake -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
 -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
 -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
 -DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \
--DEXECUTORCH_USE_TIKTOKEN=ON \
+-DSUPPORT_REGEX_LOOKAHEAD=ON \
+-DBUILD_TESTING=OFF \
 -Bcmake-out-android/examples/models/llama \
 examples/models/llama
 
@@ -144,13 +150,13 @@ Reached to the end of generation
 
 I 00:00:05.399314 executorch:runner.cpp:257] RSS after finishing text generation: 1269.445312 MiB (0 if unsupported)
 PyTorchObserver {"prompt_tokens":54,"generated_tokens":51,"model_load_start_ms":1710296339487,"model_load_end_ms":1710296343047,"inference_start_ms":1710296343370,"inference_end_ms":1710296344877,"prompt_eval_end_ms":1710296343556,"first_token_ms":1710296343556,"aggregate_sampling_time_ms":49,"SCALING_FACTOR_UNITS_PER_SECOND":1000}
-I 00:00:05.399342 executorch:stats.h:111] Prompt Tokens: 54 Generated Tokens: 51
-I 00:00:05.399344 executorch:stats.h:117] Model Load Time: 3.560000 (seconds)
-I 00:00:05.399346 executorch:stats.h:127] Total inference time: 1.507000 (seconds) Rate: 33.842070 (tokens/second)
-I 00:00:05.399348 executorch:stats.h:135] Prompt evaluation: 0.186000 (seconds) Rate: 290.322581 (tokens/second)
-I 00:00:05.399350 executorch:stats.h:146] Generated 51 tokens: 1.321000 (seconds) Rate: 38.607116 (tokens/second)
-I 00:00:05.399352 executorch:stats.h:154] Time to first generated token: 0.186000 (seconds)
-I 00:00:05.399354 executorch:stats.h:161] Sampling time over 105 tokens: 0.049000 (seconds)
+I 00:00:04.530945 executorch:stats.h:108] Prompt Tokens: 54 Generated Tokens: 69
+I 00:00:04.530947 executorch:stats.h:114] Model Load Time: 1.196000 (seconds)
+I 00:00:04.530949 executorch:stats.h:124] Total inference time: 1.934000 (seconds) Rate: 35.677353 (tokens/second)
+I 00:00:04.530952 executorch:stats.h:132] Prompt evaluation: 0.176000 (seconds) Rate: 306.818182 (tokens/second)
+I 00:00:04.530954 executorch:stats.h:143] Generated 69 tokens: 1.758000 (seconds) Rate: 39.249147 (tokens/second)
+I 00:00:04.530956 executorch:stats.h:151] Time to first generated token: 0.176000 (seconds)
+I 00:00:04.530959 executorch:stats.h:158] Sampling time over 123 tokens: 0.067000 (seconds)
 ```
 
 You have successfully run the Llama 3.1 1B Instruct model on your Android smartphone with ExecuTorch using KleidiAI kernels.
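The throughput figures in the updated log are plain ratios of token counts to elapsed seconds, so they are easy to sanity-check; for example, the decode rate follows from the 69 generated tokens and 1.758 seconds reported above:

```shell
# Recompute the generation rate reported by stats.h from the raw log values.
awk 'BEGIN { printf "%.6f tokens/second\n", 69 / 1.758 }'
# prints: 39.249147 tokens/second
```

The same division reproduces the prompt-evaluation rate (54 tokens / 0.176 s = 306.818182) and the total rate (69 / 1.934 = 35.677353).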
