content/learning-paths/smartphones-and-mobile/Build-Llama3-Chat-Android-App-Using-Executorch-And-XNNPACK/1-dev-env-setup.md (19 additions, 17 deletions)
@@ -8,23 +8,23 @@ layout: learningpathall
## Set up your development environment
-In this Learning Path, you will learn how to build and deploy a simple LLM-based chat app to an Android device using ExecuTorch and XNNPACK. You will learn how to build the ExecuTorch runtime for Llama models, build JNI libraries for the Android application, and use the libraries in the application.
+In this Learning Path, you will learn how to build and deploy a simple LLM-based chat app to an Android device using ExecuTorch and XNNPACK with [KleidiAI](https://gitlab.arm.com/kleidi/kleidiai). Arm has worked with the Meta team to integrate KleidiAI into ExecuTorch through XNNPACK. These improvements increase the throughput of quantized LLMs running on Arm chips that contain the i8mm (8-bit integer matrix multiply) processor feature. You will learn how to build the ExecuTorch runtime for Llama models with KleidiAI, build JNI libraries for the Android application, and use the libraries in the application.
The first step is to prepare a development environment with the required software:
- Android Studio (latest version recommended).
-- Android NDK version 25.0.8775105.
+- Android NDK version 28.0.12433566.
- Java 17 JDK.
- Git.
-- Python 3.10.
+- Python 3.10 or later (these instructions have been tested with 3.10 and 3.12).
The instructions assume macOS with Apple Silicon, or an x86 Debian or Ubuntu Linux machine, with at least 16GB of RAM.
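As a quick sanity check of the host machine before starting, the commands below (an illustrative sketch, not part of the original instructions) confirm that Python and Git are on the PATH:

```bash
# Illustrative pre-flight check: print the installed Python and Git versions.
# These commands make no changes to the system.
python3 --version
git --version
```

If either command is not found, install the missing tool before continuing.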
## Install Android Studio and Android NDK
Follow these steps to install and configure Android Studio:
-1. Download and install the latest version of [Android Studio](https://developer.android.com/studio/).
+1. Download and install the latest version of [Android Studio](https://developer.android.com/studio/).
2. Start Android Studio and open the `Settings` dialog.
-Install the NDK in the directory that Android Studio installed the SDK. This is generally `~/Library/Android/sdk` by default:
+Install the NDK in the directory where Android Studio installed the SDK. This is generally `~/Library/Android/sdk` by default. Set the required environment variables:
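As a sketch of what those environment variables might look like (the SDK path below assumes the default macOS location, and the variable name `ANDROID_NDK` is illustrative; adjust both for your install):

```bash
# Assumed default SDK location on macOS; on Linux this is often ~/Android/Sdk.
export ANDROID_HOME="$HOME/Library/Android/sdk"
# Point at the NDK version installed through Android Studio.
export ANDROID_NDK="$ANDROID_HOME/ndk/28.0.12433566"
echo "Using NDK at: $ANDROID_NDK"
```

Add these lines to your shell profile so they persist across terminal sessions.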
content/learning-paths/smartphones-and-mobile/Build-Llama3-Chat-Android-App-Using-Executorch-And-XNNPACK/2-executorch-setup.md (4 additions, 11 deletions)
@@ -34,23 +34,16 @@ conda activate executorch
### Clone ExecuTorch and install the required dependencies
-From within the conda environment, run the commands below to download the ExecuTorch repository and install the required packages:
+From within the conda environment, run the commands below to download the ExecuTorch repository and install the required packages:
content/learning-paths/smartphones-and-mobile/Build-Llama3-Chat-Android-App-Using-Executorch-And-XNNPACK/3-Understanding-LLaMA-models.md (4 additions, 4 deletions)
@@ -30,11 +30,11 @@ As Llama 2 and Llama 3 models require at least 4-bit quantization due to the con
## Quantization
-One way to create models that fit in smartphone memory is to employ 4-bit groupwise per token dynamic quantization of all the linear layers of the model. *Dynamic quantization* refers to quantizing activations dynamically, such that quantization parameters for activations are calculated, from the min/max range, at runtime. Furthermore, weights are statically quantized. In this case, weights are per-channel groupwise quantized with 4-bit signed integers.
+One way to create models that fit in smartphone memory is to employ 4-bit groupwise per token dynamic quantization of all the linear layers of the model. *Dynamic quantization* refers to quantizing activations dynamically, such that quantization parameters for activations are calculated, from the min/max range, at runtime. Furthermore, weights are statically quantized. In this case, weights are per-channel groupwise quantized with 4-bit signed integers.
For further information, refer to [torchao: PyTorch Architecture Optimization](https://github.com/pytorch-labs/ao/).
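To make the weight side of this concrete, here is a toy illustration (not the actual torchao kernel) of symmetric 4-bit quantization for one small group of weights: the scale is derived from the group's maximum absolute value, and each weight is rounded into the signed int4 range [-8, 7]:

```bash
# Toy example of groupwise symmetric int4 quantization of weights.
# scale = max(|w|) / 7; quantized value = round(w / scale), clamped to [-8, 7].
echo "0.5 -0.25 0.125 -1.0" | awk '{
  qmax = 7
  for (i = 1; i <= NF; i++) { a = ($i < 0 ? -$i : $i); if (a > m) m = a }
  scale = m / qmax
  printf "scale=%.6f quants:", scale
  for (i = 1; i <= NF; i++) {
    q = int($i / scale + ($i < 0 ? -0.5 : 0.5))   # round to nearest
    if (q < -8) q = -8; if (q > 7) q = 7          # clamp to int4 range
    printf " %d", q
  }
  print ""
}'
# prints: scale=0.142857 quants: 4 -2 1 -7
```

In the real scheme, each group of (for example) 128 weights in a channel gets its own scale, which is why smaller group sizes improve accuracy at the cost of model size.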
-The table below evaluates WikiText perplexity using [LM Eval](https://github.com/EleutherAI/lm-evaluation-harness).
+The table below evaluates WikiText perplexity using [LM Eval](https://github.com/EleutherAI/lm-evaluation-harness).
The results are for two different groupsizes, with max_seq_len 2048, and 1000 samples:
@@ -43,9 +43,9 @@ The results are for two different groupsizes, with max_seq_len 2048, and 1000 sa
|Llama 2 7B | 9.2 | 10.2 | 10.7
|Llama 3 8B | 7.9 | 9.4 | 9.7
-Note that groupsize less than 128 was not enabled, since such a model was still too large. This is because current efforts have focused on enabling FP32, and support for FP16 is under way.
+Note that groupsize less than 128 was not enabled, since such a model was still too large. This is because current efforts have focused on enabling FP32, and support for FP16 is under way.
content/learning-paths/smartphones-and-mobile/Build-Llama3-Chat-Android-App-Using-Executorch-And-XNNPACK/4-Prepare-LLaMA-models.md (25 additions, 98 deletions)
@@ -6,121 +6,48 @@ weight: 5
layout: learningpathall
---
-## Download and export the Llama 3 8B model
+## Download and export the Llama 3.2 1B model
-To get started with Llama 3, you obtain the pre-trained parameters by visiting [Meta's Llama Downloads](https://llama.meta.com/llama-downloads/) page. Request the access by filling out your details and read through and accept the Responsible Use Guide. This grants you a license and a download link which is valid for 24 hours. The Llama 3 8B model is used for this part, but the same instructions apply for other options as well with minimal modification.
+To get started with Llama 3, obtain the pre-trained parameters by visiting [Meta's Llama Downloads](https://llama.meta.com/llama-downloads/) page. Request access by filling out your details, and read through and accept the Responsible Use Guide. This grants you a license and a download link that is valid for 24 hours. The Llama 3.2 1B model is used for this part, but the same instructions apply to other options with minimal modification.
-Install the following requirements using a package manager of your choice, for example apt-get:
+Install the `llama-stack` package from `pip`:
```bash
-apt-get install md5sum wget
+pip install llama-stack
```
-
-Clone the Llama models Git repository and install the dependencies:
1. If you encounter the error "Sorry, we could not process your request at this moment", it might mean you have initiated two license processes simultaneously. Try modifying the affiliation field to work around it.
-2. You may have to run the `download.sh` script as root, or modify the execution privileges with `chmod`.
+{{% notice Working Directory %}}
+The rest of the instructions should be executed from the ExecuTorch base directory.
{{% /notice %}}
-Export model and generate `.pte` file. Run the Python command to export the model:
+Export the model and generate a `.pte` file. Run the Python command to export the model to your current directory.