Commit fd8c2b3

Merge pull request #1743 from jasonrandrews/review
Merge 2 fine tuning Learning Paths and keep them in draft until techn…
2 parents 45424a8 + 1c8e5ee commit fd8c2b3

File tree

4 files changed: +202 -2 lines changed


content/learning-paths/servers-and-cloud-computing/llm-fine-tuning-for-web-applications/_index.md

Lines changed: 6 additions & 2 deletions

@@ -1,13 +1,13 @@
 ---
-title: LLM fine-tuning for web applications
+title: LLM fine-tuning for web and mobile applications

 draft: true
 cascade:
     draft: true

 minutes_to_complete: 60

-who_is_this_for: This is an introductory topic for developers and data scientists new to fine-tuning large language models (LLMs) and looking to develop a fine-tuned LLM for web applications.
+who_is_this_for: This is an introductory topic for developers and data scientists new to fine-tuning large language models (LLMs) and looking to develop a fine-tuned LLM for web and mobile applications.

 learning_objectives:
 - Learn the basics of large language models (LLMs) and how fine-tuning enhances model performance for specific use cases.
@@ -16,9 +16,13 @@ learning_objectives:
 - Learn how to curate, clean, and preprocess domain-specific datasets for optimal fine-tuning.
 - Understand dataset formats, tokenization, and annotation techniques for improving model learning.
 - Implement fine-tuning with frameworks like Hugging Face Transformers and PyTorch.
+- Compile a Large Language Model (LLM) using ExecuTorch.
+- Learn how to deploy a fine-tuned model on a mobile device.
+- Describe techniques for running large language models in a mobile environment.

 prerequisites:
 - An AWS Graviton4 instance. You can substitute any Arm based Linux computer. Refer to [Get started with Arm-based cloud instances](/learning-paths/servers-and-cloud-computing/csp/) for more information about cloud service providers offering Arm-based instances.
+- An Android smartphone with the i8mm feature and 16GB of RAM.
 - Basic understanding of machine learning and deep learning.
 - Familiarity with deep learning frameworks such as PyTorch and Hugging Face Transformers.

Lines changed: 109 additions & 0 deletions

---
title: Mobile Platform for Fine-Tuning Large Language Models
weight: 9

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Development environment
You will learn how to build the ExecuTorch runtime, with KleidiAI, for fine-tuned models, create JNI libraries for a mobile application, and integrate those libraries into the application.

The first step is to set up a development environment with the necessary software:
- Python 3.10 or later
- Git
- Java 17 JDK
- Latest version of Android Studio
- Android NDK

###### Installation of Android Studio and Android NDK
- Download and install the latest version of Android Studio.
- Launch Android Studio and open the Settings dialog.
- Go to Languages & Frameworks > Android SDK.
- In the SDK Platforms tab, select Android 14.0 ("UpsideDownCake").
- Install the required version of the Android NDK by first setting up the Android command-line tools, as sketched below.
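
The NDK can be installed from Android Studio's SDK Manager or from the command line. As a rough sketch, assuming the Android command-line tools (`sdkmanager`) are installed and on your PATH, and using an NDK version chosen only as an example:

```bash
# Install an NDK release and verify that it is listed afterwards.
sdkmanager --install "ndk;26.1.10909125"
sdkmanager --list_installed | grep -i ndk
```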

###### Install Java 17 JDK
- Open the [Java SE 17 Archive Downloads](https://www.oracle.com/java/technologies/javase/jdk17-archive-downloads.html) page in your browser.
- Choose the appropriate version for your operating system.
- Downloads are available for macOS and Linux.

###### Install Git and cmake

For macOS, use [Homebrew](https://brew.sh/):

``` bash
brew install git cmake
```

For Linux, use the package manager for your distribution:

``` bash
sudo apt install git-all cmake
```

###### Install Python 3.10

For macOS:

``` bash
brew install python@3.10
```

For Linux:

``` bash
sudo apt update
sudo apt install software-properties-common -y
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt install python3.10 python3.10-venv
```

###### Set up the [ExecuTorch](https://pytorch.org/executorch/stable/intro-overview.html) environment
For execution on a mobile device, [ExecuTorch](https://pytorch.org/executorch/stable/intro-overview.html) is required. It enables efficient on-device model deployment and execution.

You can use either a Python virtual environment or a Conda environment.

- Python virtual environment creation

```bash
python3.10 -m venv executorch
source executorch/bin/activate
```

The prompt of your terminal now has `executorch` as a prefix to indicate the virtual environment is active.

- Conda virtual environment creation

Install Miniconda on your development machine by following the [Installing conda](https://conda.io/projects/conda/en/latest/user-guide/install/index.html) instructions.

Once `conda` is installed, create the environment:

```bash
conda create -yn executorch python=3.10.0
conda activate executorch
```

###### Clone ExecuTorch and install the required dependencies

From within the conda environment, complete the following steps to download the ExecuTorch repository and install the required packages:

- Download ExecuTorch from the [GitHub repository](https://github.com/pytorch/executorch/tree/main).
- Download the executorch.aar file from [executorch.aar](https://ossci-android.s3.us-west-1.amazonaws.com/executorch/release/executorch-241002/executorch.aar).
- Create a libs folder at examples/demo-apps/android/LlamaDemo/app/libs inside the repository and copy executorch.aar into it, as sketched below.
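
The three steps above can be scripted roughly as follows; the clone location is up to you, and the libs path follows the repository layout referenced above:

```bash
# Clone ExecuTorch, then place the prebuilt AAR where the LlamaDemo app expects it.
git clone https://github.com/pytorch/executorch.git
cd executorch
mkdir -p examples/demo-apps/android/LlamaDemo/app/libs
curl -L -o examples/demo-apps/android/LlamaDemo/app/libs/executorch.aar \
  https://ossci-android.s3.us-west-1.amazonaws.com/executorch/release/executorch-241002/executorch.aar
```

With the repository cloned and the AAR in place, run the dependency installation commands below from the repository root.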

``` bash
git submodule sync
git submodule update --init
./install_requirements.sh
./install_requirements.sh --pybind xnnpack
./examples/models/llama/install_requirements.sh
```

###### Mobile Device Setup
- Enable the mobile device in [Android Studio](https://support.google.com/android/community-guide/273205728/how-to-enable-developer-options-on-android-pixels-6-secret-android-tips?hl=en).
- On the Android phone, enable Developer Options:
  - Navigate to Settings > About Phone.
  - At the bottom, locate Build Number and tap it seven times. A message appears confirming that you are now a developer.
- Access Developer Options by navigating to Settings > System > Developer Options.
- A large number of options are listed here; change only the settings you understand.
- Enable USB Debugging to connect your mobile device to Android Studio.
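
With USB Debugging enabled and the phone connected, you can confirm that the device is visible from a terminal. This assumes the Android platform tools, which provide `adb`, are installed:

```bash
# The device serial should be listed with the state "device";
# "unauthorized" means the debugging prompt on the phone still needs to be accepted.
adb devices
```
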
Lines changed: 53 additions & 0 deletions

---
title: Fine-Tune and Quantize the Large Language Model
weight: 10

### FIXED, DO NOT MODIFY
layout: learningpathall
---

#### Llama Model
Llama is a family of large language models designed for high-performance language processing tasks and trained on publicly available data. When fine-tuned, Llama-based models can be optimized for specific applications, enhancing their ability to generate accurate and context-aware responses. Fine-tuning enables the model to adapt to domain-specific data, improving performance in tasks such as:

- Language translation – Enhancing fluency and contextual accuracy.
- Question answering – Providing precise and relevant responses.
- Text summarization – Extracting key insights while maintaining coherence.

Fine-tuned Llama models are also highly effective at generating human-like text, making them valuable for:

- Chatbots – Enabling intelligent and context-aware interactions.
- Virtual assistants – Enhancing responsiveness and personalization.
- Creative writing – Generating compelling and structured narratives.

By fine-tuning Llama-based models, their adaptability and relevance can be significantly improved, allowing seamless integration into specialized AI applications. Note that the models are subject to the [acceptable use policy](https://github.com/facebookresearch/llama/blob/main/USE_POLICY.md) and the [responsible use guide](https://ai.meta.com/static-resource/responsible-use-guide/).

#### Results

Since Llama 2 and Llama 3 models require at least 4-bit quantization to accommodate the memory constraints of certain smartphones, the results below are reported for 4-bit quantized models alongside the FP32 baseline.

#### Quantization

To optimize models for smartphone memory constraints, 4-bit groupwise per-token dynamic quantization can be applied to all linear layers. In this approach:

- Dynamic quantization is used for activations, where quantization parameters are computed at runtime based on the min/max range.
- Static quantization is applied to weights, which are per-channel groupwise quantized using 4-bit signed integers.

This method ensures efficient memory usage while maintaining model performance on resource-constrained devices.

For further information, refer to [torchao: PyTorch Architecture Optimization](https://github.com/pytorch-labs/ao/).
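
As an illustration of the weight side of this scheme, the sketch below quantizes a weight matrix groupwise to 4-bit signed integers with FP32 scales. It is only a conceptual example written for this explanation (the `quantize_weight_groupwise_4bit` helper is hypothetical); the quantization used in this Learning Path is applied via torchao as part of the export step covered later:

```python
import torch

def quantize_weight_groupwise_4bit(w: torch.Tensor, group_size: int = 128):
    """Quantize each row of w in groups of `group_size` columns to int4 values in [-8, 7]."""
    out_features, in_features = w.shape
    assert in_features % group_size == 0
    grouped = w.reshape(out_features, in_features // group_size, group_size)
    # One FP32 scale per (row, group), chosen so the largest magnitude maps to 7.
    scales = grouped.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 7.0
    q = torch.clamp(torch.round(grouped / scales), -8, 7).to(torch.int8)
    return q, scales  # the scales stay in FP32, as noted in the model-size discussion below

# Quick check of the approximation error introduced by quantization.
w = torch.randn(32, 256)
q, scales = quantize_weight_groupwise_4bit(w, group_size=128)
w_hat = (q.float() * scales).reshape_as(w)
print(f"max abs error: {(w - w_hat).abs().max().item():.4f}")
```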

The table below evaluates WikiText perplexity using [LM Eval](https://github.com/EleutherAI/lm-evaluation-harness).

The results are for two different group sizes, with max_seq_len 2048 and 1000 samples:

| Model      | Baseline (FP32) | Groupwise 4-bit (128) | Groupwise 4-bit (256) |
|------------|-----------------|-----------------------|-----------------------|
| Llama 2 7B | 9.2             | 10.2                  | 10.7                  |
| Llama 3 8B | 7.9             | 9.4                   | 9.7                   |

Note that group sizes smaller than 128 were not enabled in this example because the model was still too large. This is because current efforts have focused on enabling FP32, and support for FP16 is under way.

What this implies for model size is:

1. The embedding table is in FP32.
2. The quantized weight scales are in FP32.
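
As a rough illustration of how a WikiText perplexity number like the FP32 baseline can be produced with lm-evaluation-harness (the exact command and harness version used for the table above are not recorded here, so treat this as a sketch):

```bash
pip install lm_eval
# Evaluate the FP32 Hugging Face checkpoint on the WikiText task, limited to 1000 samples.
lm_eval --model hf \
  --model_args pretrained=meta-llama/Llama-2-7b-hf \
  --tasks wikitext \
  --limit 1000
```
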
Lines changed: 34 additions & 0 deletions

---
title: Prepare the Fine-Tuned Large Language Model for ExecuTorch and Mobile Deployment
weight: 11

### FIXED, DO NOT MODIFY
layout: learningpathall
---

#### Fine-Tuned Model Preparation

- On [Hugging Face](https://huggingface.co/), apply for repository access to [Meta's Llama 3.2 language models](https://huggingface.co/meta-llama/Llama-3.2-1B).
- Download params.json and tokenizer.model from the [Llama website](https://www.llama.com/llama-downloads/) or [Hugging Face](https://huggingface.co/meta-llama/Llama-3.2-1B).
- After fine-tuning the model, export the adapter_model.safetensors file locally, convert it to adapter_model.pth (see the sketch below), and then export the checkpoint to the .pte format with the command that follows.
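
A minimal sketch of the safetensors-to-.pth conversion mentioned above; the file names are assumptions based on the adapter output and should be adjusted to match your fine-tuning run:

```python
import torch
from safetensors.torch import load_file

# Load the fine-tuned adapter weights and re-save them in the .pth format
# expected by the export step below.
state_dict = load_file("adapter_model.safetensors")
torch.save(state_dict, "adapter_model.pth")
```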

```bash
# Export the fine-tuned checkpoint to the ExecuTorch .pte format with the
# XNNPACK backend (-X), KV cache enabled (-kv), and 8-bit dynamic-activation /
# 4-bit weight quantization (-qmode 8da4w) using a group size of 128.
python -m examples.models.llama.export_llama \
   --checkpoint <file name in .pth format> \
   -p <params.json> \
   -kv \
   --use_sdpa_with_kv_cache \
   -X \
   -qmode 8da4w \
   --group_size 128 \
   -d fp32 \
   --metadata '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}' \
   --embedding-quantize 4,32 \
   --output_name="llama3_kv_sdpa_xnn_qe_4_32.pte"
```

- Build the Llama Runner binary for [Android](https://learn.arm.com/learning-paths/mobile-graphics-and-gaming/build-llama3-chat-android-app-using-executorch-and-xnnpack/5-run-benchmark-on-android/).
- Build and run the [Android chat app](https://learn.arm.com/learning-paths/mobile-graphics-and-gaming/build-llama3-chat-android-app-using-executorch-and-xnnpack/6-build-android-chat-app/).
- Open Android Studio, choose "Open an existing Android Studio project", navigate to examples/demo-apps/android/LlamaDemo, and press Run (^R) to build and launch the app on your phone. A command-line alternative is sketched below.
- Tap the Settings widget to select a model, configure its parameters, and set any prompts.
- After choosing the model, tokenizer, and model type, click "Load Model" to load it into the app and return to the main Chat activity.
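
If you prefer the command line to the Android Studio Run button, a sketch of the equivalent build-and-install step (assuming the standard `app` Gradle module used by the LlamaDemo project and a device visible to `adb`):

```bash
# Build the debug variant of the demo app and install it on the connected phone.
cd examples/demo-apps/android/LlamaDemo
./gradlew :app:installDebug
```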
