Commit fd8c2b3

Merge pull request #1743 from jasonrandrews/review
Merge 2 fine tuning Learning Paths and keep them in draft until techn…
2 parents 45424a8 + 1c8e5ee commit fd8c2b3

File tree

4 files changed: +202 -2 lines changed


content/learning-paths/servers-and-cloud-computing/llm-fine-tuning-for-web-applications/_index.md

Lines changed: 6 additions & 2 deletions

@@ -1,13 +1,13 @@
 ---
-title: LLM fine-tuning for web applications
+title: LLM fine-tuning for web and mobile applications

 draft: true
 cascade:
     draft: true

 minutes_to_complete: 60

-who_is_this_for: This is an introductory topic for developers and data scientists new to fine-tuning large language models (LLMs) and looking to develop a fine-tuned LLM for web applications.
+who_is_this_for: This is an introductory topic for developers and data scientists new to fine-tuning large language models (LLMs) and looking to develop a fine-tuned LLM for web and mobile applications.

 learning_objectives:
 - Learn the basics of large language models (LLMs) and how fine-tuning enhances model performance for specific use cases.
@@ -16,9 +16,13 @@ learning_objectives:
 - Learn how to curate, clean, and preprocess domain-specific datasets for optimal fine-tuning.
 - Understand dataset formats, tokenization, and annotation techniques for improving model learning.
 - Implement fine-tuning with frameworks like Hugging Face Transformers and PyTorch.
+- Compile a Large Language Model (LLM) using ExecuTorch.
+- Learn how to deploy a fine-tuned model on a mobile device.
+- Describe techniques for running large language models in a mobile environment.

 prerequisites:
 - An AWS Graviton4 instance. You can substitute any Arm based Linux computer. Refer to [Get started with Arm-based cloud instances](/learning-paths/servers-and-cloud-computing/csp/) for more information about cloud service providers offering Arm-based instances.
+- An Android smartphone with the i8mm feature and 16GB of RAM.
 - Basic understanding of machine learning and deep learning.
 - Familiarity with deep learning frameworks such as PyTorch and Hugging Face Transformers.

Lines changed: 109 additions & 0 deletions

---
title: Mobile Platform for Fine-Tuning Large Language Models
weight: 9

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Development environment
You will learn how to build the ExecuTorch runtime, with KleidiAI, for fine-tuned models, create JNI libraries for a mobile application, and integrate those libraries into the application.

The first step is to set up a development environment with the necessary software:
- Python 3.10 or later
- Git
- Java 17 JDK
- Latest version of Android Studio
- Android NDK

###### Installation of Android Studio and Android NDK
- Download and install the latest version of Android Studio.
- Launch Android Studio and open the Settings dialog.
- Go to Languages & Frameworks > Android SDK.
- In the SDK Platforms tab, select Android 14.0 ("UpsideDownCake").
- Install the required version of the Android NDK by first setting up the Android command-line tools, as sketched below.
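
The NDK can be installed from Android Studio's SDK Manager or from the command line. As a rough sketch, assuming the Android command-line tools (`sdkmanager`) are installed and on your PATH, and using an NDK version chosen only as an example:

```bash
# Install an NDK release and verify that it is listed afterwards.
sdkmanager --install "ndk;26.1.10909125"
sdkmanager --list_installed | grep -i ndk
```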

###### Install Java 17 JDK
- Open the [Java SE 17 Archive Downloads](https://www.oracle.com/java/technologies/javase/jdk17-archive-downloads.html) page in your browser.
- Choose the appropriate version for your operating system.
- Downloads are available for macOS and Linux.

###### Install Git and cmake

For macOS, use [Homebrew](https://brew.sh/):

``` bash
brew install git cmake
```

For Linux, use the package manager for your distribution:

``` bash
sudo apt install git-all cmake
```

###### Install Python 3.10

For macOS:

``` bash
brew install python@3.10
```

For Linux:

``` bash
sudo apt update
sudo apt install software-properties-common -y
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt install python3.10 python3.10-venv
```

###### Set up the [ExecuTorch](https://pytorch.org/executorch/stable/intro-overview.html) environment
For execution on a mobile device, [ExecuTorch](https://pytorch.org/executorch/stable/intro-overview.html) is required. It enables efficient on-device model deployment and execution.

You can use either a Python virtual environment or a Conda environment.

- Python virtual environment creation

```bash
python3.10 -m venv executorch
source executorch/bin/activate
```

The prompt of your terminal now has `executorch` as a prefix to indicate the virtual environment is active.

- Conda virtual environment creation

Install Miniconda on your development machine by following the [Installing conda](https://conda.io/projects/conda/en/latest/user-guide/install/index.html) instructions.

Once `conda` is installed, create the environment:

```bash
conda create -yn executorch python=3.10.0
conda activate executorch
```

###### Clone ExecuTorch and install the required dependencies

From within the conda environment, complete the following steps to download the ExecuTorch repository and install the required packages:

- Download ExecuTorch from the [GitHub repository](https://github.com/pytorch/executorch/tree/main).
- Download the executorch.aar file from [executorch.aar](https://ossci-android.s3.us-west-1.amazonaws.com/executorch/release/executorch-241002/executorch.aar).
- Create a libs folder at examples/demo-apps/android/LlamaDemo/app/libs inside the repository and copy executorch.aar into it, as sketched below.
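
The three steps above can be scripted roughly as follows; the clone location is up to you, and the libs path follows the repository layout referenced above:

```bash
# Clone ExecuTorch, then place the prebuilt AAR where the LlamaDemo app expects it.
git clone https://github.com/pytorch/executorch.git
cd executorch
mkdir -p examples/demo-apps/android/LlamaDemo/app/libs
curl -L -o examples/demo-apps/android/LlamaDemo/app/libs/executorch.aar \
  https://ossci-android.s3.us-west-1.amazonaws.com/executorch/release/executorch-241002/executorch.aar
```

With the repository cloned and the AAR in place, run the dependency installation commands below from the repository root.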

``` bash
git submodule sync
git submodule update --init
./install_requirements.sh
./install_requirements.sh --pybind xnnpack
./examples/models/llama/install_requirements.sh
```

###### Mobile Device Setup
- Enable the mobile device in [Android Studio](https://support.google.com/android/community-guide/273205728/how-to-enable-developer-options-on-android-pixels-6-secret-android-tips?hl=en).
- On the Android phone, enable Developer Options:
  - Navigate to Settings > About Phone.
  - At the bottom, locate Build Number and tap it seven times. A message appears confirming that you are now a developer.
- Access Developer Options by navigating to Settings > System > Developer Options.
- A large number of options are listed here; change only the settings you understand.
- Enable USB Debugging to connect your mobile device to Android Studio.
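
With USB Debugging enabled and the phone connected, you can confirm that the device is visible from a terminal. This assumes the Android platform tools, which provide `adb`, are installed:

```bash
# The device serial should be listed with the state "device";
# "unauthorized" means the debugging prompt on the phone still needs to be accepted.
adb devices
```
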
Lines changed: 53 additions & 0 deletions

---
title: Fine-Tune and Quantize the Large Language Model
weight: 10

### FIXED, DO NOT MODIFY
layout: learningpathall
---

#### Llama Model
Llama is a family of large language models designed for high-performance language processing tasks and trained on publicly available data. When fine-tuned, Llama-based models can be optimized for specific applications, enhancing their ability to generate accurate and context-aware responses. Fine-tuning enables the model to adapt to domain-specific data, improving performance in tasks such as:

- Language translation – Enhancing fluency and contextual accuracy.
- Question answering – Providing precise and relevant responses.
- Text summarization – Extracting key insights while maintaining coherence.

Fine-tuned Llama models are also highly effective at generating human-like text, making them valuable for:

- Chatbots – Enabling intelligent and context-aware interactions.
- Virtual assistants – Enhancing responsiveness and personalization.
- Creative writing – Generating compelling and structured narratives.

By fine-tuning Llama-based models, their adaptability and relevance can be significantly improved, allowing seamless integration into specialized AI applications. Note that the models are subject to the [acceptable use policy](https://github.com/facebookresearch/llama/blob/main/USE_POLICY.md) and the [responsible use guide](https://ai.meta.com/static-resource/responsible-use-guide/).

#### Results

Since Llama 2 and Llama 3 models require at least 4-bit quantization to accommodate the memory constraints of certain smartphones, the results below are reported for 4-bit quantized models alongside the FP32 baseline.

#### Quantization

To optimize models for smartphone memory constraints, 4-bit groupwise per-token dynamic quantization can be applied to all linear layers. In this approach:

- Dynamic quantization is used for activations, where quantization parameters are computed at runtime based on the min/max range.
- Static quantization is applied to weights, which are per-channel groupwise quantized using 4-bit signed integers.

This method ensures efficient memory usage while maintaining model performance on resource-constrained devices.

For further information, refer to [torchao: PyTorch Architecture Optimization](https://github.com/pytorch-labs/ao/).
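
As an illustration of the weight side of this scheme, the sketch below quantizes a weight matrix groupwise to 4-bit signed integers with FP32 scales. It is only a conceptual example written for this explanation (the `quantize_weight_groupwise_4bit` helper is hypothetical); the quantization used in this Learning Path is applied via torchao as part of the export step covered later:

```python
import torch

def quantize_weight_groupwise_4bit(w: torch.Tensor, group_size: int = 128):
    """Quantize each row of w in groups of `group_size` columns to int4 values in [-8, 7]."""
    out_features, in_features = w.shape
    assert in_features % group_size == 0
    grouped = w.reshape(out_features, in_features // group_size, group_size)
    # One FP32 scale per (row, group), chosen so the largest magnitude maps to 7.
    scales = grouped.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 7.0
    q = torch.clamp(torch.round(grouped / scales), -8, 7).to(torch.int8)
    return q, scales  # the scales stay in FP32, as noted in the model-size discussion below

# Quick check of the approximation error introduced by quantization.
w = torch.randn(32, 256)
q, scales = quantize_weight_groupwise_4bit(w, group_size=128)
w_hat = (q.float() * scales).reshape_as(w)
print(f"max abs error: {(w - w_hat).abs().max().item():.4f}")
```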

The table below evaluates WikiText perplexity using [LM Eval](https://github.com/EleutherAI/lm-evaluation-harness).

The results are for two different group sizes, with max_seq_len 2048 and 1000 samples:

| Model      | Baseline (FP32) | Groupwise 4-bit (128) | Groupwise 4-bit (256) |
|------------|-----------------|-----------------------|-----------------------|
| Llama 2 7B | 9.2             | 10.2                  | 10.7                  |
| Llama 3 8B | 7.9             | 9.4                   | 9.7                   |

Note that group sizes smaller than 128 were not enabled in this example because the model was still too large. This is because current efforts have focused on enabling FP32, and support for FP16 is under way.

What this implies for model size is:

1. The embedding table is in FP32.
2. The quantized weight scales are in FP32.
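
As a rough illustration of how a WikiText perplexity number like the FP32 baseline can be produced with lm-evaluation-harness (the exact command and harness version used for the table above are not recorded here, so treat this as a sketch):

```bash
pip install lm_eval
# Evaluate the FP32 Hugging Face checkpoint on the WikiText task, limited to 1000 samples.
lm_eval --model hf \
  --model_args pretrained=meta-llama/Llama-2-7b-hf \
  --tasks wikitext \
  --limit 1000
```
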
Lines changed: 34 additions & 0 deletions

---
title: Prepare the Fine-Tuned Large Language Model for ExecuTorch and Mobile Deployment
weight: 11

### FIXED, DO NOT MODIFY
layout: learningpathall
---

#### Fine-Tuned Model Preparation

- On [Hugging Face](https://huggingface.co/), apply for repository access to [Meta's Llama 3.2 language models](https://huggingface.co/meta-llama/Llama-3.2-1B).
- Download params.json and tokenizer.model from the [Llama website](https://www.llama.com/llama-downloads/) or [Hugging Face](https://huggingface.co/meta-llama/Llama-3.2-1B).
- After fine-tuning the model, export the adapter_model.safetensors file locally, convert it to adapter_model.pth (see the sketch below), and then export the checkpoint to the .pte format with the command that follows.
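
A minimal sketch of the safetensors-to-.pth conversion mentioned above; the file names are assumptions based on the adapter output and should be adjusted to match your fine-tuning run:

```python
import torch
from safetensors.torch import load_file

# Load the fine-tuned adapter weights and re-save them in the .pth format
# expected by the export step below.
state_dict = load_file("adapter_model.safetensors")
torch.save(state_dict, "adapter_model.pth")
```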

```bash
# Export the fine-tuned checkpoint to the ExecuTorch .pte format with the
# XNNPACK backend (-X), KV cache enabled (-kv), and 8-bit dynamic-activation /
# 4-bit weight quantization (-qmode 8da4w) using a group size of 128.
python -m examples.models.llama.export_llama \
   --checkpoint <file name in .pth format> \
   -p <params.json> \
   -kv \
   --use_sdpa_with_kv_cache \
   -X \
   -qmode 8da4w \
   --group_size 128 \
   -d fp32 \
   --metadata '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}' \
   --embedding-quantize 4,32 \
   --output_name="llama3_kv_sdpa_xnn_qe_4_32.pte"
```

- Build the Llama Runner binary for [Android](https://learn.arm.com/learning-paths/mobile-graphics-and-gaming/build-llama3-chat-android-app-using-executorch-and-xnnpack/5-run-benchmark-on-android/).
- Build and run the [Android chat app](https://learn.arm.com/learning-paths/mobile-graphics-and-gaming/build-llama3-chat-android-app-using-executorch-and-xnnpack/6-build-android-chat-app/).
- Open Android Studio, choose "Open an existing Android Studio project", navigate to examples/demo-apps/android/LlamaDemo, and press Run (^R) to build and launch the app on your phone. A command-line alternative is sketched below.
- Tap the Settings widget to select a model, configure its parameters, and set any prompts.
- After choosing the model, tokenizer, and model type, click "Load Model" to load it into the app and return to the main Chat activity.
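
If you prefer the command line to the Android Studio Run button, a sketch of the equivalent build-and-install step (assuming the standard `app` Gradle module used by the LlamaDemo project and a device visible to `adb`):

```bash
# Build the debug variant of the demo app and install it on the connected phone.
cd examples/demo-apps/android/LlamaDemo
./gradlew :app:installDebug
```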
