ArmDeveloperEcosystem
diff --git a/‎content/learning-paths/embedded-and-microcontrollers/llm-fine-tuning-for-mobile-applications/1.png‎
22.6 KB b/‎content/learning-paths/embedded-and-microcontrollers/llm-fine-tuning-for-mobile-applications/1.png‎
22.6 KB
diff --git a/‎content/learning-paths/embedded-and-microcontrollers/llm-fine-tuning-for-mobile-applications/2.png‎
9.99 KB b/‎content/learning-paths/embedded-and-microcontrollers/llm-fine-tuning-for-mobile-applications/2.png‎
9.99 KB
diff --git a/‎content/learning-paths/embedded-and-microcontrollers/llm-fine-tuning-for-mobile-applications/3.png‎
3.07 KB b/‎content/learning-paths/embedded-and-microcontrollers/llm-fine-tuning-for-mobile-applications/3.png‎
3.07 KB
diff --git a/‎content/learning-paths/embedded-and-microcontrollers/llm-fine-tuning-for-mobile-applications/_index.md‎
Lines changed: 67 additions & 0 deletions b/‎content/learning-paths/embedded-and-microcontrollers/llm-fine-tuning-for-mobile-applications/_index.md‎
Lines changed: 67 additions & 0 deletions
diff --git a/‎content/learning-paths/embedded-and-microcontrollers/llm-fine-tuning-for-mobile-applications/_next-steps.md‎
Lines changed: 8 additions & 0 deletions b/‎content/learning-paths/embedded-and-microcontrollers/llm-fine-tuning-for-mobile-applications/_next-steps.md‎
Lines changed: 8 additions & 0 deletions
diff --git a/‎content/learning-paths/embedded-and-microcontrollers/llm-fine-tuning-for-mobile-applications/example-picture.png‎
61.7 KB b/‎content/learning-paths/embedded-and-microcontrollers/llm-fine-tuning-for-mobile-applications/example-picture.png‎
61.7 KB
diff --git a/‎content/learning-paths/embedded-and-microcontrollers/llm-fine-tuning-for-mobile-applications/how-to-1.md‎
Lines changed: 65 additions & 0 deletions b/‎content/learning-paths/embedded-and-microcontrollers/llm-fine-tuning-for-mobile-applications/how-to-1.md‎
Lines changed: 65 additions & 0 deletions
diff --git a/‎content/learning-paths/embedded-and-microcontrollers/llm-fine-tuning-for-mobile-applications/how-to-2.md‎
Lines changed: 49 additions & 0 deletions b/‎content/learning-paths/embedded-and-microcontrollers/llm-fine-tuning-for-mobile-applications/how-to-2.md‎
Lines changed: 49 additions & 0 deletions
diff --git a/‎content/learning-paths/embedded-and-microcontrollers/llm-fine-tuning-for-mobile-applications/how-to-3.md‎
Lines changed: 67 additions & 0 deletions b/‎content/learning-paths/embedded-and-microcontrollers/llm-fine-tuning-for-mobile-applications/how-to-3.md‎
Lines changed: 67 additions & 0 deletions
diff --git a/‎content/learning-paths/embedded-and-microcontrollers/llm-fine-tuning-for-mobile-applications/how-to-4.md‎
Lines changed: 75 additions & 0 deletions b/‎content/learning-paths/embedded-and-microcontrollers/llm-fine-tuning-for-mobile-applications/how-to-4.md‎
Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,67 @@
+---
+title: LLM Fine-Tuning for Mobile Applications
+
+minutes_to_complete: 60
+
+who_is_this_for: This learning path provides an introduction for developers and data scientists new to fine-tuning large language models (LLMs) and looking to develop a fine-tuned LLM for mobile applications. Fine-tuning involves adapting a pre-trained LLM to specific tasks or domains by training it on domain-specific data and optimizing its responses for accuracy and relevance. For mobile applications, fine-tuning enables personalized interactions, enhanced query handling, and improved contextual understanding, making AI-driven features more effective. This session will cover key concepts, techniques, tools, and best practices, ensuring a structured approach to building a fine-tuned LLM that aligns with real-world mobile application requirements.Mobile application with Llama, KleidiAI, ExecuTorch, and XNNPACK.
+
+learning_objectives: 
+    - Learn the basics of large language models (LLMs) and how fine-tuning enhances model performance for specific use cases focusing on mobile applications. 
+    - Understand full fine-tuning, parameter-efficient fine-tuning (e.g., LoRA, QLoRA, PEFT), and instruction-tuning.
+    - Learn when to use different fine-tuning approaches based on model size, task complexity, and computational constraints.
+    - Learn how to curate, clean, and preprocess domain-specific datasets for optimal fine-tuning.
+    - Understand dataset formats, tokenization, and annotation techniques for improving model learning.
+    - Implementing Fine-Tuning with Popular Frameworks like Hugging Face Transformers and PyTorch for LLM fine-tuning.
+    - Learn how to deploy and fine-tune the model in the mobile device.
+    - Compile a Large Language Model (LLM) using ExecuTorch.
+    - Describe techniques for running large language models in an mobile environment.
+
+prerequisites:
+    - Basic Understanding of Machine Learning & Deep Learning (Familiarity with concepts like supervised learning, neural networks, transfer learning and Understanding of model training, validation, & overfitting concepts).
+    - Familiarity with Deep Learning Frameworks (Experience with PyTorch for building, training neural networks and Knowledge of Hugging Face Transformers for working with pre-trained LLMs.
+    - An Arm-powered smartphone with the i8mm feature running Android, with 16GB of RAM.
+    - A USB cable to connect your smartphone to your development machine.
+    - An AWS Graviton4 r8g.16xlarge instance to test Arm performance optimizations, or any [Arm based instance](/learning-paths/servers-and-cloud-computing/csp/) from a cloud service provider or an on-premise Arm server or Arm based laptop.
+    - Python 3.10.
+
+author: Parichay Das
+
+### Tags
+skilllevels: Introductory
+subjects: GenAI
+armips:
+    - Neoverse
+
+tools_software_languages:
+    - LLM
+    - GenAI
+    - Python
+    - PyTorch
+    - ExecuTorch
+operatingsystems:
+    - Linux
+    - Windows
+    - Android  
+
+
+further_reading:
+     - resource:
+        title: Hugging Face Documentation
+        link: https://huggingface.co/docs
+        type: documentation
+     - resource:
+        title: PyTorch Documentation
+        link: https://pytorch.org/docs/stable/index.html
+        type: documentation
+     - resource:
+        title: Android 
+        link: https://www.android.com/
+        type: website
+
+
+### FIXED, DO NOT MODIFY
+# ================================================================================
+weight: 1                       # _index.md always has weight of 1 to order correctly
+layout: "learningpathall"       # All files under learning paths have this same wrapper
+learning_path_main_page: "yes"  # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
+---
@@ -0,0 +1,8 @@
+---
+# ================================================================================
+#       FIXED, DO NOT MODIFY THIS FILE
+# ================================================================================
+weight: 21                  # Set to always be larger than the content in this path to be at the end of the navigation.
+title: "Next Steps"         # Always the same, html page title.
+layout: "learningpathall"   # All files under learning paths have this same wrapper for Hugo processing.
+---
@@ -0,0 +1,65 @@
+---
+title: Overview
+weight: 2
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## What is Fine-Tuning
+Fine-tuning in the context of large language models (LLMs) refers to the process of further training a pre-trained LLM on domain-specific or task-specific data to enhance its performance for a particular application. LLMs, such as GPT, BERT, and LLaMA, are initially trained on massive corpora containing billions of tokens, enabling them to develop a broad linguistic understanding. Fine-tuning refines this knowledge by exposing the model to specialized datasets, allowing it to generate more contextually relevant and accurate responses. Rather than training an LLM from scratch, fine-tuning leverages the pre-existing knowledge embedded in the model, optimizing it for specific use cases such as customer support, content generation, legal document analysis, or medical text processing. This approach significantly reduces computational requirements and data needs while improving adaptability and efficiency in real-world applications. 
+
+## Advantage of Fine-Tuning
+Fine-tuning is essential for optimizing large language models (LLMs) to meet specific application requirements, enhance performance, and reduce computational costs. While pre-trained LLMs have broad linguistic capabilities, they may not always produce domain-specific, contextually accurate, or application-tailored responses
+- Customization for Specific Domains
+- Improved Response Quality and Accuracy
+- Task-Specific Adaptation
+- Reduction in Computational and Data Requirements
+- Enhanced Efficiency in Real-World Applications
+- Alignment with Ethical, Regulatory, and Organizational Guidelines
+
+## Fine-Tuning Methods
+Fine-tuning LLM uses different techniques based on the various use cases, computational constraints, and efficiency requirements. Below are the key fine-tuning methods:
+
+### Full Fine-Tuning (Supervised Learning Approach)
+It involves updating all parameters of the LLM using task-specific data, requiring significant computational power and large labeled datasets, which provides the highest level of customization.
+
+### Instruction Fine-Tuning
+Instruction fine-tuning is a supervised learning method. A pre-trained large language model (LLM) is further trained on instruction-response pairs to improve its ability to follow human instructions accurately. Instruction Fine-Tuning has some key features using Labeled Instruction-Response Pairs, Enhances Model Alignment with Human Intent, Commonly Used in Chatbots and AI Assistants, and Prepares Models for Zero-Shot and Few-Shot Learning.
+
+### Parameter-Efficient Fine-Tuning (PEFT)
+It is a optimized approaches that reduce the number of trainable parameters while maintaining high performance:
+
+- ###### LoRA (Low-Rank Adaptation)
+    - Introduces small trainable weight matrices (rank decomposition) while freezing the main model weights.
+    - It will significantly reduce GPU memory usage and training time.
+
+- ###### QLoRA (Quantized LoRA)
+    - It will use quantization (e.g., 4-bit or 8-bit precision) to reduce memory footprint while applying LoRA fine-tuning.
+    - It is Ideal for fine-tuning large models on limited hardware.
+
+- ###### Adapter Layers
+    - Inserts small trainable layers between existing layers of the model and Keeps most parameters frozen, reducing computational overhead.
+
+- ###### Reinforcement Learning from Human Feedback (RLHF)
+    - Fine-tunes models based on human preferences using reinforcement learning.
+
+- ###### Domain-Specific Fine-Tuning
+    - Fine-tunes the LLM with domain-specific datasets and Improves accuracy and relevance in specialized applications.
+
+- ###### Multi-Task Learning (MTL) Fine-Tuning
+    - Trains the model on multiple tasks simultaneously, enabling generalization across different applications.
+
+
+
+## Fine-Tuning Implementaion 
+The following steps need to be performed to implement fine-tuning:
+
+
+![example image alt-text#center](1.png "Figure 1. Fine-Tuning Implementaion")
+
+-   Base Model Selection: Choose a pre-trained model based on your use cases. You can find pre-trained models at [Hugging Face](https://huggingface.co/).
+-   Fine-Tuning Method Finalization: Select the most appropriate fine-tuning method (e.g., supervised, instruction-based, PEFT) based on your use case and dataset. You can typically find various datasets on [Hugging Face](https://huggingface.co/datasets) and [Kaggle](https://www.kaggle.com/datasets).
+-   Dataset Prepration:Organize your data for your use case-specific training, ensuring it aligns with the model's required format.
+-   Training:Utilize frameworks such as TensorFlow and PyTorch to fine-tune the model.
+-   Evaluate: Evaluate the model, refine it as needed, and retrain to enhance performance.
@@ -0,0 +1,49 @@
+---
+title: Fine Tuning Large Language Model - Setup Environment 
+weight: 3
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## Fine Tuning Large Language Model - Setup Environment
+
+#### Plartform Required 
+- An AWS Graviton4 r8g.16xlarge instance to test Arm performance optimizations, or any [Arm based instance](/learning-paths/servers-and-cloud-computing/csp/) from a cloud service provider or an on-premise Arm server or Arm based laptop.
+- An Arm-powered smartphone with the i8mm feature running Android, with 16GB of RAM.
+- A USB cable to connect your smartphone to your development machine.
+
+#### Set Up Required Libraries
+The following commands install the necessary libraries for the task, including Hugging Face Transformers, Datasets, and fine-tuning methods. These libraries facilitate model loading, training, and fine-tuning
+
+###### The transformers library (by Hugging Face) provides pre-trained LLMs
+```python
+!pip install transformers
+
+```
+###### This installs transformers along with PyTorch, ensuring that models are trained and fine-tuned using the Torch backend.
+```python
+!pip install transformers[torch]
+```
+###### The datasets library (by Hugging Face) provides access to a vast collection of pre-built datasets
+
+```python
+!pip install datasets
+```
+###### The evaluate library provides metrics for model performance assessment
+
+```python
+!pip install evaluate
+```
+###### Speed up fine-tuning of Large Language Models (LLMs)
+[Unsloth](https://huggingface.co/unsloth) is a library designed to speed up fine-tuning of Large Language Models (LLMs) while reducing computational costs. It optimizes training efficiency, particularly for LoRA (Low-Rank Adaptation) fine-tuning 
+```python
+%%capture
+# %%capture is a Jupyter Notebook magic command that suppresses the output of a cell.
+
+```
+##### Uninstalls the existing Unsloth installation and installs the latest version directly from the GitHub repository
+
+```python
+!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
+```
@@ -0,0 +1,67 @@
+---
+title: Fine Tuning Large Language Model - Load Pre-trained Model & Tokenizer
+weight: 4
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## Fine Tuning Large Language Model - Load Pre-trained Model & Tokenizer
+
+#### Load Pre-trained Model & Tokenizer
+The following commands Load the pre-trained model and tokenizer, ensuring compatibility with the fine-tuning task and optimizing memory usage
+
+###### Import Required Modules
+- FastLanguageModel: A highly optimized loader for LLaMA models in Unsloth, making it faster and memory-efficient.
+- torch: Required for handling tensors and computations.
+```python
+from unsloth import FastLanguageModel
+import torch
+
+```
+###### Define Model Configuration
+- max_seq_length = 2048 → Defines the maximum number of tokens the model can process at once.
+- dtype = None → Auto-selects Float16 for older GPUs (Tesla T4, V100)
+- load_in_4bit = True → Enables 4-bit quantization to reduce memory usage
+```python
+max_seq_length = 2048  
+dtype = None          
+load_in_4bit = True
+```
+###### Load the Pre-trained Model
+- Loads a 1B parameter fine-tuned LLaMA model
+- Loads the optimized LLaMA model with reduced VRAM usage and faster processing
+- Loads the corresponding tokenizer for tokenizing inputs properly
+
+```python
+model, tokenizer = FastLanguageModel.from_pretrained(
+    model_name = "unsloth/Llama-3.2-1B-Instruct", 
+    max_seq_length = max_seq_length,
+    dtype = dtype,
+    load_in_4bit = load_in_4bit,
+```
+###### Parameter-Efficient Fine-Tuning (PEFT) using LoRA (Low-Rank Adaptation) for the pre-trained model
+- LoRA Rank (r): Defines the rank of the low-rank matrices used in LoRA
+- Target Modules: Specifies which layers should be fine-tuned with LoRA, Includes attention layers (q_proj, k_proj, v_proj, o_proj) and feedforward layers (gate_proj, up_proj, down_proj)
+- LoRA Alpha (lora_alpha):Scaling factor for LoRA weights and A higher value makes the LoRA layers contribute more to the model's output
+- LoRA Dropout: Dropout randomly disables connections to prevent overfitting
+- Bias (bias): No additional bias parameters are trained (optimized for efficiency)
+- Gradient Checkpointing: Optimized memory-saving method
+- Random Seed: Ensures reproducibility across training runs
+- Rank-Stabilized LoRA: Rank stabilization not used
+- LoFTQ Quantization: No LoFTQ (Low-bit Quantization) applied
+```python
+model = FastLanguageModel.get_peft_model(
+    model,
+    r = 16, 
+    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
+                      "gate_proj", "up_proj", "down_proj",],
+    lora_alpha = 16,
+    lora_dropout = 0, 
+    bias = "none",    
+    use_gradient_checkpointing = "unsloth", 
+    random_state = 3407,
+    use_rslora = False,  
+    loftq_config = None, 
+)
+```
@@ -0,0 +1,75 @@
+---
+title: Fine Tuning Large Language Model - Prepare Dataset
+weight: 5
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## Fine Tuning Large Language Model - Prepare Dataset
+This step prepares the dataset for fine-tuning by formatting it to match the LLaMA-3.1 chat template.
+
+###### Import Chat Template for Tokenizer
+This imports the chat template functionality from Unsloth and It allows us to structure the dataset in a format that LLaMA-3.1 expects
+```python
+from unsloth.chat_templates import get_chat_template
+```
+
+###### Apply the Chat Template to Tokenizer
+- Apply the Chat Template to Tokenizer.
+- Ensures prompt formatting is consistent when training the model.
+```python
+tokenizer = get_chat_template(
+    tokenizer,
+    chat_template = "llama-3.1",
+)
+
+
+```
+###### Format Dataset Prompts
+- Extracts the instruction column from the dataset.
+- Applies the chat template formatting to each instruction.
+- Returns a new dictionary with the formatted text.
+```python
+def formatting_prompts_func(examples):
+    convos = examples["instruction"]
+    texts = [tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = False) for convo in convos]
+    return { "text" : texts, }
+pass
+```
+###### Load the Dataset
+- Loads a [customer support chatbot training dataset](https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset) from Hugging Face
+- The dataset contains example conversations with instructions for fine-tuning
+- Loads the corresponding tokenizer for tokenizing inputs properly
+
+```python
+from datasets import load_dataset
+dataset = load_dataset("bitext/Bitext-customer-support-llm-chatbot-training-dataset", split = "train")
+
+```
+![example image alt-text#center](2.png )
+
+###### Import Standardization Function
+- Imports standardize_sharegpt, a function that helps in structuring dataset inputs in a ShareGPT-like format (a commonly used format for LLM fine-tuning).
+- Ensures that data follows a standardized format required for effective instruction tuning.
+```python
+from unsloth.chat_templates import standardize_sharegpt
+```
+###### Define a Function to Format Dataset
+- Extracts the instruction (input text) and response (output text) from the dataset.
+- Stores them as "instruction_text" and "response_text".
+```python
+def formatting_prompts_func(examples):
+    return { "instruction_text": examples["instruction"], "response_text": examples["response"] }
+
+```
+
+###### Apply Formatting to Dataset
+- Applies formatting_prompts_func to every record in the dataset.
+- Uses batch processing (batched=True) for efficiency.
+```python
+def formatting_prompts_func(examples):
+    return { "instruction_text": examples["instruction"], "response_text": examples["response"] }
+
+```
+![example image alt-text#center](3.png )