Skip to content

Commit c3e7995

Browse files
Merge pull request #1632 from dasparic/ftllm
ftllmweb
2 parents 27e4dad + 0b35132 commit c3e7995

File tree

26 files changed

+1268
-0
lines changed

26 files changed

+1268
-0
lines changed
22.6 KB
Loading
9.99 KB
Loading
3.07 KB
Loading
Lines changed: 67 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,67 @@
1+
---
2+
title: LLM Fine-Tuning for Mobile Applications
3+
4+
minutes_to_complete: 60
5+
6+
who_is_this_for: This learning path provides an introduction for developers and data scientists new to fine-tuning large language models (LLMs) and looking to develop a fine-tuned LLM for mobile applications. Fine-tuning involves adapting a pre-trained LLM to specific tasks or domains by training it on domain-specific data and optimizing its responses for accuracy and relevance. For mobile applications, fine-tuning enables personalized interactions, enhanced query handling, and improved contextual understanding, making AI-driven features more effective. This session will cover key concepts, techniques, tools, and best practices, ensuring a structured approach to building a fine-tuned LLM that aligns with real-world mobile application requirements.Mobile application with Llama, KleidiAI, ExecuTorch, and XNNPACK.
7+
8+
learning_objectives:
9+
- Learn the basics of large language models (LLMs) and how fine-tuning enhances model performance for specific use cases focusing on mobile applications.
10+
- Understand full fine-tuning, parameter-efficient fine-tuning (e.g., LoRA, QLoRA, PEFT), and instruction-tuning.
11+
- Learn when to use different fine-tuning approaches based on model size, task complexity, and computational constraints.
12+
- Learn how to curate, clean, and preprocess domain-specific datasets for optimal fine-tuning.
13+
- Understand dataset formats, tokenization, and annotation techniques for improving model learning.
14+
- Implementing Fine-Tuning with Popular Frameworks like Hugging Face Transformers and PyTorch for LLM fine-tuning.
15+
- Learn how to deploy and fine-tune the model in the mobile device.
16+
- Compile a Large Language Model (LLM) using ExecuTorch.
17+
- Describe techniques for running large language models in an mobile environment.
18+
19+
prerequisites:
20+
- Basic Understanding of Machine Learning & Deep Learning (Familiarity with concepts like supervised learning, neural networks, transfer learning and Understanding of model training, validation, & overfitting concepts).
21+
- Familiarity with Deep Learning Frameworks (Experience with PyTorch for building, training neural networks and Knowledge of Hugging Face Transformers for working with pre-trained LLMs.
22+
- An Arm-powered smartphone with the i8mm feature running Android, with 16GB of RAM.
23+
- A USB cable to connect your smartphone to your development machine.
24+
- An AWS Graviton4 r8g.16xlarge instance to test Arm performance optimizations, or any [Arm based instance](/learning-paths/servers-and-cloud-computing/csp/) from a cloud service provider or an on-premise Arm server or Arm based laptop.
25+
- Python 3.10.
26+
27+
author: Parichay Das
28+
29+
### Tags
30+
skilllevels: Introductory
31+
subjects: GenAI
32+
armips:
33+
- Neoverse
34+
35+
tools_software_languages:
36+
- LLM
37+
- GenAI
38+
- Python
39+
- PyTorch
40+
- ExecuTorch
41+
operatingsystems:
42+
- Linux
43+
- Windows
44+
- Android
45+
46+
47+
further_reading:
48+
- resource:
49+
title: Hugging Face Documentation
50+
link: https://huggingface.co/docs
51+
type: documentation
52+
- resource:
53+
title: PyTorch Documentation
54+
link: https://pytorch.org/docs/stable/index.html
55+
type: documentation
56+
- resource:
57+
title: Android
58+
link: https://www.android.com/
59+
type: website
60+
61+
62+
### FIXED, DO NOT MODIFY
63+
# ================================================================================
64+
weight: 1 # _index.md always has weight of 1 to order correctly
65+
layout: "learningpathall" # All files under learning paths have this same wrapper
66+
learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
67+
---
Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,8 @@
1+
---
2+
# ================================================================================
3+
# FIXED, DO NOT MODIFY THIS FILE
4+
# ================================================================================
5+
weight: 21 # Set to always be larger than the content in this path to be at the end of the navigation.
6+
title: "Next Steps" # Always the same, html page title.
7+
layout: "learningpathall" # All files under learning paths have this same wrapper for Hugo processing.
8+
---
61.7 KB
Loading
Lines changed: 65 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,65 @@
1+
---
2+
title: Overview
3+
weight: 2
4+
5+
### FIXED, DO NOT MODIFY
6+
layout: learningpathall
7+
---
8+
9+
## What is Fine-Tuning
10+
Fine-tuning in the context of large language models (LLMs) refers to the process of further training a pre-trained LLM on domain-specific or task-specific data to enhance its performance for a particular application. LLMs, such as GPT, BERT, and LLaMA, are initially trained on massive corpora containing billions of tokens, enabling them to develop a broad linguistic understanding. Fine-tuning refines this knowledge by exposing the model to specialized datasets, allowing it to generate more contextually relevant and accurate responses. Rather than training an LLM from scratch, fine-tuning leverages the pre-existing knowledge embedded in the model, optimizing it for specific use cases such as customer support, content generation, legal document analysis, or medical text processing. This approach significantly reduces computational requirements and data needs while improving adaptability and efficiency in real-world applications.
11+
12+
## Advantage of Fine-Tuning
13+
Fine-tuning is essential for optimizing large language models (LLMs) to meet specific application requirements, enhance performance, and reduce computational costs. While pre-trained LLMs have broad linguistic capabilities, they may not always produce domain-specific, contextually accurate, or application-tailored responses
14+
- Customization for Specific Domains
15+
- Improved Response Quality and Accuracy
16+
- Task-Specific Adaptation
17+
- Reduction in Computational and Data Requirements
18+
- Enhanced Efficiency in Real-World Applications
19+
- Alignment with Ethical, Regulatory, and Organizational Guidelines
20+
21+
## Fine-Tuning Methods
22+
Fine-tuning LLM uses different techniques based on the various use cases, computational constraints, and efficiency requirements. Below are the key fine-tuning methods:
23+
24+
### Full Fine-Tuning (Supervised Learning Approach)
25+
It involves updating all parameters of the LLM using task-specific data, requiring significant computational power and large labeled datasets, which provides the highest level of customization.
26+
27+
### Instruction Fine-Tuning
28+
Instruction fine-tuning is a supervised learning method. A pre-trained large language model (LLM) is further trained on instruction-response pairs to improve its ability to follow human instructions accurately. Instruction Fine-Tuning has some key features using Labeled Instruction-Response Pairs, Enhances Model Alignment with Human Intent, Commonly Used in Chatbots and AI Assistants, and Prepares Models for Zero-Shot and Few-Shot Learning.
29+
30+
### Parameter-Efficient Fine-Tuning (PEFT)
31+
It is a optimized approaches that reduce the number of trainable parameters while maintaining high performance:
32+
33+
- ###### LoRA (Low-Rank Adaptation)
34+
- Introduces small trainable weight matrices (rank decomposition) while freezing the main model weights.
35+
- It will significantly reduce GPU memory usage and training time.
36+
37+
- ###### QLoRA (Quantized LoRA)
38+
- It will use quantization (e.g., 4-bit or 8-bit precision) to reduce memory footprint while applying LoRA fine-tuning.
39+
- It is Ideal for fine-tuning large models on limited hardware.
40+
41+
- ###### Adapter Layers
42+
- Inserts small trainable layers between existing layers of the model and Keeps most parameters frozen, reducing computational overhead.
43+
44+
- ###### Reinforcement Learning from Human Feedback (RLHF)
45+
- Fine-tunes models based on human preferences using reinforcement learning.
46+
47+
- ###### Domain-Specific Fine-Tuning
48+
- Fine-tunes the LLM with domain-specific datasets and Improves accuracy and relevance in specialized applications.
49+
50+
- ###### Multi-Task Learning (MTL) Fine-Tuning
51+
- Trains the model on multiple tasks simultaneously, enabling generalization across different applications.
52+
53+
54+
55+
## Fine-Tuning Implementaion
56+
The following steps need to be performed to implement fine-tuning:
57+
58+
59+
![example image alt-text#center](1.png "Figure 1. Fine-Tuning Implementaion")
60+
61+
- Base Model Selection: Choose a pre-trained model based on your use cases. You can find pre-trained models at [Hugging Face](https://huggingface.co/).
62+
- Fine-Tuning Method Finalization: Select the most appropriate fine-tuning method (e.g., supervised, instruction-based, PEFT) based on your use case and dataset. You can typically find various datasets on [Hugging Face](https://huggingface.co/datasets) and [Kaggle](https://www.kaggle.com/datasets).
63+
- Dataset Prepration:Organize your data for your use case-specific training, ensuring it aligns with the model's required format.
64+
- Training:Utilize frameworks such as TensorFlow and PyTorch to fine-tune the model.
65+
- Evaluate: Evaluate the model, refine it as needed, and retrain to enhance performance.
Lines changed: 49 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,49 @@
1+
---
2+
title: Fine Tuning Large Language Model - Setup Environment
3+
weight: 3
4+
5+
### FIXED, DO NOT MODIFY
6+
layout: learningpathall
7+
---
8+
9+
## Fine Tuning Large Language Model - Setup Environment
10+
11+
#### Plartform Required
12+
- An AWS Graviton4 r8g.16xlarge instance to test Arm performance optimizations, or any [Arm based instance](/learning-paths/servers-and-cloud-computing/csp/) from a cloud service provider or an on-premise Arm server or Arm based laptop.
13+
- An Arm-powered smartphone with the i8mm feature running Android, with 16GB of RAM.
14+
- A USB cable to connect your smartphone to your development machine.
15+
16+
#### Set Up Required Libraries
17+
The following commands install the necessary libraries for the task, including Hugging Face Transformers, Datasets, and fine-tuning methods. These libraries facilitate model loading, training, and fine-tuning
18+
19+
###### The transformers library (by Hugging Face) provides pre-trained LLMs
20+
```python
21+
!pip install transformers
22+
23+
```
24+
###### This installs transformers along with PyTorch, ensuring that models are trained and fine-tuned using the Torch backend.
25+
```python
26+
!pip install transformers[torch]
27+
```
28+
###### The datasets library (by Hugging Face) provides access to a vast collection of pre-built datasets
29+
30+
```python
31+
!pip install datasets
32+
```
33+
###### The evaluate library provides metrics for model performance assessment
34+
35+
```python
36+
!pip install evaluate
37+
```
38+
###### Speed up fine-tuning of Large Language Models (LLMs)
39+
[Unsloth](https://huggingface.co/unsloth) is a library designed to speed up fine-tuning of Large Language Models (LLMs) while reducing computational costs. It optimizes training efficiency, particularly for LoRA (Low-Rank Adaptation) fine-tuning
40+
```python
41+
%%capture
42+
# %%capture is a Jupyter Notebook magic command that suppresses the output of a cell.
43+
44+
```
45+
##### Uninstalls the existing Unsloth installation and installs the latest version directly from the GitHub repository
46+
47+
```python
48+
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
49+
```
Lines changed: 67 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,67 @@
1+
---
2+
title: Fine Tuning Large Language Model - Load Pre-trained Model & Tokenizer
3+
weight: 4
4+
5+
### FIXED, DO NOT MODIFY
6+
layout: learningpathall
7+
---
8+
9+
## Fine Tuning Large Language Model - Load Pre-trained Model & Tokenizer
10+
11+
#### Load Pre-trained Model & Tokenizer
12+
The following commands Load the pre-trained model and tokenizer, ensuring compatibility with the fine-tuning task and optimizing memory usage
13+
14+
###### Import Required Modules
15+
- FastLanguageModel: A highly optimized loader for LLaMA models in Unsloth, making it faster and memory-efficient.
16+
- torch: Required for handling tensors and computations.
17+
```python
18+
from unsloth import FastLanguageModel
19+
import torch
20+
21+
```
22+
###### Define Model Configuration
23+
- max_seq_length = 2048 → Defines the maximum number of tokens the model can process at once.
24+
- dtype = None → Auto-selects Float16 for older GPUs (Tesla T4, V100)
25+
- load_in_4bit = True → Enables 4-bit quantization to reduce memory usage
26+
```python
27+
max_seq_length = 2048
28+
dtype = None
29+
load_in_4bit = True
30+
```
31+
###### Load the Pre-trained Model
32+
- Loads a 1B parameter fine-tuned LLaMA model
33+
- Loads the optimized LLaMA model with reduced VRAM usage and faster processing
34+
- Loads the corresponding tokenizer for tokenizing inputs properly
35+
36+
```python
37+
model, tokenizer = FastLanguageModel.from_pretrained(
38+
model_name = "unsloth/Llama-3.2-1B-Instruct",
39+
max_seq_length = max_seq_length,
40+
dtype = dtype,
41+
load_in_4bit = load_in_4bit,
42+
```
43+
###### Parameter-Efficient Fine-Tuning (PEFT) using LoRA (Low-Rank Adaptation) for the pre-trained model
44+
- LoRA Rank (r): Defines the rank of the low-rank matrices used in LoRA
45+
- Target Modules: Specifies which layers should be fine-tuned with LoRA, Includes attention layers (q_proj, k_proj, v_proj, o_proj) and feedforward layers (gate_proj, up_proj, down_proj)
46+
- LoRA Alpha (lora_alpha):Scaling factor for LoRA weights and A higher value makes the LoRA layers contribute more to the model's output
47+
- LoRA Dropout: Dropout randomly disables connections to prevent overfitting
48+
- Bias (bias): No additional bias parameters are trained (optimized for efficiency)
49+
- Gradient Checkpointing: Optimized memory-saving method
50+
- Random Seed: Ensures reproducibility across training runs
51+
- Rank-Stabilized LoRA: Rank stabilization not used
52+
- LoFTQ Quantization: No LoFTQ (Low-bit Quantization) applied
53+
```python
54+
model = FastLanguageModel.get_peft_model(
55+
model,
56+
r = 16,
57+
target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
58+
"gate_proj", "up_proj", "down_proj",],
59+
lora_alpha = 16,
60+
lora_dropout = 0,
61+
bias = "none",
62+
use_gradient_checkpointing = "unsloth",
63+
random_state = 3407,
64+
use_rslora = False,
65+
loftq_config = None,
66+
)
67+
```
Lines changed: 75 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,75 @@
1+
---
2+
title: Fine Tuning Large Language Model - Prepare Dataset
3+
weight: 5
4+
5+
### FIXED, DO NOT MODIFY
6+
layout: learningpathall
7+
---
8+
9+
## Fine Tuning Large Language Model - Prepare Dataset
10+
This step prepares the dataset for fine-tuning by formatting it to match the LLaMA-3.1 chat template.
11+
12+
###### Import Chat Template for Tokenizer
13+
This imports the chat template functionality from Unsloth and It allows us to structure the dataset in a format that LLaMA-3.1 expects
14+
```python
15+
from unsloth.chat_templates import get_chat_template
16+
```
17+
18+
###### Apply the Chat Template to Tokenizer
19+
- Apply the Chat Template to Tokenizer.
20+
- Ensures prompt formatting is consistent when training the model.
21+
```python
22+
tokenizer = get_chat_template(
23+
tokenizer,
24+
chat_template = "llama-3.1",
25+
)
26+
27+
28+
```
29+
###### Format Dataset Prompts
30+
- Extracts the instruction column from the dataset.
31+
- Applies the chat template formatting to each instruction.
32+
- Returns a new dictionary with the formatted text.
33+
```python
34+
def formatting_prompts_func(examples):
35+
convos = examples["instruction"]
36+
texts = [tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = False) for convo in convos]
37+
return { "text" : texts, }
38+
pass
39+
```
40+
###### Load the Dataset
41+
- Loads a [customer support chatbot training dataset](https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset) from Hugging Face
42+
- The dataset contains example conversations with instructions for fine-tuning
43+
- Loads the corresponding tokenizer for tokenizing inputs properly
44+
45+
```python
46+
from datasets import load_dataset
47+
dataset = load_dataset("bitext/Bitext-customer-support-llm-chatbot-training-dataset", split = "train")
48+
49+
```
50+
![example image alt-text#center](2.png )
51+
52+
###### Import Standardization Function
53+
- Imports standardize_sharegpt, a function that helps in structuring dataset inputs in a ShareGPT-like format (a commonly used format for LLM fine-tuning).
54+
- Ensures that data follows a standardized format required for effective instruction tuning.
55+
```python
56+
from unsloth.chat_templates import standardize_sharegpt
57+
```
58+
###### Define a Function to Format Dataset
59+
- Extracts the instruction (input text) and response (output text) from the dataset.
60+
- Stores them as "instruction_text" and "response_text".
61+
```python
62+
def formatting_prompts_func(examples):
63+
return { "instruction_text": examples["instruction"], "response_text": examples["response"] }
64+
65+
```
66+
67+
###### Apply Formatting to Dataset
68+
- Applies formatting_prompts_func to every record in the dataset.
69+
- Uses batch processing (batched=True) for efficiency.
70+
```python
71+
def formatting_prompts_func(examples):
72+
return { "instruction_text": examples["instruction"], "response_text": examples["response"] }
73+
74+
```
75+
![example image alt-text#center](3.png )

0 commit comments

Comments
 (0)