
Commit 6a8158b

add lora demo video
1 parent cd0890b commit 6a8158b

2 files changed: +37 -168 lines changed

program-data-separation/cpp/lora_example/README.md

Lines changed: 37 additions & 34 deletions
@@ -4,25 +4,30 @@ This directory contains the C++ code for the LoRA demo.
You'll learn how to:
1. Export LoRA PTE files that share a single foundation weight file.
2. Load and run multiple LoRA PTE files at the same time, and notice that runtime memory increases by the LoRA adapter size (small) rather than the foundation weight size (large), because the foundation weights are shared.

Note:
- Weight-sharing is supported with the XNNPACK backend.
- Quantization (outside of embedding quantization) is currently not supported with weight-sharing.
- There are many ways to fine-tune LoRA adapters. We will go through a few examples to create a demo.

## Table of Contents
- [Size savings](#size-savings)
- [Fine-tuning](#fine-tune-from-scratch-with-unsloth-and-llama)
- [Installation](#install-executorch)
- [Export models](#export-models)
- [Run models](#install-runtime-dependencies)
- [Demo video](#demo-video)

## Size savings

Size results will vary depending on the model and LoRA config. For this demo, we save ~5GB of disk space by storing weights in a separate, sharable file, and ~5GB of runtime memory by sharing weights at runtime through the XNNPACK weight cache. Detailed results are below.
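As a rough worked example using the files listed later in this README: the shared `foundation.ptd` is ~6GB while each adapter PTE is only ~45MB, so every additional adapter costs tens of MB on disk and in memory instead of carrying its own multi-GB copy of the foundation weights.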

### XNNPACK weight sharing

The XNNPACK backend is a singleton. Weight sharing is implemented via the XNNPACK weight cache. At delegate init time, XNNPACK checks the weight cache for the weights it needs. If they aren't there, XNNPACK fetches the weights from the NamedDataMap (the API that exposes weights in a PTD file), packs them, stores them in the weight cache, and frees the originals. This means we don't keep multiple copies of the same weights around.
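For intuition, the lookup-or-pack flow above can be pictured as a small cache keyed by weight name. The snippet below is a conceptual sketch only; `WeightCache`, `fetch_from_named_data_map`, and `pack` are illustrative stand-ins, not the real XNNPACK API:

```cpp
// Conceptual sketch only -- not the actual XNNPACK weight cache API.
// It illustrates the flow described above: look up packed weights by name;
// on a miss, fetch the raw bytes (NamedDataMap), pack them, cache the packed
// copy, and let the raw copy be freed.
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

using RawWeights = std::vector<uint8_t>;
using PackedWeights = std::vector<uint8_t>;

class WeightCache {
 public:
  const PackedWeights& get_or_pack(
      const std::string& name,
      const std::function<RawWeights()>& fetch_from_named_data_map,
      const std::function<PackedWeights(const RawWeights&)>& pack) {
    auto it = cache_.find(name);
    if (it != cache_.end()) {
      // Hit: every delegate instance that needs `name` reuses this packed copy.
      return it->second;
    }
    RawWeights raw = fetch_from_named_data_map();  // Miss: read from the PTD-backed map.
    PackedWeights packed = pack(raw);              // Pack for the XNNPACK kernels.
    // `raw` goes out of scope here, so only the packed copy stays resident.
    return cache_.emplace(name, std::move(packed)).first->second;
  }

 private:
  std::unordered_map<std::string, PackedWeights> cache_;
};
```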

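From the runner's side, sharing looks roughly like the sketch below: two adapter programs loaded against one `foundation.ptd`. This is a minimal sketch, assuming the `Module` constructor overload that accepts a separate `.ptd` data file (available on recent ExecuTorch main); see this directory's actual runner for the real implementation.

```cpp
// Minimal runner-side sketch; the demo's actual runner may differ.
#include <executorch/extension/module/module.h>

using executorch::extension::Module;

int main() {
  // Both programs reference the same external weight file.
  Module et_adapter("et.pte", "foundation.ptd");
  Module nobel_adapter("nobel.pte", "foundation.ptd");

  // The XNNPACK backend is a singleton, so loading the second program finds
  // the packed foundation weights already in the weight cache and reuses them;
  // memory grows by roughly the adapter size, not by another copy of the
  // multi-GB foundation weights.
  const auto err1 = et_adapter.load_method("forward");
  const auto err2 = nobel_adapter.load_method("forward");
  (void)err1;
  (void)err2;
  return 0;
}
```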
## Fine-tune from scratch with Unsloth and Llama
[Unsloth](https://unsloth.ai/) provides a [colab notebook](https://docs.unsloth.ai/get-started/fine-tuning-llms-guide/datasets-guide#synthetic-dataset-notebook) that showcases how to generate data using the Meta Synthetic Data Kit and then fine-tune on that data to create a LoRA adapter.

For this demo, we trained on two datasets:
@@ -48,16 +53,6 @@ The files we want are:
- adapter_config.json
- adapter_model.safetensors

## Install ExecuTorch
[Install from source](https://docs.pytorch.org/executorch/stable/using-executorch-building-from-source.html#install-executorch-pip-package-from-source).

@@ -66,7 +61,10 @@ conda create -yn executorch-lora python=3.10.0 && conda activate executorch-lora
cd ~/executorch-examples/program-data-separation/cpp/executorch

# Update to recent main.
git pull origin main

git submodule sync
git submodule update --init --recursive

# Install ExecuTorch pip package.
./install_executorch.sh --editable
@@ -77,10 +75,11 @@ You can also install from a recent nightly build.
pip install executorch==1.1.0.devYYYYMMDD --extra-index-url https://download.pytorch.org/whl/nightly/cpu
```

Use main or a recent nightly, as some features are not available in executorch==1.0.0.

## Export models

1. Download the base model. We're using https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct.
```
pip install huggingface_hub
@@ -89,16 +88,14 @@ huggingface-cli login
huggingface-cli download meta-llama/Llama-3.2-1B-Instruct --local-dir ./Llama-3.2-1B-Instruct
```

2. Set your paths and the model name.
```
DOWNLOADED_PATH=Llama-3.2-1B-Instruct
ADAPTER_PATH=lora_model
MODEL_NAME=<model_name>
```

3. Run the export command below once per adapter, with a different MODEL_NAME each time.
```
python -m executorch.extension.llm.export.export_llm \
base.checkpoint="${DOWNLOADED_PATH}/original/consolidated.00.pth" \
@@ -115,29 +112,33 @@ python -m executorch.extension.llm.export.export_llm \
export.foundation_weights_file="foundation.ptd"
```

Expect to see two files: `<model_name>.pte` and `foundation.ptd`. Run the command again to generate more adapter PTE files. You only need to keep one `foundation.ptd` file.

You can also run `~/executorch-examples/program-data-separation/export_lora.sh`. This will export the dummy LoRA model and the base Llama-3-2-1B model PTE files.

Example files, trained on executorch/docs/source/ and recent Nobel prize winners:
```bash
# executorch docs trained adapter model.
-rw-r--r-- 1 lfq users 45555712 Oct 17 18:05 et.pte
# foundation weight file
-rw-r--r-- 1 lfq users 5994013600 Oct 17 18:05 foundation.ptd
# dummy lora model.
-rw-r--r-- 1 lfq users 27628928 Oct 17 14:31 llama_3_2_1B_lora.pte
# Nobel prize winners trained adapter model.
-rw-r--r-- 1 lfq users 45555712 Oct 17 18:00 nobel.pte
```

Notice that the adapter PTE files are about the same size as the `adapter_model.safetensors`/`adapter_model.pt` files generated during training. The PTE contains the adapter weights (which are not shared) and the program.

## Install runtime dependencies
The ExecuTorch repository is configured as a git submodule at `~/executorch-examples/program-data-separation/cpp/executorch`. To initialize it:
```bash
cd ~/executorch-examples/

# Update to the remote main branch.
git submodule update --remote program-data-separation/cpp/executorch
git submodule sync
git submodule update --init --recursive
```

Install dev requirements for ExecuTorch:
@@ -146,7 +147,7 @@ cd ~/executorch-examples/program-data-separation/cpp/executorch
pip install -r requirements-dev.txt
```

## Build the runtime
Install some dependencies:
```bash
cd ~/executorch-examples/program-data-separation/cpp/executorch
@@ -159,7 +160,7 @@ cd ~/executorch-examples/program-data-separation/cpp/lora_example
sh build_example.sh
```

## Run the executable
```bash
cd ~/executorch-examples/program-data-separation/cpp/lora_example

@@ -192,7 +193,7 @@ We can see that the ExecuTorch-trained adapter model does not have knowledge of

There is a ~1.1GB memory increase between running the two models.
Most of that (~1GB) comes from embeddings that are not lowered to XNNPACK (and are currently not shared). This can be alleviated by quantizing the embeddings, by adding the config `quantization.embedding_quantize=\'4,32\'` to the export command.
~50MB comes from the adapter model, which is not shared.

Let's try an executorch-specific prompt.
```bash
@@ -237,3 +238,5 @@ I 00:00:50.189743 executorch:text_llm_runner.cpp:206] RSS after finishing text g
```

The ExecuTorch-trained adapter model has domain knowledge of the ExecuTorch codebase, whereas the Nobel-prize-trained adapter model does not.

## Demo video

program-data-separation/cpp/lora_example/quick_start.md

Lines changed: 0 additions & 134 deletions
This file was deleted.
