This directory provides an example of the Program Data Separation APIs in ExecuTorch. Specifically, it showcases:
1. Simple program data separation examples using the portable operators and XNNPACK.
2. A LoRA inference example in which a LoRA model and a non-LoRA model share foundation weights.
## Program Data Separation
The program-data separation APIs allow users to generate a separate data file when exporting and lowering a model; that is, a PTE file containing the model execution program, and one (or more) [PTD](https://github.com/pytorch/executorch/blob/main/extension/flat_tensor/README.md) file(s) containing only weights.

PTD files are used to store data outside of the PTE file. Some use-cases:
- Deduplication: sharing model weights between multiple executable PTE files. This can significantly reduce binary file size and runtime memory usage.
- Flexible deployment: allow async updates between program and data, especially if they are updated with different cadences.
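
To make the deduplication point concrete, here is a back-of-the-envelope sketch of the on-disk accounting when several PTE files share one PTD file. The sizes are made-up placeholders, not measurements from this example:

```python
# Rough storage accounting for sharing one .ptd across several .pte files.
# All sizes are illustrative placeholders, not measured numbers.
foundation_weights_mb = 2000.0  # weights serialized once into a shared .ptd
per_program_pte_mb = 5.0        # program (plus any small per-task data) per .pte
num_programs = 4                # number of executable variants

embedded = num_programs * (foundation_weights_mb + per_program_pte_mb)
shared = foundation_weights_mb + num_programs * per_program_pte_mb

print(f"weights embedded in every PTE: {embedded:,.0f} MB on disk")
print(f"weights in one shared PTD:     {shared:,.0f} MB on disk")
```

The same reasoning drives the runtime memory savings when the shared weights are loaded once instead of once per program.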
To run the export examples below, install the ExecuTorch pip package:
```
pip install executorch==0.7.0
```
## Export a model with program-data separation
To export a non-delegated linear model into the current directory:
```python
python export_linear.py --outdir .
```
Expect the files 'linear.pte' and 'linear.ptd'.
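
For reference, the flow inside `export_linear.py` is roughly the sketch below. This is a simplified approximation rather than the script itself: the `external_constants` config flag and the `write_tensor_data_to_file` call are assumptions about the ExecuTorch export API that may differ between versions, so treat `export_linear.py` in this directory as the source of truth.

```python
# Simplified sketch of exporting a model with program-data separation.
# NOTE: the external_constants flag and write_tensor_data_to_file call are
# assumptions and may not match your ExecuTorch version; see export_linear.py
# in this directory for the authoritative flow.
import torch
from executorch.exir import ExecutorchBackendConfig, to_edge_transform_and_lower


class LinearModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(3, 3)

    def forward(self, x):
        return self.linear(x)


exported = torch.export.export(LinearModule(), (torch.randn(3),))
edge = to_edge_transform_and_lower(exported)

# Ask the emitter to place constants in an external data segment (.ptd)
# instead of embedding them in the program (.pte).
executorch_program = edge.to_executorch(
    config=ExecutorchBackendConfig(external_constants=True)
)

with open("linear.pte", "wb") as f:
    executorch_program.write_to_file(f)
executorch_program.write_tensor_data_to_file(".")  # writes the .ptd file(s)
```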
To export a linear model delegated to XNNPACK into the current directory:
```python
python export_linear.py --outdir . --xnnpack
```
Expect the files 'linear_xnnpack.pte' and 'linear_xnnpack.ptd'.

For more information on the PTD data format, please see the [flat_tensor](https://github.com/pytorch/executorch/blob/main/extension/flat_tensor/README.md) directory.
## Runtime (cpp)
Please see [program-data-separation/cpp](cpp/) for instructions on running the exported models.
## Export a model with LoRA
A major use-case that program-data separation enables is inference with multiple LoRA adapters. LoRA is a fine-tuning technique introduced in [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685). LoRA fine-tuning produces lightweight 'adapter' weights that can be applied to an existing model to adapt it to a new task. LoRA adapters are typically small compared to LLM foundation weights, on the order of kilobytes to megabytes depending on the fine-tuning setup and model size.
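
To see why adapters stay small relative to the foundation weights, here is a minimal sketch of the low-rank idea in plain PyTorch. This illustrates LoRA itself, not the ExecuTorch export flow; the layer size and rank are arbitrary choices for the example.

```python
import torch

# Illustrative sizes: one 4096x4096 projection adapted with a rank-16 LoRA.
d_out, d_in, r, alpha = 4096, 4096, 16, 32

W = torch.randn(d_out, d_in)  # foundation weight: frozen, shareable via .ptd
A = torch.randn(r, d_in)      # adapter factors: trained per task and small
B = torch.zeros(d_out, r)     # enough to ship inside each task-specific .pte

W_adapted = W + (alpha / r) * (B @ A)  # effective weight used at inference

print(f"foundation params: {W.numel():,}")               # 16,777,216
print(f"adapter params:    {A.numel() + B.numel():,}")   # 131,072 (~0.8%)
```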
To enable LoRA, we generate:
- PTE file(s): containing the program and LoRA adapter weights.
- PTD file: containing the foundation weights.

Multiple LoRA-adapted PTE files can share the same foundation weights, so adding a model adapted to a new task incurs minimal binary size and runtime memory overhead.

LoRA is currently supported on ExecuTorch main; please [install the ExecuTorch pip package from source](https://docs.pytorch.org/executorch/stable/using-executorch-building-from-source.html#install-executorch-pip-package-from-source) until executorch==1.0 is released.