This directory provides an example of the Program Data Separation APIs in ExecuTorch. Specifically, it showcases:
1. Program data separation examples using a linear model with the portable operators and XNNPACK.
2. LoRA inference example with a LoRA and non-LoRA model sharing foundation weights.

## Program Data Separation
The program-data separation APIs allow users to generate a separate data file when exporting and lowering a model; that is, generate a PTE file containing the model execution program, and one (or more) [PTD](https://github.com/pytorch/executorch/blob/main/extension/flat_tensor/README.md) files containing only weights.

PTD files are used to store data outside of the PTE file. Some use-cases:
- Deduplication: sharing model weights between multiple executable PTE files. This can significantly reduce binary file size and runtime memory usage.
- Flexible deployment: allow async updates between program and data, especially if they are updated with different cadences.

Note:
- PTE: contains the program execution logic.
- PTD: contains the constant tensors used by the PTE.

For more information on the PTD data format, please see the [flat_tensor](https://github.com/pytorch/executorch/blob/main/extension/flat_tensor/README.md) directory.
## Linear example
For a demo of the program-data separation APIs using a linear model, please see [program-data-separation/cpp/linear_example](linear_example/). This example generates and runs a program-data separated linear model, with weights and bias in a separate .ptd file.
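For reference, an earlier revision of this README exported and built the linear example directly from this directory; the commands below reproduce that flow. Treat them as a sketch and prefer the instructions in [linear_example](linear_example/), since file locations and the executable name may have changed.

```bash
# Export the linear model (add --xnnpack for the XNNPACK-delegated variant,
# which produces linear_xnnpack.pte/.ptd instead of linear.pte/.ptd).
python export.py --outdir .

# Build the C++ runner.
cd cpp
mkdir -p build && cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
cmake --build . -j$(nproc)
# The resulting executable: ./bin/executorch_program_data_separation
```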
## LoRA example
A major use-case that program-data separation enables is inference with multiple LoRA adapters. LoRA is a fine-tuning technique introduced in [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685). LoRA fine-tuning produces lightweight 'adapter' weights that can be applied to an existing model to adapt it to a new task. LoRA adapters are typically small in comparison to LLM foundation weights, on the order of KB-MB depending on the finetuning setup and model size.

To enable LoRA, we generate:
- PTE file(s): containing the program and LoRA adapter weights.
- PTD file: containing the foundation weights, which are shared across LoRA-adapted models.

Provided they are based on the same underlying model, multiple LoRA-adapted PTE files can share the same foundation weights, so adding a model adapted to a new task incurs minimal binary size and runtime memory overhead: roughly the size of the LoRA adapter weights.
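For illustration, the resulting on-disk layout might look like the listing below; the file names here are hypothetical and depend on how you export your models.

```bash
ls models/
# model_adapter_a.pte   # program + LoRA adapter A weights (small)
# model_adapter_b.pte   # program + LoRA adapter B weights (small)
# foundation.ptd        # foundation weights, stored once and shared by both PTE files
```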
LoRA is currently supported on ExecuTorch main. [Please install the ExecuTorch pip package from source](https://docs.pytorch.org/executorch/stable/using-executorch-building-from-source.html#install-executorch-pip-package-from-source) until executorch==1.0 is released.

Change into the program-data-separation directory and create a directory to hold exported artifacts.
```bash
cd ~/executorch-examples/program-data-separation
mkdir models
```
Export models into the `models` directory. The first command will generate undelegated model/data files, and the second will generate XNNPACK-delegated model/data files.
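The export commands themselves are not included in this excerpt. As a sketch only, assuming the same `export.py` script and flags used for the linear example (the actual script and flags for this step may differ), they would look roughly like:

```bash
# Assumption: export.py with --outdir/--xnnpack flags, as in the linear example.
python export.py --outdir ./models             # undelegated model + data files
python export.py --outdir ./models --xnnpack   # XNNPACK-delegated model + data files
```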