Refactor program-data separation example #51

Open · wants to merge 3 commits into main
89 changes: 15 additions & 74 deletions program-data-separation/README.md
@@ -1,6 +1,10 @@
# Program Data Separation Examples

This directory provides an example of the Program Data Separation APIs in ExecuTorch. Specifically, it showcases:
1. Program-data separation examples using a linear model, with portable operators and with XNNPACK.
2. A LoRA inference example, with a LoRA model and a non-LoRA model sharing foundation weights.

## Program Data Separation

The program-data separation APIs allow users to generate a separate data file when exporting and lowering a model, i.e. generate a PTE file containing the model execution program, and one or more [PTD](https://github.com/pytorch/executorch/blob/main/extension/flat_tensor/README.md) files containing only weights (see the export sketch below).

@@ -9,82 +13,19 @@
PTD files are used to store data outside of the PTE file. Some use cases:
- Deduplication: sharing model weights between multiple executable PTE files. This can significantly reduce binary file size and runtime memory usage.
- Flexible deployment: allowing async updates between program and data, especially if they are updated at different cadences.
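
To make the flow concrete, here is a minimal export sketch. The `torch.export` / `to_edge` / `to_executorch` calls are standard ExecuTorch export entry points, but the commented-out `.ptd` writer is an assumption for illustration; the export script in this directory is the working reference.

```python
# Minimal sketch: exporting a toy linear model with ExecuTorch.
# The .ptd emission step is hypothetical here; see the export script in
# this directory for the exact program-data separation configuration.
import torch
from executorch.exir import to_edge

class Linear(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(8, 8)

    def forward(self, x):
        return self.linear(x)

exported = torch.export.export(Linear(), (torch.randn(1, 8),))
program = to_edge(exported).to_executorch()

# Program logic goes into the PTE file.
with open("linear.pte", "wb") as f:
    f.write(program.buffer)

# Hypothetical: a config flag or pass diverts the constant tensors into a
# companion linear.ptd file instead of embedding them in the PTE.
# program.write_tensor_data_to_file(".")
```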

## Linear example
For a demo of the program-data separation APIs using a linear model, please see [program-data-separation/cpp/linear_example](cpp/linear_example/). This example generates and runs a program-data separated linear model, with the weights and bias in a separate .ptd file.
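
If you want a quick look at the exported program from Python before building the C++ runner, the pybindings loader can run a plain PTE. Treat this as a sketch: the input shape is assumed, and whether the Python loader resolves a separate .ptd alongside the PTE depends on your ExecuTorch version, so the C++ example remains the reference.

```python
# Sketch: run an exported PTE from Python via the ExecuTorch pybindings.
# Assumptions: input shape (1, 8); .ptd resolution from Python may not be
# supported in your build, in which case use the C++ runner.
import torch
from executorch.extension.pybindings.portable_lib import _load_for_executorch

module = _load_for_executorch("linear.pte")
outputs = module.forward((torch.randn(1, 8),))
print(outputs[0])
```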

## LoRA example
A major use-case that program-data separation enables is inference with multiple LoRA adapters. LoRA is a fine-tuning technique introduced in [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685). LoRA fine-tuning produces lightweight 'adapter' weights that can be applied to an existing model to adapt it to a new task. LoRA adapters are typically small in comparison to LLM foundation weights, on the order of KBs to MBs depending on the fine-tuning setup and model size.
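
To make the adapter idea concrete, the snippet below is a self-contained PyTorch sketch of a LoRA-style linear layer, independent of ExecuTorch (the rank and scaling values are arbitrary choices for illustration):

```python
# Conceptual LoRA sketch: a frozen foundation layer plus a low-rank
# adapter. Only A and B would be trained; W stays shared and frozen.
import torch

class LoRALinear(torch.nn.Module):
    def __init__(self, base: torch.nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base                      # foundation weights (frozen)
        self.base.requires_grad_(False)
        self.A = torch.nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = torch.nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # y = W x + scale * B (A x); the adapter term is tiny next to W.
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

layer = LoRALinear(torch.nn.Linear(16, 16))
print(layer(torch.randn(2, 16)).shape)  # torch.Size([2, 16])
```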

To enable LoRA, we generate:
- PTE file(s): containing the program and the LoRA adapter weights.
- PTD file: containing the shared foundation weights.

Multiple LoRA-adapted PTE files can share the same foundation weights, so adding a model adapted to a new task incurs minimal binary size and runtime memory overhead.

### Requirements
LoRA is currently supported on ExecuTorch main. Until executorch==1.0 is released, please [install the ExecuTorch pip package from source](https://docs.pytorch.org/executorch/stable/using-executorch-building-from-source.html#install-executorch-pip-package-from-source).
2 changes: 1 addition & 1 deletion program-data-separation/cpp/CMakeLists.txt
@@ -17,7 +17,7 @@
option(EXECUTORCH_BUILD_XNNPACK "" ON)
# Add ExecuTorch subdirectory
add_subdirectory("executorch")

set(DEMO_SOURCES linear_example/main.cpp)

# Create executable
add_executable(executorch_program_data_separation ${DEMO_SOURCES})
2 changes: 1 addition & 1 deletion program-data-separation/cpp/executorch
Submodule executorch updated 1095 files
74 changes: 74 additions & 0 deletions program-data-separation/cpp/linear_example/README.md
@@ -0,0 +1,74 @@
# ExecuTorch Program Data Separation Demo (C++)

This directory contains the C++ code to run the examples generated in [program-data-separation](../../README.md).


## Virtual environment setup.
Create and activate a Python virtual environment:
```bash
python3 -m venv .venv && source .venv/bin/activate && pip install --upgrade pip
```
Alternatively, [install conda on your machine](https://conda.io/projects/conda/en/latest/user-guide/install/index.html):
```bash
conda create -yn executorch-ptd python=3.10.0 && conda activate executorch-ptd
```

Install dependencies:
```bash
pip install executorch==0.7.0
```

## Export the models.

Change into the program-data-separation directory and create a directory to hold exported artifacts.
```bash
cd ~/executorch-examples/program-data-separation
mkdir models
```

Export models into the `models` directory. The first command generates undelegated model/data files, and the second generates XNNPACK-delegated model/data files.
```bash
python export_linear.py --outdir models/
python export_linear.py --outdir models/ --xnnpack
```
Expect the files `linear.pte`, `linear.ptd`, `linear_xnnpack.pte`, and `linear_xnnpack.ptd`.

Note:
- PTE: contains the program execution logic.
- PTD: contains the constant tensors used by the PTE.

See [program-data-separation](../../README.md) for background on the program-data separation APIs.
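
As a quick sanity check that the weights really moved out of the program files, you can compare file sizes (the exact numbers depend on the model; none are guaranteed here):

```python
# Sanity check: .pte files should be small (program only), while .ptd
# files carry the constant tensors.
import os

for name in ("linear", "linear_xnnpack"):
    for ext in ("pte", "ptd"):
        path = f"models/{name}.{ext}"
        print(f"{path}: {os.path.getsize(path):,} bytes")
```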

## Install runtime dependencies.
The ExecuTorch repository is configured as a git submodule at `~/executorch-examples/program-data-separation/cpp/executorch`. To initialize it:
```bash
cd ~/executorch-examples/
git submodule sync
git submodule update --init --recursive
```
Install the dev requirements for ExecuTorch:

```bash
cd ~/executorch-examples/program-data-separation/cpp/executorch
pip install -r requirements-dev.txt
```

## Build the runtime.
Build the executable:
```bash
cd ~/executorch-examples/program-data-separation/cpp/linear_example
chmod +x build_example.sh
./build_example.sh
```

## Run the executable.
```bash
./build/bin/executorch_program_data_separation --model-path ../../models/linear.pte --data-path ../../models/linear.ptd

./build/bin/executorch_program_data_separation --model-path ../../models/linear_xnnpack.pte --data-path ../../models/linear_xnnpack.ptd
```

## Clean up.
```bash
rm -rf build
cd ~/executorch-examples/program-data-separation
rm -rf models
```
15 changes: 15 additions & 0 deletions program-data-separation/cpp/linear_example/build_example.sh
@@ -0,0 +1,15 @@
#!/bin/bash
set -e

# Clean any previous build and create a fresh build directory
rm -rf build
mkdir -p build
cd build

# Configure CMake
cmake -DCMAKE_BUILD_TYPE=Release ../..

# Build the project
cmake --build . -j$(nproc)

echo "Build complete! Executable located at: ./build/bin/executorch_program_data_separation"