From c8d583faf7cc4b75215d305759671e8b4685eb34 Mon Sep 17 00:00:00 2001 From: lucylq Date: Mon, 11 Aug 2025 15:10:48 -0700 Subject: [PATCH 1/3] Refactor program-data separation example --- program-data-separation/README.md | 60 +++++-------------- program-data-separation/cpp/README.md | 37 ++++++++++++ program-data-separation/cpp/build_example.sh | 14 +++++ program-data-separation/cpp/executorch | 2 +- .../{export.py => export_linear.py} | 0 5 files changed, 68 insertions(+), 45 deletions(-) create mode 100644 program-data-separation/cpp/README.md create mode 100644 program-data-separation/cpp/build_example.sh rename program-data-separation/{export.py => export_linear.py} (100%) diff --git a/program-data-separation/README.md b/program-data-separation/README.md index 473b41c7..d2b9af44 100644 --- a/program-data-separation/README.md +++ b/program-data-separation/README.md @@ -1,6 +1,10 @@ # Program Data Separation Examples -This directory provides an example of the Program Data Separation APIs in ExecuTorch. +This directory provides an example of the Program Data Separation APIs in ExecuTorch. Specifically, it showcases: +1. Simple program data separation examples using the portable operators and XNNPACK. +2. LoRA inference example with a LoRA and non-LoRA model sharing foundation weights. + +## Program Data Separation The program-data separation APIs allow users to generate a separate data file when exporting and lowering a model, i.e., generate a PTE file containing the model execution program, and one (or more) [PTD](https://github.com/pytorch/executorch/blob/main/extension/flat_tensor/README.md) file(s) containing only weights. PTD files are used to store data outside of the PTE file. Some use-cases: - Deduplication: sharing model weights between multiple executable PTE files. This can significantly reduce binary file size and runtime memory usage. 
- Flexible deployment: allow async updates between program and data, especially if they are updated with different cadences. -## LoRA -A major use-case that program-data separation enables is inference with multiple LoRA adapters. LoRA is a fine-tuning technique introduced in [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685). LoRA fine-tuning produces lightweight 'adapter' weights that can be applied to an existing model to adapt it to a new task. LoRA adapters are typically small in comparison to LLM foundation weights. They are generally on the order of KB,MB, depending on the finetuning setup and model size. - -With program-data separation, users can generate a PTE file containing the program and LoRA weights, and save the original foundation weights to a separate PTD file. Provided they are based on the same underlying model, multiple LoRA-adapted PTE files can share the same foundation weights. This means adding a model adapted to a new task incurs minimal binary size and runtime memory overhead; the cost of the lora adapter weights. - -An example of this usage is coming soon. - ## Virtual environment setup Create and activate a Python virtual environment: ```bash @@ -27,9 +24,6 @@ conda create -yn executorch-ptd python=3.10.0 && conda activate executorch-ptd ``` Install dependencies: - -[Please install ExecuTorch pip package from source](https://docs.pytorch.org/executorch/stable/using-executorch-building-from-source.html#install-executorch-pip-package-from-source), until executorch==0.7.0 is released. - ``` pip install executorch==0.7.0 ``` @@ -37,13 +31,13 @@ pip install executorch==0.7.0 ## Export a model with program-data separation To export a non-delegated linear model, into the current directory: ```python -python export.py --outdir . +python export_linear.py --outdir . ``` Expect the files 'linear.pte' and 'linear.ptd'. 
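To make the PTE/PTD split concrete, here is a toy sketch in plain Python. It is purely illustrative: the real PTE and PTD formats are binary (see the flat_tensor docs linked above), and the JSON "program", field names, and file names here are invented for the sketch.

```python
import json
import os
import struct
import tempfile

# Toy stand-ins for PTE/PTD (NOT the real formats): the "program" file holds
# execution metadata plus a reference to an external data file; the "data"
# file holds only the raw weights.
workdir = tempfile.mkdtemp()
data_path = os.path.join(workdir, "linear.ptd")
program_path = os.path.join(workdir, "linear.pte")

weights = [0.5, -1.25, 2.0]

# Write the weights alone to the "PTD" file as little-endian float32.
with open(data_path, "wb") as f:
    f.write(struct.pack(f"<{len(weights)}f", *weights))

# Write the "program": no weights inside, just a pointer to the data file.
with open(program_path, "w") as f:
    json.dump({"op": "linear", "num_weights": len(weights),
               "data_file": os.path.basename(data_path)}, f)

# "Runtime": load the program, then resolve its weights from the data file.
with open(program_path) as f:
    program = json.load(f)
with open(os.path.join(workdir, program["data_file"]), "rb") as f:
    loaded = list(struct.unpack(f"<{program['num_weights']}f", f.read()))

print(loaded)  # [0.5, -1.25, 2.0]
```

Because the weights live only in the data file, several "program" files could point at the same data file, which is the deduplication use-case described above.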
To export a linear model delegated to XNNPACK, into the current directory: ```python -python export.py --outdir . --xnnpack +python export_linear.py --outdir . --xnnpack ``` Expect the files 'linear_xnnpack.pte' and 'linear_xnnpack.ptd'. @@ -53,38 +47,16 @@ Note: For more information on the PTD data format, please see the [flat_tensor](https://github.com/pytorch/executorch/blob/main/extension/flat_tensor/README.md) directory. -## Runtime (cpp) -The cpp/ directory contains the executorch submodule along with a main.cpp file that demonstrates how to load the PTE and PTD files and execute the program. - -First, export your PTE and PTD files using the instructions above. - -**Build instructions** - -Change to the cpp directory. -``` -cd cpp -``` - -Create build directory if it doesn't exist. -``` -mkdir -p build -cd build -``` +Please see [program-data-separation/cpp](cpp/) for instructions on running the exported models. -Configure CMake. -``` -cmake -DCMAKE_BUILD_TYPE=Release .. -``` +## Export a model with LoRA +A major use-case that program-data separation enables is inference with multiple LoRA adapters. LoRA is a fine-tuning technique introduced in [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685). LoRA fine-tuning produces lightweight 'adapter' weights that can be applied to an existing model to adapt it to a new task. LoRA adapters are typically small in comparison to LLM foundation weights, on the order of KB-MB depending on the finetuning setup and model size. -Build the project. -``` -cmake --build . -j$(nproc) -echo "Build complete! Executable located at: ./bin/executorch_program_data_separation" -``` +To enable LoRA, we generate: +- PTE file(s): containing program and LoRA adapter weights. +- PTD file: containing foundation weights. -Run the executable. 
-``` -./bin/executorch_program_data_separation --model-path ../../linear.pte --data-path ../../linear.ptd +Multiple LoRA-adapted PTE files can share the same foundation weights and adding a model adapted to a new task incurs minimal binary size and runtime memory overhead. -./bin/executorch_program_data_separation --model-path ../../linear_xnnpack.pte --data-path ../../linear_xnnpack.ptd -``` +### Requirements +LoRA is currently supported on executorch main. [Please install ExecuTorch pip package from source](https://docs.pytorch.org/executorch/stable/using-executorch-building-from-source.html#install-executorch-pip-package-from-source), until executorch==1.0 is released. diff --git a/program-data-separation/cpp/README.md b/program-data-separation/cpp/README.md new file mode 100644 index 00000000..f8c330fd --- /dev/null +++ b/program-data-separation/cpp/README.md @@ -0,0 +1,37 @@ +# ExecuTorch Program Data Separation Demo C++. + +This directory contains the C++ code to run the examples generated in [program-data-separation](../program-data-separation/README.md). + +## Build instructions +0. Export the model/s. See [program-data-separation](../program-data-separation/README.md) for instructions. +1. The ExecuTorch repository is configured as a git submodule at `~/executorch-examples/program-data-separation/cpp/executorch`. To initialize it: + ```bash + cd ~/executorch-examples/ + git submodule sync + git submodule update --init --recursive + ``` +2. Install dev requirements for ExecuTorch + + ```bash + cd ~/executorch-examples/mv2/cpp/executorch + pip install -r requirements-dev.txt + ``` + +## Program-data separation demo +**Build instructions** + +Build the executable: +```bash +cd ~/executorch-examples/program-data-separation/cpp +chmod +x build_example.sh +./build_example.sh +``` + +Run the executable. 
+``` +./bin/executorch_program_data_separation --model-path ../../linear.pte --data-path ../../linear.ptd + +./bin/executorch_program_data_separation --model-path ../../linear_xnnpack.pte --data-path ../../linear_xnnpack.ptd +``` + +## LoRA demo diff --git a/program-data-separation/cpp/build_example.sh b/program-data-separation/cpp/build_example.sh new file mode 100644 index 00000000..5260dcb0 --- /dev/null +++ b/program-data-separation/cpp/build_example.sh @@ -0,0 +1,14 @@ +#!/bin/bash +set -e + +# Create build directory if it doesn't exist +mkdir -p build +cd build + +# Configure CMake +cmake -DCMAKE_BUILD_TYPE=Release .. + +# Build the project +cmake --build . -j$(nproc) + +echo "Build complete! Executable located at: ./bin/executorch_program_data_separation" diff --git a/program-data-separation/cpp/executorch b/program-data-separation/cpp/executorch index 44564073..3a021469 160000 --- a/program-data-separation/cpp/executorch +++ b/program-data-separation/cpp/executorch @@ -1 +1 @@ -Subproject commit 445640739fbc761a10e61430724cafb8a410198b +Subproject commit 3a021469b68708d71b87d2cea8f358a0b86f9977 diff --git a/program-data-separation/export.py b/program-data-separation/export_linear.py similarity index 100% rename from program-data-separation/export.py rename to program-data-separation/export_linear.py From 97b02c6b3a11930045b1a27399ba8499c24fdb27 Mon Sep 17 00:00:00 2001 From: lucylq Date: Mon, 18 Aug 2025 14:36:57 -0700 Subject: [PATCH 2/3] refactor --- program-data-separation/README.md | 35 +-------- program-data-separation/cpp/CMakeLists.txt | 2 +- program-data-separation/cpp/README.md | 37 ---------- program-data-separation/cpp/build_example.sh | 14 ---- .../cpp/linear_example/README.md | 74 +++++++++++++++++++ .../cpp/linear_example/build_example.sh | 15 ++++ .../cpp/{ => linear_example}/main.cpp | 0 7 files changed, 92 insertions(+), 85 deletions(-) delete mode 100644 program-data-separation/cpp/README.md delete mode 100644 
program-data-separation/cpp/build_example.sh create mode 100644 program-data-separation/cpp/linear_example/README.md create mode 100755 program-data-separation/cpp/linear_example/build_example.sh rename program-data-separation/cpp/{ => linear_example}/main.cpp (100%) diff --git a/program-data-separation/README.md b/program-data-separation/README.md index d2b9af44..3d182149 100644 --- a/program-data-separation/README.md +++ b/program-data-separation/README.md @@ -13,41 +13,10 @@ PTD files are used to store data outside of the PTE file. Some use-cases: - Deduplication: sharing model weights between multiple executable PTE files. This can significantly reduce binary file size and runtime memory usage. - Flexible deployment: allow async updates between program and data, especially if they are updated with different cadences. -## Virtual environment setup -Create and activate a Python virtual environment: -```bash -python3 -m venv .venv && source .venv/bin/activate && pip install --upgrade pip -``` -Or alternatively, [install conda on your machine](https://conda.io/projects/conda/en/latest/user-guide/install/index.html) -```bash -conda create -yn executorch-ptd python=3.10.0 && conda activate executorch-ptd -``` - -Install dependencies: -``` -pip install executorch==0.7.0 -``` - -## Export a model with program-data separation -To export a non-delegated linear model, into the current directory: -```python -python export_linear.py --outdir . -``` -Expect the files 'linear.pte' and 'linear.ptd'. - -To export a linear model delegated to XNNPACK, into the current directory: -```python -python export_linear.py --outdir . --xnnpack -``` -Expect the files 'linear_xnnpack.pte' and 'linear_xnnpack.ptd'. - -Note: -- PTE: contains the program execution logic. -- PTD: contains the constant tensors used by the PTE. - For more information on the PTD data format, please see the [flat_tensor](https://github.com/pytorch/executorch/blob/main/extension/flat_tensor/README.md) directory. 
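The deduplication use-case above can be sized with back-of-envelope arithmetic. A sketch with hypothetical sizes (the numbers are invented to show the scaling, not measured; LoRA adapters are the KB-MB-scale case described in the LoRA section):

```python
# Back-of-envelope storage cost for N task-specific PTE files that share one
# foundation-weight PTD file. All sizes are hypothetical illustration values.
foundation_mb = 4096   # shared weights in a single PTD file (assumed ~4 GB)
per_task_mb = 8        # per-task program + adapter weights in each PTE (assumed)
n_tasks = 5

# Without separation: every PTE carries its own copy of the foundation weights.
duplicated_mb = n_tasks * (foundation_mb + per_task_mb)

# With separation: one shared PTD plus a small PTE per task.
shared_mb = foundation_mb + n_tasks * per_task_mb

print(duplicated_mb, shared_mb)  # 20520 4136
```

Under these assumed sizes, each additional task costs only the small per-task PTE rather than another full copy of the foundation weights.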
-Please see [program-data-separation/cpp](cpp/) for instructions on running the exported models. +## Export a model with program-data separation +For a demo of the program-data separation APIs using a linear model, please see [program-data-separation/cpp/linear_example](cpp/linear_example/). This example generates and runs a program-data separated linear model, with weights and bias in a separate .ptd file. ## Export a model with LoRA A major use-case that program-data separation enables is inference with multiple LoRA adapters. LoRA is a fine-tuning technique introduced in [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685). LoRA fine-tuning produces lightweight 'adapter' weights that can be applied to an existing model to adapt it to a new task. LoRA adapters are typically small in comparison to LLM foundation weights, on the order of KB-MB depending on the finetuning setup and model size. diff --git a/program-data-separation/cpp/CMakeLists.txt b/program-data-separation/cpp/CMakeLists.txt index f6fb3d34..75045c1f 100644 --- a/program-data-separation/cpp/CMakeLists.txt +++ b/program-data-separation/cpp/CMakeLists.txt @@ -17,7 +17,7 @@ option(EXECUTORCH_BUILD_XNNPACK "" ON) # Add ExecuTorch subdirectory add_subdirectory("executorch") -set(DEMO_SOURCES main.cpp) +set(DEMO_SOURCES linear_example/main.cpp) # Create executable add_executable(executorch_program_data_separation ${DEMO_SOURCES}) diff --git a/program-data-separation/cpp/README.md b/program-data-separation/cpp/README.md deleted file mode 100644 index f8c330fd..00000000 --- a/program-data-separation/cpp/README.md +++ /dev/null @@ -1,37 +0,0 @@ -# ExecuTorch Program Data Separation Demo C++. - -This directory contains the C++ code to run the examples generated in [program-data-separation](../program-data-separation/README.md). - -## Build instructions -0. Export the model/s. See [program-data-separation](../program-data-separation/README.md) for instructions. -1. 
The ExecuTorch repository is configured as a git submodule at `~/executorch-examples/program-data-separation/cpp/executorch`. To initialize it: - ```bash - cd ~/executorch-examples/ - git submodule sync - git submodule update --init --recursive - ``` -2. Install dev requirements for ExecuTorch - - ```bash - cd ~/executorch-examples/mv2/cpp/executorch - pip install -r requirements-dev.txt - ``` - -## Program-data separation demo -**Build instructions** - -Build the executable: -```bash -cd ~/executorch-examples/program-data-separation/cpp -chmod +x build_example.sh -./build_example.sh -``` - -Run the executable. -``` -./bin/executorch_program_data_separation --model-path ../../linear.pte --data-path ../../linear.ptd - -./bin/executorch_program_data_separation --model-path ../../linear_xnnpack.pte --data-path ../../linear_xnnpack.ptd -``` - -## LoRA demo diff --git a/program-data-separation/cpp/build_example.sh b/program-data-separation/cpp/build_example.sh deleted file mode 100644 index 5260dcb0..00000000 --- a/program-data-separation/cpp/build_example.sh +++ /dev/null @@ -1,14 +0,0 @@ -#!/bin/bash -set -e - -# Create build directory if it doesn't exist -mkdir -p build -cd build - -# Configure CMake -cmake -DCMAKE_BUILD_TYPE=Release .. - -# Build the project -cmake --build . -j$(nproc) - -echo "Build complete! Executable located at: ./bin/executorch_program_data_separation" diff --git a/program-data-separation/cpp/linear_example/README.md b/program-data-separation/cpp/linear_example/README.md new file mode 100644 index 00000000..d903a3de --- /dev/null +++ b/program-data-separation/cpp/linear_example/README.md @@ -0,0 +1,74 @@ +# ExecuTorch Program Data Separation C++ Demo + +This directory contains the C++ code to run the examples generated in [program-data-separation](../../README.md). + + +## Virtual environment setup. 
+Create and activate a Python virtual environment: +```bash +python3 -m venv .venv && source .venv/bin/activate && pip install --upgrade pip +``` +Or alternatively, [install conda on your machine](https://conda.io/projects/conda/en/latest/user-guide/install/index.html) +```bash +conda create -yn executorch-ptd python=3.10.0 && conda activate executorch-ptd +``` + +Install dependencies: +```bash +pip install executorch==0.7.0 +``` + +## Export the model(s). + +Change into the program-data-separation directory and create a directory to hold exported artifacts. +```bash +cd ~/executorch-examples/program-data-separation +mkdir models +``` + +Export models into the `models` directory. The first command will generate undelegated model/data files, and the second will generate XNNPACK-delegated model/data files. +```bash +python export_linear.py --outdir models/ +python export_linear.py --outdir models/ --xnnpack +``` +Expect the files `linear.pte` and `linear.ptd`, `linear_xnnpack.pte` and `linear_xnnpack.ptd`. + +Note: +- PTE: contains the program execution logic. +- PTD: contains the constant tensors used by the PTE. + +See [program-data-separation](../../README.md) for more information. + +## Install runtime dependencies. +The ExecuTorch repository is configured as a git submodule at `~/executorch-examples/program-data-separation/cpp/executorch`. To initialize it: +```bash +cd ~/executorch-examples/ +git submodule sync +git submodule update --init --recursive +``` +Install dev requirements for ExecuTorch: + +```bash +cd ~/executorch-examples/program-data-separation/cpp/executorch +pip install -r requirements-dev.txt +``` + +## Build the runtime. +Build the executable: +```bash +cd ~/executorch-examples/program-data-separation/cpp/linear_example +chmod +x build_example.sh +./build_example.sh +``` + +## Run the executable. 
+```bash +./build/bin/executorch_program_data_separation --model-path ../../models/linear.pte --data-path ../../models/linear.ptd + +./build/bin/executorch_program_data_separation --model-path ../../models/linear_xnnpack.pte --data-path ../../models/linear_xnnpack.ptd +``` + +## Clean up. +```bash +rm -rf build +cd ~/executorch-examples/program-data-separation +rm -rf models +``` diff --git a/program-data-separation/cpp/linear_example/build_example.sh b/program-data-separation/cpp/linear_example/build_example.sh new file mode 100755 index 00000000..f94258ae --- /dev/null +++ b/program-data-separation/cpp/linear_example/build_example.sh @@ -0,0 +1,15 @@ +#!/bin/bash +set -e + +# Clean and recreate the build directory +rm -rf build +mkdir -p build +cd build + +# Configure CMake +cmake -DCMAKE_BUILD_TYPE=Release ../.. + +# Build the project +cmake --build . -j$(nproc) + +echo "Build complete! Executable located at: ./build/bin/executorch_program_data_separation" diff --git a/program-data-separation/cpp/main.cpp b/program-data-separation/cpp/linear_example/main.cpp similarity index 100% rename from program-data-separation/cpp/main.cpp rename to program-data-separation/cpp/linear_example/main.cpp From a726c0b4086bce7937d6b26ddf4d0647d3cdfefb Mon Sep 17 00:00:00 2001 From: lucylq Date: Mon, 18 Aug 2025 14:39:58 -0700 Subject: [PATCH 3/3] refactor --- program-data-separation/README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/program-data-separation/README.md b/program-data-separation/README.md index 3d182149..2e6aa5cb 100644 --- a/program-data-separation/README.md +++ b/program-data-separation/README.md @@ -1,7 +1,7 @@ # Program Data Separation Examples This directory provides an example of the Program Data Separation APIs in ExecuTorch. Specifically, it showcases: -1. 
Simple program data separation examples using the portable operators and XNNPACK. +1. Program data separation examples using a linear model with the portable operators and XNNPACK. 2. LoRA inference example with a LoRA and non-LoRA model sharing foundation weights. ## Program Data Separation @@ -15,10 +15,10 @@ PTD files are used to store data outside of the PTE file. Some use-cases: For more information on the PTD data format, please see the [flat_tensor](https://github.com/pytorch/executorch/blob/main/extension/flat_tensor/README.md) directory. -## Export a model with program-data separation +## Linear example For a demo of the program-data separation APIs using a linear model, please see [program-data-separation/cpp/linear_example](cpp/linear_example/). This example generates and runs a program-data separated linear model, with weights and bias in a separate .ptd file. -## Export a model with LoRA +## LoRA example A major use-case that program-data separation enables is inference with multiple LoRA adapters. LoRA is a fine-tuning technique introduced in [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685). LoRA fine-tuning produces lightweight 'adapter' weights that can be applied to an existing model to adapt it to a new task. LoRA adapters are typically small in comparison to LLM foundation weights, on the order of KB-MB depending on the finetuning setup and model size. To enable LoRA, we generate: