From c8d583faf7cc4b75215d305759671e8b4685eb34 Mon Sep 17 00:00:00 2001 From: lucylq Date: Mon, 11 Aug 2025 15:10:48 -0700 Subject: [PATCH 1/3] Refactor program-data separation example --- program-data-separation/README.md | 60 +++++-------------- program-data-separation/cpp/README.md | 37 ++++++++++++ program-data-separation/cpp/build_example.sh | 14 +++++ program-data-separation/cpp/executorch | 2 +- .../{export.py => export_linear.py} | 0 5 files changed, 68 insertions(+), 45 deletions(-) create mode 100644 program-data-separation/cpp/README.md create mode 100644 program-data-separation/cpp/build_example.sh rename program-data-separation/{export.py => export_linear.py} (100%) diff --git a/program-data-separation/README.md b/program-data-separation/README.md index 473b41c7..d2b9af44 100644 --- a/program-data-separation/README.md +++ b/program-data-separation/README.md @@ -1,6 +1,10 @@ # Program Data Separation Examples -This directory provides an example of the Program Data Separation APIs in ExecuTorch. +This directory provides an example of the Program Data Separation APIs in ExecuTorch. Specifically, it showcases: +1. Simple program data separation examples using the portable operators and XNNPACK. +2. LoRA inference example with a LoRA and non-LoRA model sharing foundation weights. + +## Program Data Separation The program-data separation APIs allow users to generate a separate data file when exporting and lowering a model, i.e., generate a PTE file containing the model execution program, and one (or more) [PTD](https://github.com/pytorch/executorch/blob/main/extension/flat_tensor/README.md) file(s) containing only weights. PTD files are used to store data outside of the PTE file. Some use-cases: - Deduplication: sharing model weights between multiple executable PTE files. This can significantly reduce binary file size and runtime memory usage. 
- Flexible deployment: allow async updates between program and data, especially if they are updated with different cadences. -## LoRA -A major use-case that program-data separation enables is inference with multiple LoRA adapters. LoRA is a fine-tuning technique introduced in [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685). LoRA fine-tuning produces lightweight 'adapter' weights that can be applied to an existing model to adapt it to a new task. LoRA adapters are typically small in comparison to LLM foundation weights. They are generally on the order of KB,MB, depending on the finetuning setup and model size. - -With program-data separation, users can generate a PTE file containing the program and LoRA weights, and save the original foundation weights to a separate PTD file. Provided they are based on the same underlying model, multiple LoRA-adapted PTE files can share the same foundation weights. This means adding a model adapted to a new task incurs minimal binary size and runtime memory overhead; the cost of the lora adapter weights. - -An example of this usage is coming soon. - ## Virtual environment setup Create and activate a Python virtual environment: ```bash @@ -27,9 +24,6 @@ conda create -yn executorch-ptd python=3.10.0 && conda activate executorch-ptd ``` Install dependencies: - -[Please install ExecuTorch pip package from source](https://docs.pytorch.org/executorch/stable/using-executorch-building-from-source.html#install-executorch-pip-package-from-source), until executorch==0.7.0 is released. - ``` pip install executorch==0.7.0 ``` @@ -37,13 +31,13 @@ pip install executorch==0.7.0 ## Export a model with program-data separation To export a non-delegated linear model, into the current directory: ```python -python export.py --outdir . +python export_linear.py --outdir . ``` Expect the files 'linear.pte' and 'linear.ptd'. 
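To make the PTE/PTD split concrete, here is a toy sketch in plain Python. It is purely illustrative: the real PTE and PTD formats are binary (see the flat_tensor docs linked above), and the JSON "program", field names, and file names here are invented for the sketch.

```python
import json
import os
import struct
import tempfile

# Toy stand-ins for PTE/PTD (NOT the real formats): the "program" file holds
# execution metadata plus a reference to an external data file; the "data"
# file holds only the raw weights.
workdir = tempfile.mkdtemp()
data_path = os.path.join(workdir, "linear.ptd")
program_path = os.path.join(workdir, "linear.pte")

weights = [0.5, -1.25, 2.0]

# Write the weights alone to the "PTD" file as little-endian float32.
with open(data_path, "wb") as f:
    f.write(struct.pack(f"<{len(weights)}f", *weights))

# Write the "program": no weights inside, just a pointer to the data file.
with open(program_path, "w") as f:
    json.dump({"op": "linear", "num_weights": len(weights),
               "data_file": os.path.basename(data_path)}, f)

# "Runtime": load the program, then resolve its weights from the data file.
with open(program_path) as f:
    program = json.load(f)
with open(os.path.join(workdir, program["data_file"]), "rb") as f:
    loaded = list(struct.unpack(f"<{program['num_weights']}f", f.read()))

print(loaded)  # [0.5, -1.25, 2.0]
```

Because the weights live only in the data file, several "program" files could point at the same data file, which is the deduplication use-case described above.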
To export a linear model delegated to XNNPACK, into the current directory: ```python -python export.py --outdir . --xnnpack +python export_linear.py --outdir . --xnnpack ``` Expect the files 'linear_xnnpack.pte' and 'linear_xnnpack.ptd'. @@ -53,38 +47,16 @@ Note: For more information on the PTD data format, please see the [flat_tensor](https://github.com/pytorch/executorch/blob/main/extension/flat_tensor/README.md) directory. -## Runtime (cpp) -The cpp/ directory contains the executorch submodule along with a main.cpp file that demonstrates how to load the PTE and PTD files and execute the program. - -First, export your PTE and PTD files using the instructions above. - -**Build instructions** - -Change to the cpp directory. -``` -cd cpp -``` - -Create build directory if it doesn't exist. -``` -mkdir -p build -cd build -``` +Please see [program-data-separation/cpp](cpp/) for instructions on running the exported models. -Configure CMake. -``` -cmake -DCMAKE_BUILD_TYPE=Release .. -``` +## Export a model with LoRA +A major use-case that program-data separation enables is inference with multiple LoRA adapters. LoRA is a fine-tuning technique introduced in [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685). LoRA fine-tuning produces lightweight 'adapter' weights that can be applied to an existing model to adapt it to a new task. LoRA adapters are typically small in comparison to LLM foundation weights, on the order of KB-MB depending on the finetuning setup and model size. -Build the project. -``` -cmake --build . -j$(nproc) -echo "Build complete! Executable located at: ./bin/executorch_program_data_separation" -``` +To enable LoRA, we generate: +- PTE file(s): containing program and LoRA adapter weights. +- PTD file: containing foundation weights. -Run the executable. 
-``` -./bin/executorch_program_data_separation --model-path ../../linear.pte --data-path ../../linear.ptd +Multiple LoRA-adapted PTE files can share the same foundation weights and adding a model adapted to a new task incurs minimal binary size and runtime memory overhead. -./bin/executorch_program_data_separation --model-path ../../linear_xnnpack.pte --data-path ../../linear_xnnpack.ptd -``` +### Requirements +LoRA is currently supported on executorch main. [Please install ExecuTorch pip package from source](https://docs.pytorch.org/executorch/stable/using-executorch-building-from-source.html#install-executorch-pip-package-from-source), until executorch==1.0 is released. diff --git a/program-data-separation/cpp/README.md b/program-data-separation/cpp/README.md new file mode 100644 index 00000000..f8c330fd --- /dev/null +++ b/program-data-separation/cpp/README.md @@ -0,0 +1,37 @@ +# ExecuTorch Program Data Separation Demo C++. + +This directory contains the C++ code to run the examples generated in [program-data-separation](../program-data-separation/README.md). + +## Build instructions +0. Export the model/s. See [program-data-separation](../program-data-separation/README.md) for instructions. +1. The ExecuTorch repository is configured as a git submodule at `~/executorch-examples/program-data-separation/cpp/executorch`. To initialize it: + ```bash + cd ~/executorch-examples/ + git submodule sync + git submodule update --init --recursive + ``` +2. Install dev requirements for ExecuTorch + + ```bash + cd ~/executorch-examples/mv2/cpp/executorch + pip install -r requirements-dev.txt + ``` + +## Program-data separation demo +**Build instructions** + +Build the executable: +```bash +cd ~/executorch-examples/program-data-separation/cpp +chmod +x build_example.sh +./build_example.sh +``` + +Run the executable. 
+``` +./bin/executorch_program_data_separation --model-path ../../linear.pte --data-path ../../linear.ptd + +./bin/executorch_program_data_separation --model-path ../../linear_xnnpack.pte --data-path ../../linear_xnnpack.ptd +``` + +## LoRA demo diff --git a/program-data-separation/cpp/build_example.sh b/program-data-separation/cpp/build_example.sh new file mode 100644 index 00000000..5260dcb0 --- /dev/null +++ b/program-data-separation/cpp/build_example.sh @@ -0,0 +1,14 @@ +#!/bin/bash +set -e + +# Create build directory if it doesn't exist +mkdir -p build +cd build + +# Configure CMake +cmake -DCMAKE_BUILD_TYPE=Release .. + +# Build the project +cmake --build . -j$(nproc) + +echo "Build complete! Executable located at: ./bin/executorch_program_data_separation" diff --git a/program-data-separation/cpp/executorch b/program-data-separation/cpp/executorch index 44564073..3a021469 160000 --- a/program-data-separation/cpp/executorch +++ b/program-data-separation/cpp/executorch @@ -1 +1 @@ -Subproject commit 445640739fbc761a10e61430724cafb8a410198b +Subproject commit 3a021469b68708d71b87d2cea8f358a0b86f9977 diff --git a/program-data-separation/export.py b/program-data-separation/export_linear.py similarity index 100% rename from program-data-separation/export.py rename to program-data-separation/export_linear.py From 97b02c6b3a11930045b1a27399ba8499c24fdb27 Mon Sep 17 00:00:00 2001 From: lucylq Date: Mon, 18 Aug 2025 14:36:57 -0700 Subject: [PATCH 2/3] refactor --- program-data-separation/README.md | 35 +-------- program-data-separation/cpp/CMakeLists.txt | 2 +- program-data-separation/cpp/README.md | 37 ---------- program-data-separation/cpp/build_example.sh | 14 ---- .../cpp/linear_example/README.md | 74 +++++++++++++++++++ .../cpp/linear_example/build_example.sh | 15 ++++ .../cpp/{ => linear_example}/main.cpp | 0 7 files changed, 92 insertions(+), 85 deletions(-) delete mode 100644 program-data-separation/cpp/README.md delete mode 100644 
program-data-separation/cpp/build_example.sh create mode 100644 program-data-separation/cpp/linear_example/README.md create mode 100755 program-data-separation/cpp/linear_example/build_example.sh rename program-data-separation/cpp/{ => linear_example}/main.cpp (100%) diff --git a/program-data-separation/README.md b/program-data-separation/README.md index d2b9af44..3d182149 100644 --- a/program-data-separation/README.md +++ b/program-data-separation/README.md @@ -13,41 +13,10 @@ PTD files are used to store data outside of the PTE file. Some use-cases: - Deduplication: sharing model weights between multiple executable PTE files. This can significantly reduce binary file size and runtime memory usage. - Flexible deployment: allow async updates between program and data, especially if they are updated with different cadences. -## Virtual environment setup -Create and activate a Python virtual environment: -```bash -python3 -m venv .venv && source .venv/bin/activate && pip install --upgrade pip -``` -Or alternatively, [install conda on your machine](https://conda.io/projects/conda/en/latest/user-guide/install/index.html) -```bash -conda create -yn executorch-ptd python=3.10.0 && conda activate executorch-ptd -``` - -Install dependencies: -``` -pip install executorch==0.7.0 -``` - -## Export a model with program-data separation -To export a non-delegated linear model, into the current directory: -```python -python export_linear.py --outdir . -``` -Expect the files 'linear.pte' and 'linear.ptd'. - -To export a linear model delegated to XNNPACK, into the current directory: -```python -python export_linear.py --outdir . --xnnpack -``` -Expect the files 'linear_xnnpack.pte' and 'linear_xnnpack.ptd'. - -Note: -- PTE: contains the program execution logic. -- PTD: contains the constant tensors used by the PTE. - For more information on the PTD data format, please see the [flat_tensor](https://github.com/pytorch/executorch/blob/main/extension/flat_tensor/README.md) directory. 
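The deduplication use-case above can be sized with back-of-envelope arithmetic. A sketch with hypothetical sizes (the numbers are invented to show the scaling, not measured; LoRA adapters are the KB-MB-scale case described in the LoRA section):

```python
# Back-of-envelope storage cost for N task-specific PTE files that share one
# foundation-weight PTD file. All sizes are hypothetical illustration values.
foundation_mb = 4096   # shared weights in a single PTD file (assumed ~4 GB)
per_task_mb = 8        # per-task program + adapter weights in each PTE (assumed)
n_tasks = 5

# Without separation: every PTE carries its own copy of the foundation weights.
duplicated_mb = n_tasks * (foundation_mb + per_task_mb)

# With separation: one shared PTD plus a small PTE per task.
shared_mb = foundation_mb + n_tasks * per_task_mb

print(duplicated_mb, shared_mb)  # 20520 4136
```

Under these assumed sizes, each additional task costs only the small per-task PTE rather than another full copy of the foundation weights.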
-Please see [program-data-separation/cpp](cpp/) for instructions on running the exported models. +## Export a model with program-data separation +For a demo of the program-data separation APIs using a linear model, please see [program-data-separation/cpp/linear_example](cpp/linear_example/). This example generates and runs a program-data separated linear model, with weights and bias in a separate .ptd file. ## Export a model with LoRA A major use-case that program-data separation enables is inference with multiple LoRA adapters. LoRA is a fine-tuning technique introduced in [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685). LoRA fine-tuning produces lightweight 'adapter' weights that can be applied to an existing model to adapt it to a new task. LoRA adapters are typically small in comparison to LLM foundation weights, on the order of KB-MB depending on the finetuning setup and model size. diff --git a/program-data-separation/cpp/CMakeLists.txt b/program-data-separation/cpp/CMakeLists.txt index f6fb3d34..75045c1f 100644 --- a/program-data-separation/cpp/CMakeLists.txt +++ b/program-data-separation/cpp/CMakeLists.txt @@ -17,7 +17,7 @@ option(EXECUTORCH_BUILD_XNNPACK "" ON) # Add ExecuTorch subdirectory add_subdirectory("executorch") -set(DEMO_SOURCES main.cpp) +set(DEMO_SOURCES linear_example/main.cpp) # Create executable add_executable(executorch_program_data_separation ${DEMO_SOURCES}) diff --git a/program-data-separation/cpp/README.md b/program-data-separation/cpp/README.md deleted file mode 100644 index f8c330fd..00000000 --- a/program-data-separation/cpp/README.md +++ /dev/null @@ -1,37 +0,0 @@ -# ExecuTorch Program Data Separation Demo C++. - -This directory contains the C++ code to run the examples generated in [program-data-separation](../program-data-separation/README.md). - -## Build instructions -0. Export the model/s. See [program-data-separation](../program-data-separation/README.md) for instructions. -1. 
The ExecuTorch repository is configured as a git submodule at `~/executorch-examples/program-data-separation/cpp/executorch`. To initialize it: - ```bash - cd ~/executorch-examples/ - git submodule sync - git submodule update --init --recursive - ``` -2. Install dev requirements for ExecuTorch - - ```bash - cd ~/executorch-examples/mv2/cpp/executorch - pip install -r requirements-dev.txt - ``` - -## Program-data separation demo -**Build instructions** - -Build the executable: -```bash -cd ~/executorch-examples/program-data-separation/cpp -chmod +x build_example.sh -./build_example.sh -``` - -Run the executable. -``` -./bin/executorch_program_data_separation --model-path ../../linear.pte --data-path ../../linear.ptd - -./bin/executorch_program_data_separation --model-path ../../linear_xnnpack.pte --data-path ../../linear_xnnpack.ptd -``` - -## LoRA demo diff --git a/program-data-separation/cpp/build_example.sh b/program-data-separation/cpp/build_example.sh deleted file mode 100644 index 5260dcb0..00000000 --- a/program-data-separation/cpp/build_example.sh +++ /dev/null @@ -1,14 +0,0 @@ -#!/bin/bash -set -e - -# Create build directory if it doesn't exist -mkdir -p build -cd build - -# Configure CMake -cmake -DCMAKE_BUILD_TYPE=Release .. - -# Build the project -cmake --build . -j$(nproc) - -echo "Build complete! Executable located at: ./bin/executorch_program_data_separation" diff --git a/program-data-separation/cpp/linear_example/README.md b/program-data-separation/cpp/linear_example/README.md new file mode 100644 index 00000000..d903a3de --- /dev/null +++ b/program-data-separation/cpp/linear_example/README.md @@ -0,0 +1,74 @@ +# ExecuTorch Program Data Separation C++ Demo + +This directory contains the C++ code to run the examples generated in [program-data-separation](../../README.md). + + +## Virtual environment setup. 
+Create and activate a Python virtual environment: +```bash +python3 -m venv .venv && source .venv/bin/activate && pip install --upgrade pip +``` +Or alternatively, [install conda on your machine](https://conda.io/projects/conda/en/latest/user-guide/install/index.html) +```bash +conda create -yn executorch-ptd python=3.10.0 && conda activate executorch-ptd +``` + +Install dependencies: +```bash +pip install executorch==0.7.0 +``` + +## Export the model(s). + +Change into the program-data-separation directory and create a directory to hold exported artifacts. +```bash +cd ~/executorch-examples/program-data-separation +mkdir models +``` + +Export models into the `models` directory. The first command will generate undelegated model/data files, and the second will generate XNNPACK-delegated model/data files. +```bash +python export_linear.py --outdir models/ +python export_linear.py --outdir models/ --xnnpack +``` +Expect the files `linear.pte` and `linear.ptd`, `linear_xnnpack.pte` and `linear_xnnpack.ptd`. + +Note: +- PTE: contains the program execution logic. +- PTD: contains the constant tensors used by the PTE. + +See [program-data-separation](../../README.md) for more information. + +## Install runtime dependencies. +The ExecuTorch repository is configured as a git submodule at `~/executorch-examples/program-data-separation/cpp/executorch`. To initialize it: +```bash +cd ~/executorch-examples/ +git submodule sync +git submodule update --init --recursive +``` +Install dev requirements for ExecuTorch: + +```bash +cd ~/executorch-examples/program-data-separation/cpp/executorch +pip install -r requirements-dev.txt +``` + +## Build the runtime. +Build the executable: +```bash +cd ~/executorch-examples/program-data-separation/cpp/linear_example +chmod +x build_example.sh +./build_example.sh +``` + +## Run the executable. 
+```bash +./build/bin/executorch_program_data_separation --model-path ../../models/linear.pte --data-path ../../models/linear.ptd + +./build/bin/executorch_program_data_separation --model-path ../../models/linear_xnnpack.pte --data-path ../../models/linear_xnnpack.ptd +``` + +## Clean up. +```bash +rm -rf build +cd ~/executorch-examples/program-data-separation +rm -rf models +``` diff --git a/program-data-separation/cpp/linear_example/build_example.sh b/program-data-separation/cpp/linear_example/build_example.sh new file mode 100755 index 00000000..f94258ae --- /dev/null +++ b/program-data-separation/cpp/linear_example/build_example.sh @@ -0,0 +1,15 @@ +#!/bin/bash +set -e + +# Clean and recreate the build directory +rm -rf build +mkdir -p build +cd build + +# Configure CMake +cmake -DCMAKE_BUILD_TYPE=Release ../.. + +# Build the project +cmake --build . -j$(nproc) + +echo "Build complete! Executable located at: ./build/bin/executorch_program_data_separation" diff --git a/program-data-separation/cpp/main.cpp b/program-data-separation/cpp/linear_example/main.cpp similarity index 100% rename from program-data-separation/cpp/main.cpp rename to program-data-separation/cpp/linear_example/main.cpp From a726c0b4086bce7937d6b26ddf4d0647d3cdfefb Mon Sep 17 00:00:00 2001 From: lucylq Date: Mon, 18 Aug 2025 14:39:58 -0700 Subject: [PATCH 3/3] refactor --- program-data-separation/README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/program-data-separation/README.md b/program-data-separation/README.md index 3d182149..2e6aa5cb 100644 --- a/program-data-separation/README.md +++ b/program-data-separation/README.md @@ -1,7 +1,7 @@ # Program Data Separation Examples This directory provides an example of the Program Data Separation APIs in ExecuTorch. Specifically, it showcases: -1. 
Simple program data separation examples using the portable operators and XNNPACK. +1. Program data separation examples using a linear model with the portable operators and XNNPACK. 2. LoRA inference example with a LoRA and non-LoRA model sharing foundation weights. ## Program Data Separation @@ -15,10 +15,10 @@ PTD files are used to store data outside of the PTE file. Some use-cases: For more information on the PTD data format, please see the [flat_tensor](https://github.com/pytorch/executorch/blob/main/extension/flat_tensor/README.md) directory. -## Export a model with program-data separation +## Linear example For a demo of the program-data separation APIs using a linear model, please see [program-data-separation/cpp/linear_example](cpp/linear_example/). This example generates and runs a program-data separated linear model, with weights and bias in a separate .ptd file. -## Export a model with LoRA +## LoRA example A major use-case that program-data separation enables is inference with multiple LoRA adapters. LoRA is a fine-tuning technique introduced in [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685). LoRA fine-tuning produces lightweight 'adapter' weights that can be applied to an existing model to adapt it to a new task. LoRA adapters are typically small in comparison to LLM foundation weights, on the order of KB-MB depending on the finetuning setup and model size. To enable LoRA, we generate: