
Commit 3f09fe8

Merge branch 'main' into dev1/winskuo/eurobert
2 parents 192d30b + be8ffd1

11 files changed (+282, −111 lines)

.github/workflows/android-perf.yml

Lines changed: 2 additions & 2 deletions
```diff
@@ -342,8 +342,8 @@ jobs:
   git clone https://github.com/huggingface/optimum-executorch
   pushd optimum-executorch
   # There is no release yet, for CI stability, always test from the same commit on main
-  git checkout 1c653dc49812fc431a22312c7295d97005d22e12
-  python install_dev.py
+  git checkout 4c3b18f6cca68c5ccff809131d570062723d7188
+  python install_dev.py --skip_override_torch
   pip list

   ARGS=(
```

.github/workflows/apple-perf.yml

Lines changed: 2 additions & 2 deletions
```diff
@@ -347,8 +347,8 @@ jobs:
   git clone https://github.com/huggingface/optimum-executorch
   pushd optimum-executorch
   # There is no release yet, for CI stability, always test from the same commit on main
-  git checkout 1c653dc49812fc431a22312c7295d97005d22e12
-  ${CONDA_RUN} python install_dev.py
+  git checkout 4c3b18f6cca68c5ccff809131d570062723d7188
+  ${CONDA_RUN} python install_dev.py --skip_override_torch
   pip list

   ARGS=(
```

.github/workflows/trunk.yml

Lines changed: 2 additions & 3 deletions
```diff
@@ -597,9 +597,8 @@ jobs:
   git clone https://github.com/huggingface/optimum-executorch
   pushd optimum-executorch
   # There is no release yet, for CI stability, always test from the same commit on main
-  git checkout 1c653dc49812fc431a22312c7295d97005d22e12
-  pip install .[tests]
-  pip install transformers==4.52.4
+  git checkout 4c3b18f6cca68c5ccff809131d570062723d7188
+  python install_dev.py --skip_override_torch
   popd
   pip list
   echo "::endgroup::"
```

backends/mediatek/README.md

Lines changed: 15 additions & 25 deletions
````diff
@@ -14,23 +14,11 @@ The examples provided in this repository are tested and supported on the followi
 
 Before you begin, ensure you have the following prerequisites installed and configured:
 
-#### 1. Buck2 Build Tool
-
-- **Download Buck2**: Obtain Buck2 from the official [releases page](https://github.com/facebook/buck2/releases/tag/2024-02-01).
-- **Add to PATH**: Extract the downloaded file and add the directory to your system's `$PATH` environment variable.
-   ```bash
-   export PATH=<path_to_buck>:$PATH
-   ```
-
-#### 2. Android NDK
+#### 1. Android NDK
 
 - **Download Android NDK**: Acquire the Android NDK version 26.3.11579264 from the [Android developer site](https://developer.android.com/ndk/downloads).
-- **Set NDK Path**: Ensure that the `$ANDROID_NDK` environment variable is set to the path where the NDK is located.
-   ```bash
-   export ANDROID_NDK=<path_to_android_ndk>
-   ```
 
-#### 3. MediaTek ExecuTorch Libraries
+#### 2. MediaTek ExecuTorch Libraries
 
 To get started with MediaTek's ExecuTorch libraries, download the [NeuroPilot Express SDK](https://neuropilot.mediatek.com/resources/public/npexpress/en/docs/npexpress) from MediaTek's NeuroPilot portal. The SDK includes the following components:
 
@@ -60,26 +48,28 @@ Follow the steps below to setup your build environment:
    pip3 install mtk_neuron-8.2.19-py3-none-linux_x86_64.whl
   pip3 install mtk_converter-8.13.0+public-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
   ```
-- Set evironment variables for building backend
-   ```bash
-   export NEURON_BUFFER_ALLOCATOR_LIB=<path_to_buffer_allocator>
-   ```
 
 ### Build
-1. Navigate to `scripts/` directory.
-
-2. **Build MediaTek Backend**: Once the prerequisites are in place, run the `mtk_build.sh` script to start the build process, MediaTek backend will be built under `cmake-android-out/backends/` as `libneuron_backend.so`
+1. Copy `NeuronAdapter.h` to `backends/mediatek/runtime/include/api/`
 
+2. Set NDK Path: Ensure that the `$ANDROID_NDK` environment variable is set to the path where the NDK is located.
    ```bash
-   ./mtk_build.sh
+   export ANDROID_NDK=<path_to_android_ndk>
   ```
 
-### Run
+3. Build the backend library `libneuron_backend.so`:
+   ```bash
+   cd backends/mediatek/scripts/
+   ./mtk_build.sh
+   ```
+   The output is `libneuron_backend.so` in `cmake-android-out/backends/mediatek/`.
 
-1. **Push MediaTek universal SDK and MediaTek backend to the device**: push `libneuronusdk_adapter.mtk.so` and `libneuron_backend.so` to the phone and export it to the `$LD_LIBRARY_PATH` environment variable before executing ExecuTorch with MediaTek backend.
+### Run
 
+1. Push `libneuron_backend.so`, `libneuronusdk_adapter.mtk.so` and `libneuron_buffer_allocator.so` to the device.
+2. Set the library path before running ExecuTorch:
    ```bash
-   export LD_LIBRARY_PATH=<path_to_usdk>:<path_to_neuron_backend>:$LD_LIBRARY_PATH
+   export LD_LIBRARY_PATH=<path_to_neuron_backend>:<path_to_usdk>:<path_to_buffer_allocator>:$LD_LIBRARY_PATH
   ```
 
 Please refer to `executorch/examples/mediatek/` for export and execution examples of various of models.
````
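To make the updated Run steps concrete, here is a minimal deployment sketch using `adb`. The on-device directory, the runner binary name (`mtk_executor_runner`), its `--model_path` flag, and the `.pte` file name are illustrative assumptions, not part of this commit; only the three library names come from the diff above.

```bash
# Illustrative only: push the backend libraries, a runner, and a model,
# then execute with the device-side library path set (names are placeholders).
adb push libneuron_backend.so libneuronusdk_adapter.mtk.so libneuron_buffer_allocator.so /data/local/tmp/
adb push mtk_executor_runner mobilenetv3.pte /data/local/tmp/
adb shell 'cd /data/local/tmp && \
  LD_LIBRARY_PATH=/data/local/tmp:$LD_LIBRARY_PATH \
  ./mtk_executor_runner --model_path mobilenetv3.pte'
```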

devtools/inspector/_inspector_utils.py

Lines changed: 31 additions & 0 deletions
```diff
@@ -690,3 +690,34 @@ def map_runtime_aot_intermediate_outputs(
         )
 
     return aot_runtime_mapping
+
+
+def convert_to_float_tensor(input_data: Any) -> torch.Tensor:
+    """
+    Convert input_data into a torch.Tensor on CPU with dtype torch.float64.
+    This function handles the following types of input:
+    - Scalar (int or float): Converts to a tensor with a single element.
+    - Tensor: Converts to a float64 tensor on CPU.
+    - List of Tensors: Stacks the tensors into a single float64 tensor on CPU.
+    The resulting tensor is detached, moved to CPU, and cast to torch.float64.
+    Parameters:
+        input_data (Any): The input data to be converted to a tensor. It can be a scalar,
+            a tensor, or a list of tensors.
+    Returns:
+        torch.Tensor: A tensor on CPU with dtype torch.float64.
+    Raises:
+        ValueError: If the input_data cannot be converted to a tensor.
+    """
+    try:
+        # Check if the input is a list of tensors
+        if isinstance(input_data, list):
+            input_tensor = torch.stack([convert_to_float_tensor(a) for a in input_data])
+        # Try to convert the input to a tensor
+        else:
+            input_tensor = torch.as_tensor(input_data, dtype=torch.float64)
+    except Exception as e:
+        raise ValueError(
+            f"Cannot convert value of type {type(input_data)} to a tensor: {e}"
+        )
+    input_tensor = input_tensor.detach().cpu().double()
+    return input_tensor
```
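A quick usage sketch of the new helper follows; the import path is inferred from the file location `devtools/inspector/_inspector_utils.py` and may differ in packaged builds.

```python
import torch

# Import path inferred from the file location; an assumption, not from the commit.
from executorch.devtools.inspector._inspector_utils import convert_to_float_tensor

# Scalar -> zero-dim float64 tensor on CPU
print(convert_to_float_tensor(5))  # tensor(5., dtype=torch.float64)

# Integer tensor -> detached float64 copy on CPU
print(convert_to_float_tensor(torch.tensor([4, 5, 6], dtype=torch.int32)))

# List of tensors -> stacked along a new leading dimension
batch = [torch.tensor([1, 2]), torch.tensor([2, 3])]
print(convert_to_float_tensor(batch).shape)  # torch.Size([2, 2])
```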

devtools/inspector/tests/inspector_utils_test.py

Lines changed: 47 additions & 0 deletions
```diff
@@ -29,6 +29,7 @@
     calculate_mse,
     calculate_snr,
     calculate_time_scale_factor,
+    convert_to_float_tensor,
     create_debug_handle_to_op_node_mapping,
     EDGE_DIALECT_GRAPH_KEY,
     find_populated_event,
@@ -317,6 +318,52 @@ def test_map_runtime_aot_intermediate_outputs_complex_chain(self):
         expected = {((1, 2, 3, 4, 5, 6), 300): ((2, 3, 4, 5, 6, 7), 350)}
         self.assertEqual(actual, expected)
 
+    def test_convert_input_to_tensor_convertible_inputs(self):
+        # Scalar -> tensor
+        actual_output1 = convert_to_float_tensor(5)
+        self.assertIsInstance(actual_output1, torch.Tensor)
+        self.assertEqual(actual_output1.dtype, torch.float64)
+        self.assertEqual(tuple(actual_output1.shape), ())
+        self.assertTrue(
+            torch.allclose(actual_output1, torch.tensor([5.0], dtype=torch.float64))
+        )
+        self.assertEqual(actual_output1.device.type, "cpu")
+
+        # Tensor of ints -> float64 CPU
+        t_int = torch.tensor([4, 5, 6], dtype=torch.int32)
+        actual_output2 = convert_to_float_tensor(t_int)
+        self.assertIsInstance(actual_output2, torch.Tensor)
+        self.assertEqual(actual_output2.dtype, torch.float64)
+        self.assertTrue(
+            torch.allclose(
+                actual_output2, torch.tensor([4.0, 5.0, 6.0], dtype=torch.float64)
+            )
+        )
+        self.assertEqual(actual_output2.device.type, "cpu")
+
+        # List of tensors -> stacked float64 CPU tensor
+        t_list = [torch.tensor([1, 2]), torch.tensor([2, 3]), torch.tensor([3, 4])]
+        actual_output3 = convert_to_float_tensor(t_list)
+        self.assertIsInstance(actual_output3, torch.Tensor)
+        self.assertEqual(actual_output3.dtype, torch.float64)
+        self.assertEqual(tuple(actual_output3.shape), (3, 2))
+        self.assertTrue(
+            torch.allclose(
+                actual_output3,
+                torch.tensor([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]], dtype=torch.float64),
+            )
+        )
+        self.assertEqual(actual_output3.device.type, "cpu")
+
+    def test_convert_input_to_tensor_non_convertible_raises(self):
+        class X:
+            pass
+
+        with self.assertRaises(ValueError) as cm:
+            convert_to_float_tensor(X())
+        msg = str(cm.exception)
+        self.assertIn("Cannot convert value of type", msg)
+
 
 def gen_mock_operator_graph_with_expected_map() -> (
     Tuple[OperatorGraph, Dict[int, OperatorNode]]
```

docs/source/backends-mediatek.md

Lines changed: 49 additions & 65 deletions
````diff
@@ -1,95 +1,79 @@
 # MediaTek Backend
 
-MediaTek backend empowers ExecuTorch to speed up PyTorch models on edge devices that equips with MediaTek Neuron Processing Unit (NPU). This document offers a step-by-step guide to set up the build environment for the MediaTek ExecuTorch libraries.
-
-::::{grid} 2
-:::{grid-item-card} What you will learn in this tutorial:
-:class-card: card-prerequisites
-* How to export and lower a PyTorch model ahead of time with ExecuTorch for MediaTek devices.
-* How to build MediaTek backend and examples.
-* How to deploy the exported models on device with ExecuTorch runtime.
-:::
-:::{grid-item-card} Tutorials we recommend you complete before this:
-:class-card: card-prerequisites
-* [Introduction to ExecuTorch](intro-how-it-works.md)
-* [Getting Started](getting-started.md)
-* [Building ExecuTorch with CMake](using-executorch-building-from-source.md)
-:::
-::::
-
-
-## Prerequisites (Hardware and Software)
-
-### Host OS
-- Linux operating system
-
-### Supported Chips:
-- MediaTek Dimensity 9300 (D9300)
-- MediaTek Dimensity 9400 (D9400)
+The MediaTek backend enables acceleration of PyTorch models on edge devices with MediaTek Neuron Processing Units (NPUs). This backend provides tools for exporting, building, and deploying models to leverage MediaTek hardware.
 
-### Software:
+## Features
 
-- [NeuroPilot Express SDK](https://neuropilot.mediatek.com/resources/public/npexpress/en/docs/npexpress) is a lightweight SDK for deploying AI applications on MediaTek SOC devices.
+- Acceleration of PyTorch models on MediaTek NPUs
+- Tools for model export and lowering
+- Example scripts for model deployment and execution
 
-## Setting up your developer environment
+## Target Requirements
 
-Follow the steps below to setup your build environment:
+- **Hardware:** MediaTek Dimensity 9300 (D9300), Dimensity 9400 (D9400)
+- **Host OS:** Linux
+- **SDK:** [NeuroPilot Express SDK](https://neuropilot.mediatek.com/resources/public/npexpress/en/docs/npexpress)
 
-1. **Setup ExecuTorch Environment**: Refer to the [Getting Started](getting-started.md) guide for detailed instructions on setting up the ExecuTorch environment.
+## Development Requirements
 
-2. **Setup MediaTek Backend Environment**
-```bash
-pip3 install -r requirements.txt
-```
-- Install the two .whl downloaded from NeuroPilot Portal
-```bash
-pip3 install mtk_neuron-8.2.19-py3-none-linux_x86_64.whl
-pip3 install mtk_converter-8.13.0+public-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
-```
-- Set evironment variables for building backend
-```bash
-export NEURON_BUFFER_ALLOCATOR_LIB=<path_to_buffer_allocator>
-```
-Additionally, make sure to copy `NeuronAdapter.h` to the following directory: `backends/mediatek/runtime/include/api/`.
+- Linux operating system
+- Python dependencies:
+  ```bash
+  pip3 install -r requirements.txt
+  ```
+- NeuroPilot SDK Python wheels (download from [NeuroPilot Express SDK](https://neuropilot.mediatek.com/resources/public/npexpress/en/docs/npexpress)):
+  ```bash
+  pip3 install mtk_neuron-8.2.19-py3-none-linux_x86_64.whl
+  pip3 install mtk_converter-8.13.0+public-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
+  ```
 
-## Build
+## Using the MediaTek Backend
 
-### Ahead of time:
+### Exporting and Lowering a Model
 
-**Exporting a PyTorch Model for MediaTek Backend**:
-1. Lower and export the `.pte` file for on-device execution. The export script samples are povided under `example/mediatek/`. For example, the following commnad exports the `.pte` using the scripts provided.
+To export and lower a model for the MediaTek backend, use the provided shell script:
 ```bash
 cd executorch
-
 ./examples/mediatek/shell_scripts/export_oss.sh mobilenetv3
 ```
+The exported `.pte` file is saved in a directory named after the model.
 
-2. Find the `.pte` files under the directory named as same as the model.
+### Partitioner API
 
-### Runtime:
+A list of CompileSpecs is supported by the MediaTek backend:
+- `platform-config`: Specifies the targeted MediaTek platform name to compile for.
 
-**Build MediaTek Backend for ExecuTorch Runtime**
-1. Navigate to `backends/mediatek/scripts/` directory.
+## Runtime Integration
 
-2. **Build MediaTek Backend**: Once the prerequisites are in place, run the `mtk_build.sh` script to start the build process:
-```bash
-./mtk_build.sh
-```
+This section presents an example of exporting and deploying a model. Please refer to `executorch/examples/mediatek/` for export and execution examples of various models.
 
-3. MediaTek backend will be built under `cmake-android-out/backends/` as `libneuron_backend.so`.
+### Building Example Runners
 
-**Build a runner to execute the model on the device**:
-1. Build the runners and the backend by exedcuting the script:
+Build example runners:
 ```bash
 ./mtk_build_examples.sh
 ```
+Runners are located in `cmake-android-out/examples/mediatek/`.
 
-2. The runners will be built under `cmake-android-out/examples/`
+### Deploying to Device
 
-## Deploying and running on a device
+1. Push `libneuron_backend.so`, `libneuronusdk_adapter.mtk.so` and `libneuron_buffer_allocator.so` to the device.
+2. Set the library path before running ExecuTorch:
+   ```bash
+   export LD_LIBRARY_PATH=<path_to_neuron_backend>:<path_to_usdk>:<path_to_buffer_allocator>:$LD_LIBRARY_PATH
+   ```
 
-1. **Push MediaTek universal SDK and MediaTek backend to the device**: push `libneuronusdk_adapter.mtk.so` and `libneuron_backend.so` to the phone and export it to the `$LD_LIBRARY_PATH` environment variable before executing ExecuTorch with MediaTek backend.
+### Building the Backend from Source
+1. Copy `NeuronAdapter.h` to `backends/mediatek/runtime/include/api/`
 
+2. Set NDK Path: Ensure that the `$ANDROID_NDK` environment variable is set to the path where the NDK is located.
 ```bash
-export LD_LIBRARY_PATH=<path_to_usdk>:<path_to_neuron_backend>:$LD_LIBRARY_PATH
+export ANDROID_NDK=<path_to_android_ndk>
 ```
+
+3. Build the backend library `libneuron_backend.so`:
+   ```bash
+   cd backends/mediatek/scripts/
+   ./mtk_build.sh
+   ```
+   The output is `libneuron_backend.so` in `cmake-android-out/backends/mediatek/`.
````
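The new Partitioner API section above only names the `platform-config` CompileSpec, so a sketch of how such a spec might flow into an export may help. `CompileSpec` and the `to_edge`/`to_backend`/`to_executorch` flow are standard ExecuTorch APIs; the `NeuropilotPartitioner` import, the `mt6989` platform string, and `MyModel` are assumptions drawn from the example scripts, not definitions made by this commit.

```python
# Hypothetical export flow; partitioner class name and platform string are assumptions.
import torch
from executorch.backends.mediatek import NeuropilotPartitioner  # assumed import
from executorch.exir import to_edge
from executorch.exir.backend.compile_spec_schema import CompileSpec

model = MyModel().eval()  # placeholder nn.Module
example_inputs = (torch.randn(1, 3, 224, 224),)

# `platform-config` selects the targeted MediaTek platform (value is illustrative).
specs = [CompileSpec("platform-config", b"mt6989")]

edge = to_edge(torch.export.export(model, example_inputs))
edge = edge.to_backend(NeuropilotPartitioner(specs))  # delegate supported subgraphs
with open("model.pte", "wb") as f:
    f.write(edge.to_executorch().buffer)
```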

extension/llm/runner/irunner.h

Lines changed: 17 additions & 0 deletions
```diff
@@ -121,6 +121,23 @@ class ET_EXPERIMENTAL IRunner {
       std::function<void(const std::string&)> token_callback,
       std::function<void(const Stats&)> stats_callback) = 0;
 
+  /**
+   * Generate text based on the provided prompt and generation config, from a
+   * given position in KV cache.
+   *
+   * @param prompt The input prompt to generate from
+   * @param start_pos The starting position in KV cache of the input
+   * @param config Generation configuration parameters
+   * @param token_callback Callback function called for each generated token
+   * @param stats_callback Callback function for generation statistics
+   * @return Error::Ok if successful, an error otherwise
+   */
+  virtual runtime::Error generate_from_pos(
+      const std::string& prompt,
+      int64_t start_pos,
+      const GenerationConfig& config,
+      std::function<void(const std::string&)> token_callback,
+      std::function<void(const Stats&)> stats_callback) = 0;
   /**
    * Stop the generation process.
    */
```
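A caller-side sketch of the new pure virtual method follows. Note that every concrete `IRunner` implementation must now override `generate_from_pos`. The `executorch::extension::llm` namespace and the default-constructed `GenerationConfig` are assumptions based on the surrounding header, not guarantees from this commit.

```cpp
// Illustrative caller, not part of the commit: resume generation from a
// position that is already populated in the KV cache.
#include <iostream>
#include <string>

using executorch::extension::llm::GenerationConfig;  // assumed namespace
using executorch::extension::llm::IRunner;
using executorch::extension::llm::Stats;
using executorch::runtime::Error;

Error continue_generation(IRunner& runner, int64_t cached_len) {
  GenerationConfig config;  // defaults; real callers would tune its fields
  return runner.generate_from_pos(
      "Tell me more.",
      /*start_pos=*/cached_len,  // first free slot in the KV cache
      config,
      [](const std::string& token) { std::cout << token << std::flush; },
      [](const Stats&) { /* latency/throughput stats arrive here */ });
}
```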
