
Commit 4eee40d

Merge pull request #31 from cavusmustafa/additional_updates
Additional Updates on Export and Infer Scripts
2 parents 3e17d09 + a02855f commit 4eee40d

File tree

4 files changed: +160, -286 lines

backends/openvino/README.md

Lines changed: 1 addition & 2 deletions
@@ -32,8 +32,7 @@ executorch
 │   └── requirements.txt
 └── examples
     └── openvino
-        ├── aot_openvino_compiler.py
-        ├── export_and_infer_openvino.py
+        ├── aot_optimize_and_infer.py
         └── README.md
 ```

examples/openvino/README.md

Lines changed: 39 additions & 87 deletions
Original file line numberDiff line numberDiff line change
@@ -9,8 +9,7 @@ Below is the layout of the `examples/openvino` directory, which includes the nec
 ```
 examples/openvino
 ├── README.md                    # Documentation for examples (this file)
-├── aot_openvino_compiler.py     # Example script for AoT export
-└── export_and_infer_openvino.py # Example script to export and execute models with python bindings
+└── aot_optimize_and_infer.py    # Example script to export and execute models
 ```

 # Build Instructions for Examples
@@ -20,14 +19,10 @@ Follow the [instructions](../../backends/openvino/README.md) of **Prerequisites*

 ## AOT step:

-The export script called `aot_openvino_compiler.py` allows users to export deep learning models from various model suites (TIMM, Torchvision, Hugging Face) to a openvino backend using **Executorch**. Users can dynamically specify the model, input shape, and target device.
+The Python script `aot_optimize_and_infer.py` allows users to export deep learning models from various model suites (TIMM, Torchvision, Hugging Face) to the OpenVINO backend using **ExecuTorch**. Users can dynamically specify the model, input shape, and target device.

 ### **Usage**

-#### **Command Structure**
-```bash
-python aot_openvino_compiler.py --suite <MODEL_SUITE> --model <MODEL_NAME> --input_shape <INPUT_SHAPE> --device <DEVICE>
-```

 #### **Arguments**
 - **`--suite`** (required):
@@ -50,6 +45,12 @@ python aot_openvino_compiler.py --suite <MODEL_SUITE> --model <MODEL_NAME> --input_shape <INPUT_SHAPE> --device <DEVICE>
   - `[1, 3, 224, 224]` (Zsh users: wrap in quotes)
   - `(1, 3, 224, 224)`

+- **`--export`** (optional):
+  Save the exported model as a `.pte` file.
+
+- **`--model_file_name`** (optional):
+  Specify a custom file name to save the exported model.
+
 - **`--batch_size`** :
   Batch size for the validation. Default batch_size == 1.
   The dataset length must be evenly divisible by the batch size.
@@ -63,35 +64,55 @@ python aot_openvino_compiler.py --suite <MODEL_SUITE> --model <MODEL_NAME> --input_shape <INPUT_SHAPE> --device <DEVICE>
 - **`--dataset`** (optional):
   Path to the imagenet-like calibration dataset.

+- **`--infer`** (optional):
+  Execute inference with the compiled model and report the average inference time.
+
+- **`--num_iter`** (optional):
+  Number of iterations to execute inference. Default is `1`.
+
+- **`--warmup_iter`** (optional):
+  Number of warmup iterations to execute before timing begins. Default is `0`.
+
+- **`--input_tensor_path`** (optional):
+  Path to the raw tensor file to be used as input for inference. If this argument is not provided, a random input tensor will be generated.
+
+- **`--output_tensor_path`** (optional):
+  Path to the raw tensor file where the inference output will be saved.
+
 - **`--device`** (optional)
   Target device for the compiled model. Default is `CPU`.
   Examples: `CPU`, `GPU`


-### **Examples**
+#### **Examples**

-#### Export a TIMM VGG16 model for the CPU
+##### Export a TIMM VGG16 model for the CPU
 ```bash
-python aot_openvino_compiler.py --suite timm --model vgg16 --input_shape [1, 3, 224, 224] --device CPU
+python aot_optimize_and_infer.py --export --suite timm --model vgg16 --input_shape [1, 3, 224, 224] --device CPU
 ```

-#### Export a Torchvision ResNet50 model for the GPU
+##### Export a Torchvision ResNet50 model for the GPU
 ```bash
-python aot_openvino_compiler.py --suite torchvision --model resnet50 --input_shape "(1, 3, 256, 256)" --device GPU
+python aot_optimize_and_infer.py --export --suite torchvision --model resnet50 --input_shape "(1, 3, 256, 256)" --device GPU
 ```

-#### Export a Hugging Face BERT model for the CPU
+##### Export a Hugging Face BERT model for the CPU
+```bash
+python aot_optimize_and_infer.py --export --suite huggingface --model bert-base-uncased --input_shape "(1, 512)" --device CPU
+```
+##### Export and validate a TIMM ResNet50d model for the CPU
 ```bash
-python aot_openvino_compiler.py --suite huggingface --model bert-base-uncased --input_shape "(1, 512)" --device CPU
+python aot_optimize_and_infer.py --export --suite timm --model resnet50d --input_shape [1, 3, 224, 224] --device CPU --validate --dataset /path/to/dataset
 ```
-#### Export and validate TIMM Resnet50d model for the CPU
+
+##### Export, quantize and validate a TIMM ResNet50d model for the CPU
 ```bash
-python aot_openvino_compiler.py --suite timm --model vgg16 --input_shape [1, 3, 224, 224] --device CPU --validate --dataset /path/to/dataset
+python aot_optimize_and_infer.py --export --suite timm --model resnet50d --input_shape [1, 3, 224, 224] --device CPU --validate --dataset /path/to/dataset --quantize
 ```

-#### Export, quantize and validate TIMM Resnet50d model for the CPU
+##### Execute inference with a Torchvision Inception V3 model for the CPU
 ```bash
-python aot_openvino_compiler.py --suite timm --model vgg16 --input_shape [1, 3, 224, 224] --device CPU --validate --dataset /path/to/dataset --quantize
+python aot_optimize_and_infer.py --suite torchvision --model inception_v3 --infer --warmup_iter 10 --num_iter 100 --input_shape "(1, 3, 256, 256)" --device CPU
 ```

 ### **Notes**
@@ -162,72 +183,3 @@ Run inference with a given model for 10 iterations:
     --model_path=model.pte \
     --num_executions=10
 ```
-
-## Running Python Example with Pybinding:
-
-You can use the `export_and_infer_openvino.py` script to run models with the OpenVINO backend through the Python bindings.
-
-### **Usage**
-
-#### **Command Structure**
-```bash
-python export_and_infer_openvino.py <ARGUMENTS>
-```
-
-#### **Arguments**
-- **`--suite`** (required if `--model_path` argument is not used):
-  Specifies the model suite to use. Needs to be used with `--model` argument.
-  Supported values:
-  - `timm` (e.g., VGG16, ResNet50)
-  - `torchvision` (e.g., resnet18, mobilenet_v2)
-  - `huggingface` (e.g., bert-base-uncased). NB: Quantization and validation is not supported yet.
-
-- **`--model`** (required if `--model_path` argument is not used):
-  Name of the model to export. Needs to be used with `--suite` argument.
-  Examples:
-  - For `timm`: `vgg16`, `resnet50`
-  - For `torchvision`: `resnet18`, `mobilenet_v2`
-  - For `huggingface`: `bert-base-uncased`, `distilbert-base-uncased`
-
-- **`--model_path`** (required if `--suite` and `--model` arguments are not used):
-  Path to the saved model file. This argument allows you to load the compiled model from a file, instead of downloading it from the model suites using the `--suite` and `--model` arguments.
-  Example: `<path to model foler>/resnet50_fp32.pte`
-
-- **`--input_shape`**(required for random inputs):
-  Input shape for the model. Provide this as a **list** or **tuple**.
-  Examples:
-  - `[1, 3, 224, 224]` (Zsh users: wrap in quotes)
-  - `(1, 3, 224, 224)`
-
-- **`--input_tensor_path`**(optional):
-  Path to the raw input tensor file. If this argument is not provided, a random input tensor will be generated with the input shape provided with `--input_shape` argument.
-  Example: `<path to the input tensor foler>/input_tensor.pt`
-
-- **`--output_tensor_path`**(optional):
-  Path to the file where the output raw tensor will be saved.
-  Example: `<path to the output tensor foler>/output_tensor.pt`
-
-- **`--device`** (optional)
-  Target device for the compiled model. Default is `CPU`.
-  Examples: `CPU`, `GPU`
-
-- **`--num_iter`** (optional)
-  Number of iterations to execute inference for evaluation. The default value is `1`.
-  Examples: `100`, `1000`
-
-- **`--warmup_iter`** (optional)
-  Number of warmup iterations to execute inference before evaluation. The default value is `0`.
-  Examples: `5`, `10`
-
-
-### **Examples**
-
-#### Execute Torchvision ResNet50 model for the GPU with Random Inputs
-```bash
-python export_and_infer_openvino.py --suite torchvision --model resnet50 --input_shape "(1, 3, 256, 256)" --device GPU
-```
-
-#### Run a Precompiled Model for the CPU Using an Existing Input Tensor File and Save the Output.
-```bash
-python export_and_infer_openvino.py --model_path /path/to/model/folder/resnet50_fp32.pte --input_tensor_file /path/to/input/folder/input.pt --output_tensor_file /path/to/output/folder/output.pt --device CPU
-```
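The `--input_tensor_path` and `--output_tensor_path` options documented above exchange raw tensor files via `torch.save`/`torch.load`, which is how the script itself reads and writes them. Below is a minimal sketch (not part of the commit) of preparing an input file and inspecting a saved output; the file names are illustrative:

```python
import torch

# The tensor shape must match the --input_shape passed to aot_optimize_and_infer.py.
torch.save(torch.randn(1, 3, 224, 224), "input_tensor.pt")

# After an --infer run with --output_tensor_path output_tensor.pt,
# the result can be read back the same way the script loads inputs.
output = torch.load("output_tensor.pt", weights_only=False)
print(type(output))
```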

examples/openvino/aot_openvino_compiler.py renamed to examples/openvino/aot_optimize_and_infer.py

Lines changed: 120 additions & 5 deletions
@@ -5,6 +5,7 @@
 # LICENSE file in the root directory of this source tree.

 import argparse
+import time

 import executorch

@@ -102,6 +103,54 @@ def load_calibration_dataset(
     return calibration_dataset


+def infer_model(
+    exec_prog: EdgeProgramManager,
+    input_shape,
+    num_iter: int,
+    warmup_iter: int,
+    input_path: str,
+    output_path: str,
+) -> float:
+    """
+    Executes inference and reports the average timing.
+
+    :param exec_prog: EdgeProgramManager of the lowered model.
+    :param input_shape: The input shape for the model.
+    :param num_iter: The number of iterations to execute inference for timing.
+    :param warmup_iter: The number of warmup iterations to execute before timing.
+    :param input_path: Path to the input tensor file to read the input for inference.
+    :param output_path: Path to the output tensor file to save the output of inference.
+    :return: The average inference time per iteration, in seconds.
+    """
+    # 1: Load model from buffer
+    executorch_module = _load_for_executorch_from_buffer(exec_prog.buffer)
+
+    # 2: Initialize inputs
+    if input_path:
+        inputs = (torch.load(input_path, weights_only=False),)
+    else:
+        inputs = (torch.randn(input_shape),)
+
+    # 3: Execute warmup
+    for _i in range(warmup_iter):
+        out = executorch_module.run_method("forward", inputs)
+
+    # 4: Execute inference and measure timing
+    time_total = 0.0
+    for _i in range(num_iter):
+        time_start = time.time()
+        out = executorch_module.run_method("forward", inputs)
+        time_end = time.time()
+        time_total += time_end - time_start
+
+    # 5: Save output tensor as raw tensor file
+    if output_path:
+        torch.save(out, output_path)
+
+    # 6: Return average inference timing
+    return time_total / float(num_iter)
+
+
 def validate_model(
     exec_prog: EdgeProgramManager, calibration_dataset: torch.utils.data.DataLoader
 ) -> float:
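For context, a hedged sketch of how the new `infer_model` might be invoked, assuming `exec_prog` is the lowered `EdgeProgramManager` produced earlier in the script; empty paths fall through the function's truthiness checks, so a random input is generated and no output is saved:

```python
# Hypothetical call (values illustrative); exec_prog comes from the export step.
avg_time = infer_model(
    exec_prog,
    input_shape=(1, 3, 224, 224),
    num_iter=100,    # timed iterations
    warmup_iter=10,  # untimed warmup iterations
    input_path="",   # falsy: generate a random input of input_shape
    output_path="",  # falsy: do not save the output
)
print(f"Average inference time: {avg_time:.6f} s")
```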
@@ -128,27 +177,42 @@ def validate_model(
     return accuracy_score(predictions, targets)


-def main(
+def main(  # noqa: C901
     suite: str,
     model_name: str,
     input_shape,
+    save_model: bool,
+    model_file_name: str,
     quantize: bool,
     validate: bool,
     dataset_path: str,
     device: str,
     batch_size: int,
+    infer: bool,
+    num_iter: int,
+    warmup_iter: int,
+    input_path: str,
+    output_path: str,
 ):
     """
     Main function to load, quantize, and validate a model.

     :param suite: The model suite to use (e.g., "timm", "torchvision", "huggingface").
     :param model_name: The name of the model to load.
     :param input_shape: The input shape for the model.
+    :param save_model: Whether to save the compiled model as a .pte file.
+    :param model_file_name: Custom file name to save the exported model.
     :param quantize: Whether to quantize the model.
     :param validate: Whether to validate the model.
     :param dataset_path: Path to the dataset for calibration/validation.
     :param device: The device to run the model on (e.g., "cpu", "gpu").
     :param batch_size: Batch size for dataset loading.
+    :param infer: Whether to execute inference and report timing.
+    :param num_iter: The number of iterations to execute inference for timing.
+    :param warmup_iter: The number of warmup iterations to execute before timing.
+    :param input_path: Path to the input tensor file to read the input for inference.
+    :param output_path: Path to the output tensor file to save the output of inference.
+
     """

     # Load the selected model
@@ -214,10 +278,12 @@ def transform_fn(x):
     )

     # Serialize and save it to a file
-    model_file_name = f"{model_name}_{'int8' if quantize else 'fp32'}.pte"
-    with open(model_file_name, "wb") as file:
-        exec_prog.write_to_file(file)
-    print(f"Model exported and saved as {model_file_name} on {device}.")
+    if save_model:
+        if not model_file_name:
+            model_file_name = f"{model_name}_{'int8' if quantize else 'fp32'}.pte"
+        with open(model_file_name, "wb") as file:
+            exec_prog.write_to_file(file)
+        print(f"Model exported and saved as {model_file_name} on {device}.")

     if validate:
         if suite == "huggingface":
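A small worked example of the default naming in the hunk above (values hypothetical): with `--export` and `--quantize` but no `--model_file_name`, a `resnet50` export is written to `resnet50_int8.pte`:

```python
# Reproduces the f-string from the diff with example values.
model_name, quantize = "resnet50", True
default_name = f"{model_name}_{'int8' if quantize else 'fp32'}.pte"
assert default_name == "resnet50_int8.pte"
```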
@@ -232,6 +298,13 @@ def transform_fn(x):
         acc_top1 = validate_model(exec_prog, calibration_dataset)
         print(f"acc@1: {acc_top1}")

+    if infer:
+        print("Start inference of the model:")
+        avg_time = infer_model(
+            exec_prog, input_shape, num_iter, warmup_iter, input_path, output_path
+        )
+        print(f"Average inference time: {avg_time}")
+

 if __name__ == "__main__":
     # Argument parser for dynamic inputs
@@ -258,6 +331,14 @@ def transform_fn(x):
         help="Batch size for the validation. Default batch_size == 1."
         " The dataset length must be evenly divisible by the batch size.",
     )
+    parser.add_argument(
+        "--export", action="store_true", help="Export the compiled model as a .pte file."
+    )
+    parser.add_argument(
+        "--model_file_name",
+        type=str,
+        help="Custom file name to save the exported model.",
+    )
     parser.add_argument(
         "--quantize", action="store_true", help="Enable model quantization."
     )
@@ -266,6 +347,33 @@ def transform_fn(x):
         action="store_true",
         help="Enable model validation. --dataset argument is required for the validation.",
     )
+    parser.add_argument(
+        "--infer",
+        action="store_true",
+        help="Run inference and report timing.",
+    )
+    parser.add_argument(
+        "--num_iter",
+        type=int,
+        default=1,
+        help="The number of iterations to execute inference for timing.",
+    )
+    parser.add_argument(
+        "--warmup_iter",
+        type=int,
+        default=0,
+        help="The number of iterations to execute inference for warmup before timing.",
+    )
+    parser.add_argument(
+        "--input_tensor_path",
+        type=str,
+        help="Path to the input tensor file to read the input for inference.",
+    )
+    parser.add_argument(
+        "--output_tensor_path",
+        type=str,
+        help="Path to the output tensor file to save the output of inference.",
+    )
     parser.add_argument("--dataset", type=str, help="Path to the validation dataset.")
     parser.add_argument(
         "--device",
@@ -283,9 +391,16 @@ def transform_fn(x):
         args.suite,
         args.model,
         args.input_shape,
+        args.export,
+        args.model_file_name,
         args.quantize,
         args.validate,
         args.dataset,
         args.device,
         args.batch_size,
+        args.infer,
+        args.num_iter,
+        args.warmup_iter,
+        args.input_tensor_path,
+        args.output_tensor_path,
     )
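To close the loop, a sketch (an assumption, not shown in this commit) of running an exported `.pte` file with the same pybinding that `infer_model` uses; the import path is the usual ExecuTorch `portable_lib` location, and the file name matches the default produced by the export step:

```python
import torch
from executorch.extension.pybindings.portable_lib import (
    _load_for_executorch_from_buffer,
)

# Load the flatbuffer written by --export (default name for a resnet50 fp32 export).
with open("resnet50_fp32.pte", "rb") as f:
    module = _load_for_executorch_from_buffer(f.read())

# Run a forward pass with a random input matching the export-time input shape.
out = module.run_method("forward", (torch.randn(1, 3, 224, 224),))
print(out)
```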
