
Commit 793efb5

Update
[ghstack-poisoned]
2 parents: ba4293b + 39fbd68

File tree: 168 files changed (+2524, -869 lines)


CONTRIBUTING.md

Lines changed: 6 additions & 10 deletions
@@ -1,7 +1,6 @@
 Thank you for your interest in contributing to ExecuTorch! We want to make
 it easy to contribute to this project.
 
-
 
 ## Dev Install
 
@@ -91,7 +90,7 @@ executorch
 │ └── <a href="runtime/platform">platform</a> - Layer between architecture specific code and portable C++.
 ├── <a href="schema">schema</a> - ExecuTorch PTE file format flatbuffer schemas.
 ├── <a href="scripts">scripts</a> - Utility scripts for building libs, size management, dependency management, etc.
-├── <a href="shim">shim</a> - Compatibility layer between OSS and Internal builds.
+├── <a href="shim_et">shim_et</a> - Compatibility layer between OSS and Internal builds.
 ├── <a href="test">test</a> - Broad scoped end-to-end tests.
 ├── <a href="third-party">third-party</a> - Third-party dependencies.
 ├── <a href="tools">tools</a> - Tools for building ExecuTorch from source, for different built tools (CMake, Buck).
@@ -192,9 +191,6 @@ in the Github repo.
 
 ## Coding Style
 
-Goal: Encourage standards that make it easier to read, edit, maintain, and debug
-the ExecuTorch code.
-
 ### lintrunner
 
 We use [`lintrunner`](https://pypi.org/project/lintrunner/) to help make sure the
@@ -259,7 +255,7 @@ toolchains, and having access to relatively modern C++ features.
 
 #### C/C++ standard library usage
 
-**Restricted usage of the C++ standard library.**
+**Restricted usage of the C++ standard library**
 
 Rationale: ExecuTorch is intended to be portable to bare-metal systems that lack
 certain features, like dynamic memory, threading, and locking, required by parts
@@ -280,7 +276,7 @@ careful to also manually destroy objects initialized in this way.
 
 #### C++ language features
 
-**Exceptions: Do not use.**
+**Exceptions: Do not use**
 - Rationale: Exceptions are not widely supported on some classes of
   microcontrollers and DSPs, and they can significantly increase binary size.
 
@@ -289,12 +285,12 @@ must work with threading**
 - Rationale: The core runtime must work on systems that do not have threading
   support.
 
-**RTTI, dynamic_cast, and `<typeid>`: Do not use.**
+**RTTI, dynamic_cast, and `<typeid>`: Do not use**
 - Rationale: RTTI adds extra data to every virtual class. ExecuTorch doesn't
   have a strong need for `dynamic_cast` and friends, so it's better to reduce
   the binary size.
 
-**Templates and template metaprogramming: Be careful and avoid if possible.**
+**Templates and template metaprogramming: Be careful and avoid if possible**
 - Rationale: Most templating results in code generation, and is one of the most
   common sources of binary bloat. Some use of templates is fine (e.g. an
   `ArrayRef<T>`, or code that handles multiple `ScalarType` types), but for the
@@ -359,7 +355,7 @@ docs](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/
 for basics.
 
 1. Push your branch to your fork of `pytorch/executorch`. Most people do not
-   have permission to push a branch directoy to the upstream repo.
+   have permission to push a branch directory to the upstream repo.
 1. Create your PR
    - Use the `main` branch as the base.
    - Give the PR a clear and descriptive title. It will become the title of the

README.md

Lines changed: 2 additions & 2 deletions
@@ -49,9 +49,9 @@ Key value propositions of ExecuTorch are:
 ## Getting Started
 To get started you can:
 
-- Visit the [Step by Step Tutorial](https://pytorch.org/executorch/main/index.html) on getting things running locally and deploy a model to a device
+- Visit the [Step by Step Tutorial](https://pytorch.org/executorch/main/index.html) to get things running locally and deploy a model to a device
 - Use this [Colab Notebook](https://pytorch.org/executorch/stable/getting-started-setup.html#quick-setup-colab-jupyter-notebook-prototype) to start playing around right away
-- Jump straight into LLMs use cases by following specific instructions for [Llama](./examples/models/llama/README.md) and [Llava](./examples/models/llava/README.md)
+- Jump straight into LLM use cases by following specific instructions for [Llama](./examples/models/llama/README.md) and [Llava](./examples/models/llava/README.md)
 
 ## Feedback and Engagement
 

backends/arm/test/models/test_llama.py

Lines changed: 27 additions & 1 deletion
@@ -102,7 +102,7 @@ def prepare_model(self):
     def test_llama_tosa_MI(self):
         llama_model, llama_inputs, llama_meta = self.prepare_model()
 
-        if llama_model is None and llama_inputs is None and llama_meta is None:
+        if llama_model is None or llama_inputs is None:
             pytest.skip("Missing model and/or input files")
 
         with torch.no_grad():
@@ -123,3 +123,29 @@ def test_llama_tosa_MI(self):
                 rtol=1.1,  # TODO: MLETORCH-825 decrease tolerance
             )
         )
+
+    @pytest.mark.xfail(reason="KeyError: scalar_tensor_1 (MLETORCH-907)")
+    def test_llama_tosa_BI(self):
+        llama_model, llama_inputs, llama_meta = self.prepare_model()
+
+        if llama_model is None or llama_inputs is None:
+            pytest.skip("Missing model and/or input files")
+
+        with torch.no_grad():
+            (
+                ArmTester(
+                    llama_model,
+                    example_inputs=llama_inputs,
+                    compile_spec=common.get_tosa_compile_spec("TOSA-0.80+BI"),
+                    constant_methods=llama_meta,
+                )
+                .quantize()
+                .export()
+                .to_edge_transform_and_lower()
+                .to_executorch()
+                .run_method_and_compare_outputs(
+                    inputs=llama_inputs,
+                    atol=4.3,
+                    rtol=1.1,  # TODO: Tolerance needs to be updated after MLETORCH-907
+                )
+            )

backends/arm/test/models/test_mobilenet_v3_arm.py

Lines changed: 3 additions & 3 deletions
@@ -46,7 +46,7 @@ def test_mv3_tosa_BI():
         aten_op=[],
         exir_op=[],
         use_to_edge_transform_and_lower=True,
-        atol=0.3,
+        atol=0.5,
         qtol=1,
     )
     pipeline.run()
@@ -63,7 +63,7 @@ def test_mv3_u55_BI():
         exir_ops=[],
         run_on_fvp=True,
         use_to_edge_transform_and_lower=True,
-        atol=0.3,
+        atol=0.5,
         qtol=1,
     )
     pipeline.run()
@@ -80,7 +80,7 @@ def test_mv3_u85_BI():
         exir_ops=[],
         run_on_fvp=True,
         use_to_edge_transform_and_lower=True,
-        atol=0.3,
+        atol=0.5,
         qtol=1,
     )
     pipeline.run()

backends/arm/test/models/test_torch_functions.py

Lines changed: 2 additions & 0 deletions
@@ -101,6 +101,7 @@ def forward(self, *args):
         "Requires dynamic output shape.",
         "topk": "NotImplementedError: No registered serialization name for <class 'torch.return_types.topk'> found",
         "sort": "NotImplementedError: No registered serialization name for <class 'torch.return_types.sort'> found",
+        "norm": "An error occurred when running the 'KeepDimsFalseToSqueezePass' pass after the following passes:",
     },
 )
 def test_torch_fns_MI(test_data):
@@ -129,6 +130,7 @@ def test_torch_fns_MI(test_data):
         "topk": "NotImplementedError: No registered serialization name for <class 'torch.return_types.topk'> found",
         "sort": "NotImplementedError: No registered serialization name for <class 'torch.return_types.sort'> found",
         "t": "MLETORCH-855: Issue with Quantization folding.",
+        "norm": "An error occurred when running the 'KeepDimsFalseToSqueezePass' pass after the following passes:",
     },
     strict=False,
 )

backends/arm/test/ops/test_sigmoid_16bit.py

Lines changed: 6 additions & 4 deletions
@@ -81,7 +81,7 @@ def forward(self, x):
 
 
 @common.parametrize("test_data", test_data_suite)
-@pytest.mark.flaky(reruns=5)
+@pytest.mark.flaky(reruns=32)  # Flaky due to Vela bug: MLBEDSW-10642
 def test_sigmoid_tosa_BI(test_data):
     pipeline = TosaPipelineBI(
         Sigmoid(), (test_data(),), Sigmoid.aten_op, Sigmoid.exir_op
@@ -97,7 +97,7 @@ def test_sigmoid_tosa_BI(test_data):
         "ramp": "AssertionError: Output 0 does not match reference output. MLETORCH-787"
     },
 )
-@pytest.mark.flaky(reruns=5)
+@pytest.mark.flaky(reruns=32)  # Flaky due to Vela bug: MLBEDSW-10642
 def test_sigmoid_add_sigmoid_tosa_BI(test_data):
     pipeline = TosaPipelineBI(
         SigmoidAddSigmoid(), (test_data(),), Sigmoid.aten_op, Sigmoid.exir_op
@@ -110,6 +110,7 @@ def test_sigmoid_add_sigmoid_tosa_BI(test_data):
     "test_data",
     test_data_suite,
 )
+@pytest.mark.flaky(reruns=32)  # Flaky due to Vela bug: MLBEDSW-10642
 def test_sigmoid_tosa_u55(test_data):
     pipeline = OpNotSupportedPipeline(
         Sigmoid(), (test_data(),), "TOSA-0.80+BI+u55", {Sigmoid.exir_op: 1}
@@ -122,6 +123,7 @@ def test_sigmoid_tosa_u55(test_data):
     "test_data",
     test_data_suite,
 )
+@pytest.mark.flaky(reruns=32)  # Flaky due to Vela bug: MLBEDSW-10642
 def test_sigmoid_add_sigmoid_tosa_u55(test_data):
     pipeline = OpNotSupportedPipeline(
         SigmoidAddSigmoid(),
@@ -135,7 +137,7 @@ def test_sigmoid_add_sigmoid_tosa_u55(test_data):
 
 
 @common.parametrize("test_data", test_data_suite)
-@pytest.mark.flaky(reruns=5)
+@pytest.mark.flaky(reruns=32)  # Flaky due to Vela bug: MLBEDSW-10642
 @common.XfailIfNoCorstone320
 def test_sigmoid_tosa_u85(test_data):
     pipeline = EthosU85PipelineBI(
@@ -152,7 +154,7 @@ def test_sigmoid_tosa_u85(test_data):
         "ramp": "AssertionError: Output 0 does not match reference output.",
     },
 )
-@pytest.mark.flaky(reruns=5)
+@pytest.mark.flaky(reruns=32)  # Flaky due to Vela bug: MLBEDSW-10642
 @common.XfailIfNoCorstone320
 def test_sigmoid_add_sigmoid_tosa_u85(test_data):
     pipeline = EthosU85PipelineBI(

backends/arm/test/ops/test_sigmoid_32bit.py

Lines changed: 6 additions & 4 deletions
@@ -97,7 +97,7 @@ def forward(self, x):
 
 
 @common.parametrize("test_data", test_data_suite)
-@pytest.mark.flaky(reruns=5)
+@pytest.mark.flaky(reruns=32)  # Flaky due to Vela bug: MLBEDSW-10642
 def test_sigmoid_tosa_BI(test_data):
     pipeline = TosaPipelineBI(
         Sigmoid(),
@@ -110,7 +110,7 @@ def test_sigmoid_tosa_BI(test_data):
 
 
 @common.parametrize("test_data", test_data_suite)
-@pytest.mark.flaky(reruns=5)
+@pytest.mark.flaky(reruns=32)  # Flaky due to Vela bug: MLBEDSW-10642
 def test_sigmoid_add_sigmoid_tosa_BI(test_data):
     pipeline = TosaPipelineBI(
         SigmoidAddSigmoid(),
@@ -123,6 +123,7 @@ def test_sigmoid_add_sigmoid_tosa_BI(test_data):
 
 
 @common.parametrize("test_data", test_data_suite)
+@pytest.mark.flaky(reruns=32)  # Flaky due to Vela bug: MLBEDSW-10642
 def test_sigmoid_tosa_u55(test_data):
     pipeline = OpNotSupportedPipeline(
         Sigmoid(), (test_data(),), "TOSA-0.80+BI+u55", {Sigmoid.exir_op: 1}
@@ -132,6 +133,7 @@ def test_sigmoid_tosa_u55(test_data):
 
 
 @common.parametrize("test_data", test_data_suite)
+@pytest.mark.flaky(reruns=32)  # Flaky due to Vela bug: MLBEDSW-10642
 def test_sigmoid_add_sigmoid_tosa_u55(test_data):
     pipeline = OpNotSupportedPipeline(
         SigmoidAddSigmoid(),
@@ -145,7 +147,7 @@ def test_sigmoid_add_sigmoid_tosa_u55(test_data):
 
 
 @common.parametrize("test_data", test_data_suite)
-@pytest.mark.flaky(reruns=5)
+@pytest.mark.flaky(reruns=32)  # Flaky due to Vela bug: MLBEDSW-10642
 @common.XfailIfNoCorstone320
 def test_sigmoid_tosa_u85(test_data):
     pipeline = EthosU85PipelineBI(
@@ -162,7 +164,7 @@ def test_sigmoid_tosa_u85(test_data):
         "ramp": "AssertionError: Output 0 does not match reference output.",
     },
 )
-@pytest.mark.flaky(reruns=5)
+@pytest.mark.flaky(reruns=32)  # Flaky due to Vela bug: MLBEDSW-10642
 @common.XfailIfNoCorstone320
 def test_sigmoid_add_sigmoid_tosa_u85(test_data):
     pipeline = EthosU85PipelineBI(

backends/cadence/aot/memory_planning.py

Lines changed: 9 additions & 13 deletions
@@ -12,7 +12,7 @@
 import math
 import typing
 from functools import partial
-from typing import Iterable, List, Optional, Tuple
+from typing import Iterable, List, Optional, Set, Tuple
 
 import torch
 from executorch.backends.cadence.aot.memory_constraints import (
@@ -73,11 +73,11 @@ def collect_specs_from_graph_module(
 # the fastest memory available
 # flake8: noqa 'position_based_greedy_with_hierarchy' is too complex (13)
 def position_based_greedy_with_hierarchy(
-    graph_module: torch.fx.GraphModule,
     alignment: int,
+    specs: Set[TensorSpec],
+    graph_module: torch.fx.GraphModule,
     graph_signature: ExportGraphSignature,
-    alloc_graph_input: bool,
-    alloc_graph_output: bool,
+    extra_padding: int = 0,
     *,
     memory_config: MemoryConfig,
     mem_constraints: MemConstraints,
@@ -119,9 +119,7 @@ def memory_available(spec: TensorSpec) -> bool:
 
     # Iterate over all the specs in sorted order
     for spec in sorted(
-        collect_specs_from_graph_module(
-            graph_module, graph_signature, alloc_graph_input, alloc_graph_output
-        ),
+        specs,
         key=lambda spec: spec.allocated_memory,
         reverse=True,
     ):
@@ -167,11 +165,11 @@ def memory_available(spec: TensorSpec) -> bool:
 
 # Greedy tensor placement with the heuristics from arxiv.org/pdf/2001.03288.pdf
 def greedy_by_size_for_offset_calculation_with_hierarchy(
-    graph_module: torch.fx.GraphModule,
     alignment: int,
+    specs: Set[TensorSpec],
+    graph_module: torch.fx.GraphModule,
     graph_signature: ExportGraphSignature,
-    alloc_graph_input: bool,
-    alloc_graph_output: bool,
+    extra_padding: int = 0,
     *,
     memory_config: MemoryConfig,
     mem_constraints: MemConstraints,
@@ -199,9 +197,7 @@ def greedy_by_size_for_offset_calculation_with_hierarchy(
 
     # Iterate over all the specs in sorted order
     for spec in sorted(
-        collect_specs_from_graph_module(
-            graph_module, graph_signature, alloc_graph_input, alloc_graph_output
-        ),
+        specs,
         key=lambda spec: spec.allocated_memory,
         reverse=True,
     ):
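Note on the change above: both greedy planners now consume a pre-collected `specs: Set[TensorSpec]` (plus an `extra_padding` knob) instead of each planner calling `collect_specs_from_graph_module` itself. For orientation, below is a minimal, self-contained sketch of the greedy-by-size offset heuristic cited from arxiv.org/pdf/2001.03288.pdf: sort tensors largest-first, then place each at the lowest offset that does not collide with an already-placed tensor whose lifetime overlaps. The names (`SimpleSpec`, `lifetimes_overlap`, `greedy_by_size`) are hypothetical; this is not the Cadence implementation, which additionally handles memory hierarchies, alignment, and placement constraints.

from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SimpleSpec:
    size: int                     # bytes required by the tensor
    lifetime: Tuple[int, int]     # [first_use, last_use] node indices
    offset: Optional[int] = None  # assigned start offset in the arena


def lifetimes_overlap(a: SimpleSpec, b: SimpleSpec) -> bool:
    return a.lifetime[0] <= b.lifetime[1] and b.lifetime[0] <= a.lifetime[1]


def greedy_by_size(specs: List[SimpleSpec]) -> int:
    """Place specs largest-first at the lowest non-conflicting offset; return arena size."""
    placed: List[SimpleSpec] = []
    for spec in sorted(specs, key=lambda s: s.size, reverse=True):
        offset = 0
        # Scan already-placed tensors in offset order and bump the candidate past
        # any tensor that is alive at the same time and overlaps it spatially.
        for other in sorted(placed, key=lambda s: s.offset):
            spatial_overlap = not (
                offset + spec.size <= other.offset or other.offset + other.size <= offset
            )
            if lifetimes_overlap(spec, other) and spatial_overlap:
                offset = other.offset + other.size
        spec.offset = offset
        placed.append(spec)
    return max((s.offset + s.size for s in placed), default=0)


# Example: two size-8 tensors with disjoint lifetimes can share the same offset,
# so the arena only needs 8 + 4 = 12 bytes rather than 20.
arena = greedy_by_size(
    [
        SimpleSpec(size=8, lifetime=(0, 2)),
        SimpleSpec(size=8, lifetime=(3, 5)),
        SimpleSpec(size=4, lifetime=(0, 5)),
    ]
)
assert arena == 12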

backends/qualcomm/_passes/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -9,6 +9,7 @@
 from .annotate_unbind import AnnotateUnbind
 from .convert_bmm_to_matmul import ConvertBmmToMatmul
 from .convert_conv1d_to_conv2d import ConvertConv1dToConv2d
+from .convert_upsample_bicubic2d import ConvertUpsampleBicubicWithBilinear
 from .decompose_any import DecomposeAny
 from .decompose_einsum import DecomposeEinsum
 from .decompose_expm1 import DecomposeExpM1
@@ -40,6 +41,7 @@
     ConvertBmmToMatmul,
     ConvertConv1dToConv2d,
     DecomposeAny,
+    ConvertUpsampleBicubicWithBilinear,
     DecomposeEinsum,
     DecomposeExpM1,
     DecomposeLinalgVectorNorm,
backends/qualcomm/_passes/convert_upsample_bicubic2d.py

Lines changed: 27 additions & 0 deletions

@@ -0,0 +1,27 @@
+# Copyright (c) Qualcomm Innovation Center, Inc.
+# All rights reserved
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+from executorch.exir.dialects._ops import ops as exir_ops
+from executorch.exir.pass_base import ExportPass
+
+
+class ConvertUpsampleBicubicWithBilinear(ExportPass):
+    """
+    Qnn does not support bicubic interpolation, so we need to convert it to bilinear.
+    This pass will convert bicubic interpolation to bilinear interpolation.
+    """
+
+    bicubic_op_targets = {
+        exir_ops.edge.aten.upsample_bicubic2d.vec,
+    }
+    upsample_bilinear_op = exir_ops.edge.aten.upsample_bilinear2d.default
+
+    def __init__(self):
+        super(ConvertUpsampleBicubicWithBilinear, self).__init__()
+
+    def call_operator(self, op, args, kwargs, meta):
+        if op not in self.bicubic_op_targets:
+            return super().call_operator(op, args, kwargs, meta)
+        return super().call_operator(self.upsample_bilinear_op, args[:-1], kwargs, meta)
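Since `call_operator` rewrites each matched node as the graph is re-traced, the new pass drops into the usual ExportPass flow once it is registered in `backends/qualcomm/_passes/__init__.py` (see the `__init__.py` diff above). A rough usage sketch follows; it is an assumption-laden illustration, not code from this commit: the variable `edge_graph_module` is a placeholder, and the callable-pass/PassResult interface is the generic ExportPass convention rather than anything this diff adds.

from executorch.backends.qualcomm._passes import ConvertUpsampleBicubicWithBilinear


def convert_bicubic_to_bilinear(edge_graph_module):
    # Assumption: like other ExportPass subclasses, the pass instance can be
    # called on an edge-dialect torch.fx GraphModule and returns a PassResult
    # whose .graph_module holds the rewritten graph, with every
    # upsample_bicubic2d.vec call replaced by upsample_bilinear2d.default.
    result = ConvertUpsampleBicubicWithBilinear()(edge_graph_module)
    return result.graph_module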
