
Conversation

@robert-kalmar
Collaborator

@robert-kalmar robert-kalmar commented May 14, 2025

Summary

This PR adds an AoT example with the eIQ Neutron Backend. The backend is demonstrated on a tiny CNN model named CifarNet, trained on the CIFAR-10 dataset, which is included in the PR.

Test plan

Manual testing: executing the example following the steps in the README.md and validating the PTE on the i.MX RT700 platform with the Neutron Backend runtime.

Resolves #10898

cc @digantdesai @JakeStevens @skywall @jirioc

@pytorch-bot

pytorch-bot bot commented May 14, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/10871

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit be74e00 with merge base 70ea0dd:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 14, 2025
@robert-kalmar robert-kalmar force-pushed the upstream/release-mcux-25.03-full/aot-example branch from e5ed112 to 46c2a58 Compare May 14, 2025 11:56
@robert-kalmar
Collaborator Author

robert-kalmar commented May 14, 2025

@pytorchbot label "module: nxp" "release notes: nxp"

@pytorch-bot

pytorch-bot bot commented May 14, 2025

Didn't find following labels among repository labels: ,,label

@pytorch-bot pytorch-bot bot added module: nxp Issues related to NXP Neutron NPU delegation and code under backends/nxp/ release notes: nxp Changes to the NXP Neutron backend delegate labels May 14, 2025
action="store_true",
required=False,
default=False,
help="Flag for producing ArmBackend delegated model",
Contributor

Suggested change
help="Flag for producing ArmBackend delegated model",
help="Flag for producing NeutronBackend delegated model",

model, example_inputs, strict=True
)

# TODO: Add Neutron ATen Passes, once https://github.com/pytorch/executorch/pull/10579 is merged
Contributor

nit: file a task so we can track and not lose this

Collaborator Author

@robert-kalmar robert-kalmar May 15, 2025

#10898

Contributor

#10579 is now merged!

"_portable_lib.cpython* using --portable_lib CLI options. \n"
"This is required for running quantized models with unquantized input."
)
sys.exit(-1)
Contributor

Can you either: (1) just not sys.exit entirely and let it fail loudly later when it hits the runtime exception, or (2) add a CLI arg to allow skipping this part -- and the part below for the torch.loads.

In internal infra, these libraries are loaded in a slightly different way; I do not actually pass the .so on the command line, and it is not loaded a few lines below.

Collaborator Author

@robert-kalmar robert-kalmar May 15, 2025


OK, so I reverted back to our original solution. Only a warning is raised, and it normally fails later when exporting to the ExecuTorch program:

    # 6. Export to ExecuTorch program
    try:
        exec_prog = edge_program.to_executorch(
            config=ExecutorchBackendConfig(extract_delegate_segments=False)
        )
    except RuntimeError as e:
        if "Missing out variants" in str(e.args[0]):
            raise RuntimeError(
                e.args[0]
                + ".\nThis likely due to an external so library not being loaded. Supply a path to it with the "
                "--portable_lib flag."
            ).with_traceback(e.__traceback__) from None
        else:
            raise e

x = self.conv3(x)
x = self.pool2(x)

# The output of the previous MaxPool has shape [batch, 64, 4, 4] ([batch, 4, 4, 64] in TFLite). When running
Contributor

Suggested change
# The output of the previous MaxPool has shape [batch, 64, 4, 4] ([batch, 4, 4, 64] in TFLite). When running
# The output of the previous MaxPool has shape [batch, 64, 4, 4] ([batch, 4, 4, 64] in Neutron IR). When running

x = self.pool2(x)

# The output of the previous MaxPool has shape [batch, 64, 4, 4] ([batch, 4, 4, 64] in TFLite). When running
# inference of the `FullyConnected`, TFlite will automatically collapse the channels and spatial dimensions and
Contributor

Suggested change
# inference of the `FullyConnected`, TFlite will automatically collapse the channels and spatial dimensions and
# inference of the `FullyConnected`, Neutron IR will automatically collapse the channels and spatial dimensions and
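
For context, a minimal, hypothetical sketch (not the PR's actual CifarNet code) of how a PyTorch model can mirror that channels-last collapse before its fully connected layer, so the flattened element order matches what the Neutron IR produces:

```python
import torch

# Output of the last MaxPool: [batch, C, H, W] = [batch, 64, 4, 4] in PyTorch.
x = torch.randn(1, 64, 4, 4)

# Permute to channels-last ([batch, 4, 4, 64]) before flattening, so the
# flattened ordering matches an IR that collapses H, W and C together.
x_nhwc = x.permute(0, 2, 3, 1).contiguous()
flat = x_nhwc.reshape(x.shape[0], -1)  # [batch, 1024]
```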

parser.add_argument(
"-p",
"--portable_lib",
required=True,
Collaborator

This probably shouldn't be required because portable library is loaded only when --quantize=True.

Collaborator Author

@robert-kalmar robert-kalmar May 15, 2025

✅ Thanks, fixed in latest push.
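
A minimal sketch of what the suggested relaxation could look like, assuming the flag names shown in the diff (`--portable_lib`, `--quantize`); the PR's actual argument handling may differ:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-p", "--portable_lib", required=False, default=None)
parser.add_argument("--quantize", action="store_true", default=False)
args = parser.parse_args()

# Only quantized models need the out-variant kernels from the portable lib.
if args.quantize and args.portable_lib is None:
    parser.error("--portable_lib is required when --quantize is set")
```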


# For quantization we need to build the quantized_ops_aot_lib.so and _portable_lib.*.so
# Use this CMake options
# -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON
Collaborator

Is this documentation up to date? Is portable lib built just by specifying these two flags?

Collaborator Author

The quantized_ops_aot_lib links to portable_lib

$ ldd ./venv3.10/lib/python3.10/site-packages/executorch/kernels/quantized/libquantized_ops_aot_lib.so
        _portable_lib.cpython-310d-x86_64-linux-gnu.so => not found
       ....

For some reason we must load the portable_lib manually prior to libquantized_ops_aot_lib.so; dlopen does not find it on its own.
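
A small sketch of the manual workaround described above; the paths are placeholders based on the `find` output in the README, not verified locations:

```python
import torch

# Placeholder paths -- substitute the ones returned by `find`.
portable_lib = "./pip-out/.../_portable_lib.cpython-310d-x86_64-linux-gnu.so"
quantized_ops_lib = "./pip-out/.../libquantized_ops_aot_lib.so"

# Load the dependency first: dlopen does not resolve _portable_lib on its own
# when libquantized_ops_aot_lib.so is opened directly.
torch.ops.load_library(portable_lib)
torch.ops.load_library(quantized_ops_lib)
```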

Collaborator Author

@robert-kalmar robert-kalmar May 23, 2025


FYI @skywall, we do not need any custom library loading for the quantized kernels' out variants. There are already Python packages for this:

import executorch.extension.pybindings.portable_lib
import executorch.kernels.quantized 

Thanks to @digantdesai for the review items, which helped me find this out.

@robert-kalmar robert-kalmar force-pushed the upstream/release-mcux-25.03-full/aot-example branch from 46c2a58 to 2397cb0 Compare May 15, 2025 11:10
2. After building ExecuTorch you shall have the `libquantized_ops_aot_lib.so` and `_portable_lib.<python_version>.so` located in the `pip-out/lib` folder. We will need these libraries when generating the quantized CifarNet ExecuTorch model, so as a first step we will find them:
```commandline
$ find . -name "libquantized_ops_aot_lib.so"
./pip-out/lib.linux-x86_64-cpython-310-pydebug/executorch/kernels/quantized/libquantized_ops_aot_lib.so
Contributor

FYI, I added an optimized Cortex-M q/dq int8 op if you want to use that; it is still quite early days for that lib.

./pip-out/lib.linux-x86_64-cpython-310-pydebug/executorch/kernels/quantized/libquantized_ops_aot_lib.so

$ find . -name "_portable_lib.cpython-310d-x86_64-linux-gnu.so"
./pip-out/lib.linux-x86_64-cpython-310-pydebug/executorch/extension/pybindings/_portable_lib.cpython-310d-x86_64-linux-gnu.so
Contributor

is this using selective build?

Collaborator Author

Not sure what you mean.

Collaborator Author

@robert-kalmar robert-kalmar May 23, 2025

OK, I understand where you are heading. We needed the quantized_aot_lib to get the out variants for the quantize/dequantize_per_tensor operators.
I found there are already Python bindings and modules to solve this:

import executorch.extension.pybindings.portable_lib
import executorch.kernels.quantized 

Comment on lines 255 to 256
torch.ops.load_library(args.portable_lib)
torch.ops.load_library(args.so_library)
Contributor

why do we need these? just include the python module perhaps?

Collaborator Author

You are right (obviously), we don't. I am importing the Python modules instead.

import executorch.extension.pybindings.portable_lib
import executorch.kernels.quantized 

Thanks for the finding, it helped me locate these Python modules.
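
In other words, the final approach is a plain import; a minimal sketch (assuming the standard pip-installed ExecuTorch packages) is:

```python
# Importing these modules registers the portable kernels and the quantized
# out variants (e.g. quantize/dequantize_per_tensor.out), so no explicit
# torch.ops.load_library(...) call is needed before to_executorch().
import executorch.extension.pybindings.portable_lib  # noqa: F401
import executorch.kernels.quantized  # noqa: F401
```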

Contributor

@digantdesai digantdesai left a comment

Looks great. Thanks.

@digantdesai
Contributor

Ready to merge? Fix linter please?

@robert-kalmar
Collaborator Author

> Ready to merge? Fix linter please?

Not yet, updating the quantizer to the recent changes: moving from torch.ao to torchao.

@robert-kalmar robert-kalmar force-pushed the upstream/release-mcux-25.03-full/aot-example branch 2 times, most recently from 3018aea to 2941a74 Compare May 23, 2025 11:33
@robert-kalmar
Collaborator Author

- Linting error - fixed.
- Quantizer invocation (using torchao instead of torch.ao) in the aot_neutron_example to align with updates in #10294 - fixed.
- Importing quantized operators instead of loading the *.so library - fixed.

Now it is ready to merge.

@robert-kalmar
Collaborator Author

3 checks failed, all missing the "llm" preset, which was added in a later commit (c256723#diff-fc10486ef573a9c92fe4a135b8a1b20157154af6e83dacfd1ea046bda7814c84). I guess those failures are unrelated to the changes in this PR.

Although I wonder why those tests were even triggered, as they are not in the .github/workflows of this codebase.

@digantdesai
Contributor

Let's re-merge the CI PR, and then we can merge this, so we have some confidence in this and know we won't be regressing. Thanks.

@robert-kalmar robert-kalmar force-pushed the upstream/release-mcux-25.03-full/aot-example branch from 2941a74 to c7d4b49 Compare June 10, 2025 09:10
Contributor

@digantdesai digantdesai left a comment

Looks good. Is the setup.sh empty for a reason?

@robert-kalmar
Collaborator Author

robert-kalmar commented Jun 10, 2025

> Looks good. Is the setup.sh empty for a reason?

It is not empty, just its content has not changed - https://github.com/pytorch/executorch/blob/2941a74be7f4d49198087d3983d591911c614260/examples/nxp/setup.sh
The change is the file mode - adding the execute bit (`chmod +x`).

The WebUI is misleading here. By "empty file" it evidently means empty diff 🙃

@robert-kalmar robert-kalmar marked this pull request as draft June 18, 2025 12:10
@robert-kalmar
Collaborator Author

Converting to draft until the NXP Backend CI is back (#11756).

@StrycekSimon StrycekSimon force-pushed the upstream/release-mcux-25.03-full/aot-example branch from c7d4b49 to 7d1fa7f Compare June 30, 2025 14:41
@robert-kalmar robert-kalmar marked this pull request as ready for review July 8, 2025 09:12
@pytorch-bot

pytorch-bot bot commented Jul 8, 2025

To add the ciflow label ciflow/trunk please first approve the workflows that are awaiting approval (scroll to the bottom of this page).

This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.

@pytorch-bot pytorch-bot bot removed the ciflow/trunk label Jul 8, 2025
@robert-kalmar robert-kalmar marked this pull request as draft July 8, 2025 09:59
@robert-kalmar robert-kalmar force-pushed the upstream/release-mcux-25.03-full/aot-example branch from 7d1fa7f to f129aec Compare July 11, 2025 12:58
@robert-kalmar robert-kalmar marked this pull request as ready for review July 11, 2025 13:01
@robert-kalmar robert-kalmar force-pushed the upstream/release-mcux-25.03-full/aot-example branch from f129aec to c690d05 Compare July 11, 2025 14:38
@@ -0,0 +1,19 @@
# PyTorch Model Delegation to Neutron Backend

In this guideline we will show how to use the ExecuTorch AoT part to convert a PyTorch model to ExecuTorch format and delegate the model computation to eIQ Neutron NPU using the eIQ Neutron Backend.
Contributor

Nit

Suggested change
In this guideline we will show how to use the ExecuTorch AoT part to convert a PyTorch model to ExecuTorch format and delegate the model computation to eIQ Neutron NPU using the eIQ Neutron Backend.
In this guide we will show how to use the ExecuTorch AoT flow to convert a PyTorch model to ExecuTorch format and delegate the model computation to eIQ Neutron NPU using the eIQ Neutron Backend.
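
For readers of the guide, a minimal sketch of the generic ExecuTorch AoT flow being described; the Neutron-specific partitioner wiring is left as a placeholder, since the exact invocation lives in the PR's `aot_neutron_example.py`:

```python
import torch
from executorch.exir import to_edge


class TinyModel(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.relu(x)


model = TinyModel().eval()
example_inputs = (torch.randn(1, 3, 32, 32),)

# 1. Export the PyTorch model and convert it to an Edge program.
exported = torch.export.export(model, example_inputs)
edge = to_edge(exported)

# 2. Delegate supported operators to the eIQ Neutron NPU (placeholder --
#    see the example script for the actual NeutronPartitioner invocation).
# edge = edge.to_backend(NeutronPartitioner(...))

# 3. Serialize to an ExecuTorch .pte file.
exec_prog = edge.to_executorch()
with open("model.pte", "wb") as f:
    f.write(exec_prog.buffer)
```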

--delegate --neutron_converter_flavor SDK_25_03 -m cifar10
```

3. It will generate you `cifar10_nxp_delegate.pte` file which can be used with the MXUXpresso SDK `cifarnet_example` project.
Contributor

Add a link for the SDK example?

Suggested change
3. It will generate you `cifar10_nxp_delegate.pte` file which can be used with the MXUXpresso SDK `cifarnet_example` project.
3. It will generate you `cifar10_nxp_delegate.pte` file which can be used with the MCUXpresso SDK `cifarnet_example` project.

Contributor

@digantdesai digantdesai left a comment

Thanks. Running trunk tests.

@robert-kalmar
Collaborator Author

Will apply your comments and then rebase. The unit test failures track back to this commit: https://hud.pytorch.org/pytorch/executorch/commit/3419b46912f5c7f675879669e96f64ba11ba4129

It was resolved/mitigated by https://hud.pytorch.org/pytorch/executorch/commit/154065958093e1fcf61c1d29b4a403bff6dc7f47.

@robert-kalmar robert-kalmar force-pushed the upstream/release-mcux-25.03-full/aot-example branch from c690d05 to be74e00 Compare July 15, 2025 07:34
@robert-kalmar
Collaborator Author

Comment applied. CI passing. The pull / test-eval_llama-mmlu-linux / linux-job (pull_request) job failed with an infrastructure problem - it could not find the hails/mmlu_no_train dataset on HuggingFace, and it is not present in the cache. It is unrelated to this PR.
CC @digantdesai , @JakeStevens

@JakeStevens JakeStevens merged commit 00491fd into pytorch:main Jul 15, 2025
205 of 207 checks passed
@robert-kalmar robert-kalmar deleted the upstream/release-mcux-25.03-full/aot-example branch July 15, 2025 13:44
lucylq pushed a commit that referenced this pull request Jul 17, 2025
### Summary
This PR adds an AoT example with the eIQ Neutron Backend. The backend is
demonstrated on a tiny CNN model named CifarNet, trained on the CIFAR-10
dataset, which is included in the PR.

### Test plan
Manual testing: executing the example following the steps in the README.md
and validating the PTE on the i.MX RT700 platform with the Neutron Backend
runtime.

Resolves #10898 

cc @digantdesai @JakeStevens @skywall @jirioc

Co-authored-by: Martin Pavella <[email protected]>

Labels

ciflow/trunk CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. module: nxp Issues related to NXP Neutron NPU delegation and code under backends/nxp/ release notes: nxp Changes to the NXP Neutron backend delegate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

NeutronBackend: Add Neutron ATen Passes to Neutron aot example

5 participants