Arm Backend: Add support for ELU.default operator #12996
Conversation
Helpful Links: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/12996
Note: Links to docs will display an error until the docs builds have been completed. This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "partner: arm"
@pytorchbot label "release notes: arm"
@pytorchbot label "ciflow/trunk"
To add these label(s) (ciflow/trunk) to the PR, please first approve the workflows that are awaiting approval (scroll to the bottom of this page). This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.
Sebastian-Larsson
left a comment
The CI failure for Cortex-M is unrelated to this patch. Approved.
It has been set to 2 as the outputs seem to stay the same regardless of what the value of input_scale is, as long as that value is not 1.
I don't understand this.
The default value of input_scale is 1.0; however, using the default value resulted in a type error. When 1 was passed as an int, it was overridden by the default value (since both values are 1 and therefore equivalent). So input_scale had to be changed to an int other than 1, as in the sketch below.
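For context, a minimal sketch of how input_scale reaches the operator under test; the tensor and its values here are illustrative assumptions, not taken from the patch:

```python
import torch

x = torch.randn(4)

# aten::elu(Tensor self, Scalar alpha=1, Scalar scale=1, Scalar input_scale=1)
# Passing input_scale as the int 2 (rather than the default 1) sidesteps the
# override described above.
y = torch.ops.aten.elu.default(x, 1.0, 1.0, 2)
```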
digantdesai
left a comment
Thanks!
This will unfortunately need a rebase after the TOSA 0.80.1 removal :(
Differential Revision: D80067087 Pull Request resolved: pytorch#13320
Differential Revision: D79828275 Pull Request resolved: pytorch#13202
Add tests for the LSTM module. This is done in the context of pytorch#12898.
Add tests for avgpooling operators. This is done in the context of pytorch#12898.
Add tests for maxpooling operators. This is done in the context of pytorch#12898.
Add tests for adaptive avgpooling operators. This is done in the context of pytorch#12898.
Add tests for adaptive maxpooling operators. This is done in the context of pytorch#12898.
Add a tester class implementation for Qualcomm and register the test flow with the backend tester. Note that QNN pybindings are planned but not yet functional.
Add some initial CSV report generation, detailing results and parameters for each individual test. Delegation statistics and such will come next. I've also added a basic test for the report generation, which I will expand upon in this stack. Here's some sample output from running add tests for XNNPACK:

```
Test ID,Test Case,Backend,Flow,Result,Dtype
test_add_dtype_float32_xnnpack,test_add_dtype,xnnpack,xnnpack,Success (Delegated),torch.float32
test_add_dtype_float32_xnnpack_static_int8,test_add_dtype,xnnpack,xnnpack_static_int8,Success (Delegated),torch.float32
test_add_f32_alpha_xnnpack,test_add_f32_alpha,xnnpack,xnnpack,Fail (Quantize),
test_add_f32_alpha_xnnpack_static_int8,test_add_f32_alpha,xnnpack,xnnpack_static_int8,Fail (Quantize),
test_add_f32_bcast_first_xnnpack,test_add_f32_bcast_first,xnnpack,xnnpack,Success (Delegated),
test_add_f32_bcast_first_xnnpack_static_int8,test_add_f32_bcast_first,xnnpack,xnnpack_static_int8,Success (Delegated),
test_add_f32_bcast_second_xnnpack,test_add_f32_bcast_second,xnnpack,xnnpack,Success (Delegated),
test_add_f32_bcast_second_xnnpack_static_int8,test_add_f32_bcast_second,xnnpack,xnnpack_static_int8,Success (Delegated),
test_add_f32_bcast_unary_xnnpack,test_add_f32_bcast_unary,xnnpack,xnnpack,Success (Delegated),
test_add_f32_bcast_unary_xnnpack_static_int8,test_add_f32_bcast_unary,xnnpack,xnnpack_static_int8,Success (Delegated),
```
### Summary
Overhaul the "Building from Source" doc page. The primary intent of these changes is to document CMake presets and the various build options that we expose. However, I also did a pass on the existing contents of the file to improve formatting and clarity. I've re-organized the page to clearly delineate environment setup, Python install, and native build. It should flow better and be easier to read.

### Test plan
I have built the docs locally to inspect the contents for formatting and correctness.
Preview page: https://docs-preview.pytorch.org/pytorch/executorch/13210/using-executorch-building-from-source.html
Live page (for comparison): https://docs.pytorch.org/executorch/0.7/using-executorch-building-from-source.html
Differential Revision: D79268134 Pull Request resolved: pytorch#13004
Report various error statistics for the test outputs, including SQNR, mean absolute error (MAE), and L2 norm. These are saved in the detail report per test case. As an example, here is the output from Core ML running MobileNet V2 (roughly formatted from csv -> sheets -> markdown):

Output 0 Error Max | Output 0 Error MAE | Output 0 Error MSD | Output 0 Error L2 | Output 0 SQNR
-- | -- | -- | -- | --
0.0005887411535 | 0.0001199183663 | 2.32E-06 | 0.004750485188 | 41.28595734
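For reference, a minimal sketch of how such error statistics can be computed between a reference output and a backend output; the function name is illustrative and the formulas are the standard definitions, not the test framework's exact code:

```python
import torch

def error_stats(ref: torch.Tensor, out: torch.Tensor) -> dict:
    """Standard error metrics between a reference and a backend output."""
    err = out - ref
    return {
        "max_abs_error": err.abs().max().item(),
        "mae": err.abs().mean().item(),  # mean absolute error
        "l2": err.norm(p=2).item(),      # L2 norm of the error
        # SQNR in dB: reference (signal) power over error (noise) power
        "sqnr_db": (10 * torch.log10(ref.pow(2).sum() / err.pow(2).sum())).item(),
    }
```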
Track and report the time taken to quantize and lower in the backend test flow. Include this information in the generated report for each test case. Example output (from testing the add operator):

Test ID | Test Case | Backend | Flow | Result | Quantize Time (s) | Lowering Time (s)
-- | -- | -- | -- | -- | -- | --
test_add_dtype_float32_coreml | test_add_dtype | coreml | coreml | Success (Delegated) | | 0.69
test_add_dtype_float32_coreml_static_int8 | test_add_dtype | coreml | coreml_static_int8 | Success (Delegated) | 8.73 | 0.88
### Summary
Turning on the `EXECUTORCH_ENABLE_EVENT_TRACER` option will enable event tracing in the Wasm module API. The results can be obtained with the `etdump()` method.

### Test plan
Added two tests depending on whether `EXECUTORCH_ENABLE_EVENT_TRACER` is turned on or not. Added the `--enable-etdump` option to `scripts/build_wasm_tests.sh`, which turns on the above option. Added configurations to the `unittest-wasm-bindings` CI test to run with and without `--enable-etdump`.
Report the total number of delegated and undelegated nodes and a breakdown by operator count. Example from CoreML add:

Test ID | Test Case | Backend | Delegated Nodes | Undelegated Nodes | Delegated Ops | Undelegated Ops
-- | -- | -- | -- | -- | -- | --
test_add_dtype_float32_coreml | test_add_dtype | coreml | 1 | 0 | {'aten::add.Tensor': 1} | {}
test_add_dtype_float32_coreml_static_int8 | test_add_dtype | coreml | 7 | 0 | {'aten::add.Tensor': 1, 'quantized_decomposed::dequantize_per_tensor': 3, 'quantized_decomposed::quantize_per_tensor': 3} | {}
Job is failing on trunk. Temporarily disabling while I resolve it.
…#13574)
### Summary
This PR replaces an IR optimization that removes dead code from the model with an equivalent executorch call.

### Test plan
Unit test provided in `backends/nxp/tests/test_removing_dead_code.py`.
cc @digantdesai @JakeStevens @robert-kalmar
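The exact executorch call is in the patch; as an assumption about the general technique (not the specific executorch API), torch.fx ships an equivalent built-in dead-code-elimination pass, sketched here:

```python
import torch
from torch.fx import symbolic_trace

class Model(torch.nn.Module):
    def forward(self, x):
        unused = x * 2  # dead code: the result is never used
        return x + 1

gm = symbolic_trace(Model())
gm.graph.eliminate_dead_code()  # drops the unused multiplication node
gm.recompile()
```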
Differential Revision: D80772740 Pull Request resolved: pytorch#13592
Differential Revision: D80957382 Pull Request resolved: pytorch#13659
…ng to the released memory Differential Revision: D80754181 Pull Request resolved: pytorch#13590
Differential Revision: D80906791 Pull Request resolved: pytorch#13632
Differential Revision: D80914321 Pull Request resolved: pytorch#13633
Differential Revision: D80881025 Pull Request resolved: pytorch#13623
…ch#13630) Building the example application using CMake is now straightforward enough to not need any helper scripts. Additionally, simplify the example + path setup in the executor runner CMake script and make the default path match the example.
cc @digantdesai @freddan80 @per @zingo @oscarandersson8218
Signed-off-by: Adrian Lundell <[email protected]>
### Summary
Add the MobileNetV2 model as an example and for integration testing.

### Test plan
Support for testing full conversion on this model is included in `run_aot_example.sh`.
Co-authored-by: Lukas Sztefek <[email protected]>
Signed-off-by: Agrima Khare <[email protected]> Change-Id: I032414e7454d5e2cada05b788e9eed0f7b2dc97c
Decomposes ELU into other operators for the MI case, and into a lookup table for the BI case; see the sketch below.
cc @digantdesai @freddan80 @per @zingo @oscarandersson8218
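For reference, a minimal sketch of the decomposition, assuming the standard definition ELU(x) = x for x > 0 and alpha * (exp(x) - 1) otherwise; the exact operator sequence emitted by the pass lives in the patch, so this is illustrative:

```python
import torch

def elu_decomposed(x: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    # ELU(x) = x                     for x > 0
    #        = alpha * (exp(x) - 1)  otherwise
    return torch.where(x > 0, x, alpha * (torch.exp(x) - 1.0))

x = torch.randn(8)
assert torch.allclose(elu_decomposed(x), torch.nn.functional.elu(x), atol=1e-6)
```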