Arm Backend: Add support for ELU.default operator #12996
Conversation
Helpful Links: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/12996
Note: Links to docs will display an error until the docs builds have been completed. This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "partner: arm"
@pytorchbot label "release notes: arm"
@pytorchbot label "ciflow/trunk"
To add these label(s) (ciflow/trunk) to the PR, please first approve the workflows that are awaiting approval (scroll to the bottom of this page). This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.
Sebastian-Larsson
left a comment
The CI failure for Cortex-M is unrelated to this patch. Approved.
It has been set to 2 as the outputs seem to stay the same regardless of what the value of input_scale is, as long as that value is not 1.
I don't understand this.
The default value of input_scale is 1.0; however, using the default value resulted in a type error. When 1 was passed as an int, it was overridden by the default value (since both values are 1 and therefore equivalent). So input_scale had to be changed to an int other than 1, as in the sketch below.
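For context, a minimal sketch of how input_scale reaches the operator under test; the tensor and its values here are illustrative assumptions, not taken from the patch:

```python
import torch

x = torch.randn(4)

# aten::elu(Tensor self, Scalar alpha=1, Scalar scale=1, Scalar input_scale=1)
# Passing input_scale as the int 2 (rather than the default 1) sidesteps the
# override described above.
y = torch.ops.aten.elu.default(x, 1.0, 1.0, 2)
```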
digantdesai
left a comment
Thanks!
This will unfortunately need a rebase after the TOSA 0.80.1 removal :(
Differential Revision: D80067087 Pull Request resolved: pytorch#13320
Differential Revision: D79828275 Pull Request resolved: pytorch#13202
Add tests for the LSTM module. This is done in the context of pytorch#12898.
Add tests for avgpooling operators. This is done in the context of pytorch#12898.
Add tests for maxpooling operators. This is done in the context of pytorch#12898.
Add tests for adaptive avgpooling operators. This is done in the context of pytorch#12898.
Add tests for adaptive maxpooling operators. This is done in the context of pytorch#12898.
Add a tester class implementation for Qualcomm and register the test flow with the backend tester. Note that QNN pybindings are planned but not yet functional.
Add some initial CSV report generation, detailing results and parameters for each individual test. Delegation statistics and such will come next. I've also added a basic test for the report generation, which I will expand upon in this stack. Here's some sample output from running add tests for XNNPACK:

```
Test ID,Test Case,Backend,Flow,Result,Dtype
test_add_dtype_float32_xnnpack,test_add_dtype,xnnpack,xnnpack,Success (Delegated),torch.float32
test_add_dtype_float32_xnnpack_static_int8,test_add_dtype,xnnpack,xnnpack_static_int8,Success (Delegated),torch.float32
test_add_f32_alpha_xnnpack,test_add_f32_alpha,xnnpack,xnnpack,Fail (Quantize),
test_add_f32_alpha_xnnpack_static_int8,test_add_f32_alpha,xnnpack,xnnpack_static_int8,Fail (Quantize),
test_add_f32_bcast_first_xnnpack,test_add_f32_bcast_first,xnnpack,xnnpack,Success (Delegated),
test_add_f32_bcast_first_xnnpack_static_int8,test_add_f32_bcast_first,xnnpack,xnnpack_static_int8,Success (Delegated),
test_add_f32_bcast_second_xnnpack,test_add_f32_bcast_second,xnnpack,xnnpack,Success (Delegated),
test_add_f32_bcast_second_xnnpack_static_int8,test_add_f32_bcast_second,xnnpack,xnnpack_static_int8,Success (Delegated),
test_add_f32_bcast_unary_xnnpack,test_add_f32_bcast_unary,xnnpack,xnnpack,Success (Delegated),
test_add_f32_bcast_unary_xnnpack_static_int8,test_add_f32_bcast_unary,xnnpack,xnnpack_static_int8,Success (Delegated),
```
### Summary
Overhaul the "Building from Source" doc page. The primary intent of these changes is to document CMake presets and the various build options that we expose. However, I also did a pass on the existing contents of the file to improve formatting and clarity. I've re-organized the page to clearly delineate environment setup, Python install, and native build. It should flow better and be easier to read.

### Test plan
I have built the docs locally to inspect the contents for formatting and correctness.
Preview page: https://docs-preview.pytorch.org/pytorch/executorch/13210/using-executorch-building-from-source.html
Live page (for comparison): https://docs.pytorch.org/executorch/0.7/using-executorch-building-from-source.html
Differential Revision: D79268134 Pull Request resolved: pytorch#13004
Report various error statistics for the test outputs, including SQNR, mean absolute error (MAE), and L2 norm. These are saved in the detail report per test case. As an example, here is the output from Core ML running MobileNet V2 (roughly formatted from csv -> sheets -> markdown):

Output 0 Error Max | Output 0 Error MAE | Output 0 Error MSD | Output 0 Error L2 | Output 0 SQNR
-- | -- | -- | -- | --
0.0005887411535 | 0.0001199183663 | 2.32E-06 | 0.004750485188 | 41.28595734
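For reference, a minimal sketch of how such error statistics can be computed between a reference output and a backend output; the function name is illustrative and the formulas are the standard definitions, not the test framework's exact code:

```python
import torch

def error_stats(ref: torch.Tensor, out: torch.Tensor) -> dict:
    """Standard error metrics between a reference and a backend output."""
    err = out - ref
    return {
        "max_abs_error": err.abs().max().item(),
        "mae": err.abs().mean().item(),  # mean absolute error
        "l2": err.norm(p=2).item(),      # L2 norm of the error
        # SQNR in dB: reference (signal) power over error (noise) power
        "sqnr_db": (10 * torch.log10(ref.pow(2).sum() / err.pow(2).sum())).item(),
    }
```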
Track and report the time taken to quantize and lower in the backend test flow. Include this information in the generated report for each test case. Example output (from testing the add operator):

Test ID | Test Case | Backend | Flow | Result | Quantize Time (s) | Lowering Time (s)
-- | -- | -- | -- | -- | -- | --
test_add_dtype_float32_coreml | test_add_dtype | coreml | coreml | Success (Delegated) | | 0.69
test_add_dtype_float32_coreml_static_int8 | test_add_dtype | coreml | coreml_static_int8 | Success (Delegated) | 8.73 | 0.88
### Summary
Turning on the `EXECUTORCH_ENABLE_EVENT_TRACER` option will enable event tracing in the Wasm module API. The results can be obtained with the `etdump()` method.

### Test plan
Added two tests depending on whether `EXECUTORCH_ENABLE_EVENT_TRACER` is turned on or not. Added the `--enable-etdump` option to `scripts/build_wasm_tests.sh`, which turns on the above option. Added configurations to the `unittest-wasm-bindings` CI test to run with and without `--enable-etdump`.
Report the total number of delegated and undelegated nodes and a breakdown by operator count. Example from CoreML add:

Test ID | Test Case | Backend | Delegated Nodes | Undelegated Nodes | Delegated Ops | Undelegated Ops
-- | -- | -- | -- | -- | -- | --
test_add_dtype_float32_coreml | test_add_dtype | coreml | 1 | 0 | {'aten::add.Tensor': 1} | {}
test_add_dtype_float32_coreml_static_int8 | test_add_dtype | coreml | 7 | 0 | {'aten::add.Tensor': 1, 'quantized_decomposed::dequantize_per_tensor': 3, 'quantized_decomposed::quantize_per_tensor': 3} | {}
Job is failing on trunk. Temporarily disabling while I resolve it.
…#13574)
### Summary
This PR replaces an IR optimization that removes dead code from the model with an equivalent executorch call.

### Test plan
Unit test provided in `backends/nxp/tests/test_removing_dead_code.py`.
cc @digantdesai @JakeStevens @robert-kalmar
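The exact executorch call is in the patch; as an assumption about the general technique (not the specific executorch API), torch.fx ships an equivalent built-in dead-code-elimination pass, sketched here:

```python
import torch
from torch.fx import symbolic_trace

class Model(torch.nn.Module):
    def forward(self, x):
        unused = x * 2  # dead code: the result is never used
        return x + 1

gm = symbolic_trace(Model())
gm.graph.eliminate_dead_code()  # drops the unused multiplication node
gm.recompile()
```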
Differential Revision: D80772740 Pull Request resolved: pytorch#13592
Differential Revision: D80957382 Pull Request resolved: pytorch#13659
…ng to the released memory Differential Revision: D80754181 Pull Request resolved: pytorch#13590
Differential Revision: D80906791 Pull Request resolved: pytorch#13632
Differential Revision: D80914321 Pull Request resolved: pytorch#13633
Differential Revision: D80881025 Pull Request resolved: pytorch#13623
…ch#13630) Building the example application using CMake is now straightforward enough to not need any helper scripts. Additionally, simplify the example + path setup in the executor runner CMake script and make the default path match the example.
cc @digantdesai @freddan80 @per @zingo @oscarandersson8218
Signed-off-by: Adrian Lundell <[email protected]>
### Summary
Add the MobileNetV2 model as an example and for integration testing.

### Test plan
Support for testing full conversion on this model is included in `run_aot_example.sh`.
Co-authored-by: Lukas Sztefek <[email protected]>
Signed-off-by: Agrima Khare <[email protected]> Change-Id: I032414e7454d5e2cada05b788e9eed0f7b2dc97c
Decomposes ELU into other operators for the MI case, and into a lookup table for the BI case; see the sketch below.
cc @digantdesai @freddan80 @per @zingo @oscarandersson8218
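For reference, a minimal sketch of the decomposition, assuming the standard definition ELU(x) = x for x > 0 and alpha * (exp(x) - 1) otherwise; the exact operator sequence emitted by the pass lives in the patch, so this is illustrative:

```python
import torch

def elu_decomposed(x: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    # ELU(x) = x                     for x > 0
    #        = alpha * (exp(x) - 1)  otherwise
    return torch.where(x > 0, x, alpha * (torch.exp(x) - 1.0))

x = torch.randn(8)
assert torch.allclose(elu_decomposed(x), torch.nn.functional.elu(x), atol=1e-6)
```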