[EXIR] Update RemoveCloneOpsTransform to be dim order aware #12976
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/12976
Note: Links to docs will display an error until the docs builds have been completed.
⏳ 51 Pending, 1 Unrelated Failure as of commit 15ff154 with merge base 29cec35 (BROKEN TRUNK: the following job failed but was already failing on the merge base). 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "release notes: none"
### Summary
This is PR 1 of 3 implementing a dim order aware clone op.

Currently, clone ops are removed during export as no-ops, causing memory layout (dim order) changes to be lost. This can cause backend failures, incorrect outputs when ops expect specific layouts, and performance degradation.

This set of PRs introduces a dim order aware clone op, `_clone_dim_order`, which preserves memory layout changes by explicitly storing dim order information. This is implemented by replacing standard clone ops with this variant during export and updating the clone removal transform to preserve clones that change layout.

This PR adds the portable CPU kernel for the `_clone_dim_order` op, implementing a clone variant that preserves dim order at runtime. The portable kernel validates dtype and layout compatibility, resizes the output tensor if needed, and performs an element-wise clone of the tensors.

Note: A future PR will add the ATen kernel for `_clone_dim_order`.

Related PRs:
- PR 2: [#12971](#12971) - Register `_clone_dim_order` op and map `aten.clone`
- PR 3: [#12976](#12976) - Update RemoveCloneOpsTransform to be dim order aware

Fixes #12645

### Test plan
Added kernel runtime tests to verify:
- Tensors of all real dtypes are cloned correctly.
- Failure when input and output tensor shapes mismatch.
- Failure with unsupported memory formats.
- Failure when `non_blocking=true`, since the portable kernel only supports blocking data transfer.
- Dynamic shape outputs are cloned with correct values.
- Layout conversions are cloned correctly for `contiguous` to `channels_last`, `channels_last` to `contiguous`, and `channels_last` is preserved.

All runtime tests pass via: `build-ninja/kernels/test/portable_kernels_test`

Co-authored-by: Gasoonjia <[email protected]>
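For context, here is a minimal standalone sketch (not from the PR) of the kind of layout-changing clone this series is about; dropping such a clone as a no-op would silently lose the channels_last layout:

```python
import torch

class LayoutChangingClone(torch.nn.Module):
    def forward(self, x):
        # This clone is not a no-op: it rewrites the tensor into channels_last layout.
        return x.clone(memory_format=torch.channels_last)

m = LayoutChangingClone()
x = torch.randn(1, 3, 8, 8)  # default contiguous (NCHW) layout
y = m(x)
print(y.is_contiguous(memory_format=torch.channels_last))  # True: the clone changed the layout
```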
    if node.op != "call_function":
        continue

    # Identify clone_dim_order ops with unchanged memory layout.
If we are supporting aten.clone elimination through this pass, then we should similarly check the memory_format arg.
Great point! I added the check for aten.clone and updated the tests. I'll refactor/simplify the test cases if needed once we land the AOT PR since it includes its own tests.
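A rough sketch of what such a check might look like (a hypothetical helper, not the code merged in this PR); a complete version would also compare the requested format against the input's current layout:

```python
import torch

def _aten_clone_changes_layout(node: torch.fx.Node) -> bool:
    # Hypothetical helper: treat an aten.clone as removable only when it does not
    # request an explicit memory_format (or requests torch.preserve_format).
    memory_format = node.kwargs.get("memory_format", torch.preserve_format)
    return memory_format not in (None, torch.preserve_format)
```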
### Summary
This is PR 2 of 3 implementing a dim order aware clone op.

This PR registers the new `_clone_dim_order` op and maps `aten.clone` to `dim_order_ops._clone_dim_order` in EXIR during export to preserve memory layout changes (contiguous/channels_last). It also updates the Core ML, ARM, and Qualcomm backends to handle the new clone op.

Related PRs:
- PR 1: [#12974](#12974) - Add `_clone_dim_order` portable kernel
- PR 3: [#12976](#12976) - Update RemoveCloneOpsTransform to be dim order aware

Fixes #12645

### Test plan
- Operator level tests to verify kernel behavior for layout preservation and changes.
- Graph level checks to confirm that clone mapping occurs.
- End to end tests to validate that functional clone behavior is unchanged.
- Backend tests to ensure clone semantics are preserved.

All tests pass via:
`python -m unittest exir.tests.test_memory_format_ops_pass`
`python -m unittest backends.apple.coreml.test.test_torch_ops`
`pytest backends/arm/test/ops/test_clone.py`
`pytest backends/arm/test/passes/test_remove_clone_pass.py`

Co-authored-by: Gasoonjia <[email protected]>
Co-authored-by: Digant Desai <[email protected]>
    to_be_remove = n
    # Skip removal of clone ops that modify layout/dim order.
    if self.aten_clone_is_non_identity(
The UFMT formatter forces this split style.
@Gasoonjia this PR is ready to go, would you mind reviewing when you get a chance?
Thanks @keyprocedure for your help!
Some subtle feedback, but overall it looks great!
    dead_code_elimination_pass(graph_module)
    return PassResult(graph_module, True)

    def aten_clone_is_non_identity(self, node: torch.fx.Node) -> bool:
Let's combine the two check functions, aten_clone_is_non_identity and _clone_dim_order_is_non_identity, into a single function (maybe called _is_non_identity_clone). In the current scenario we always use these functions together, and this function should be private.
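A minimal sketch of what the combined private helper could look like, assuming the aten clone carries a `memory_format` kwarg and the dim order variant carries a `dim_order` kwarg (names and structure are illustrative, not the merged code):

```python
import torch

def _is_non_identity_clone(node: torch.fx.Node) -> bool:
    # Keep clones that explicitly request a layout change; everything else is a no-op.
    if node.op != "call_function":
        return False
    memory_format = node.kwargs.get("memory_format", torch.preserve_format)
    if memory_format not in (None, torch.preserve_format):
        return True  # aten.clone with an explicit layout request
    # Conservatively treat any explicit dim_order on _clone_dim_order as non-identity.
    return node.kwargs.get("dim_order") is not None
```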
    self.assertTrue(is_contiguous_dim_order(actual))
    self.assertTrue(is_contiguous_dim_order(expected))

    def test_op_clone_replacement_channels_last_survives(self):
Let's move the test to test_remove_clone_ops.py.
LGTM! Thanks for your wonderful work!
Only a name update suggestion, but I think we can stamp it as long as CI passes!
        transformed_gm.code
    )

    def test_clone_channels_last_survives(self):
Maybe we should call it test_clone_non_identity_survives? It survives because it mutates memory_format / dim_order, rather than because it clones a channels_last tensor.
Oh right, that makes more sense
No problem!
Let's change it in this PR if you have time!
Done :)
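For illustration, the renamed test could exercise a module along these lines (a sketch, not the exact test in the PR), where the clone survives because it mutates the memory format rather than because its input is channels_last:

```python
import torch

class CloneNonIdentity(torch.nn.Module):
    def forward(self, x):
        # This clone changes dim order, so the removal pass should keep it.
        return x.clone(memory_format=torch.channels_last) + 1
```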
@keyprocedure we might have to add the tests here: https://github.com/pytorch/executorch/blob/main/pytest.ini
@mergennachin good catch, I've added the test file to the pytest config.
Hi @Gasoonjia, the CI failures are unrelated to this PR:
Feel free to let me know if you'd like anything in this PR changed, otherwise this should be good to go.
Thanks for your help @keyprocedure! I think the nxp issue should be solved already. Let me rebase and retest.
Hi @keyprocedure, CI looks good but there's a conflict in pytest.ini. Mind taking a look?
@Gasoonjia, the merge conflict is resolved:
#14088 has been landed, but this CI run raised an issue. It looks like it preexisted in our codebase. Will rebase and retrigger CI.
I appreciate your reviews across all the PRs @Gasoonjia, and for handling CI and merges.
@keyprocedure, thank you for such an active contribution! Looking forward to working with you in the future!
Summary
This is PR 3 of 3 implementing a dim order aware clone op.
This PR updates the clone removal pass to retain layout-changing `aten.clone` and `_clone_dim_order` ops and remove no-op clones, ensuring layout/dim order is preserved through export.

Related PRs:
- PR 1: #12974 - Add `_clone_dim_order` portable kernel
- PR 2: #12971 - Register `_clone_dim_order` op and map `aten.clone`

Fixes #12645
Test plan
Added tests to verify:
All tests pass via:
python -m unittest exir.tests.test_memory_format_ops_pass
python -m unittest backends.transforms.test.test_remove_clone_ops
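As a rough illustration of the behavior described above (pure PyTorch, independent of the ExecuTorch pass), the exported graph below contains both an identity clone and a layout-changing clone; per this PR, the removal transform would drop only the former:

```python
import torch

class Clones(torch.nn.Module):
    def forward(self, x):
        a = x.clone()                                   # identity clone: removable no-op
        b = x.clone(memory_format=torch.channels_last)  # layout-changing clone: must survive
        return a + b

ep = torch.export.export(Clones(), (torch.randn(1, 3, 4, 4),))
print(ep.graph_module.code)  # both clones appear here, before any removal pass runs
```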