Milestone 2.1: Partition to_dim_order_copy op in XNN delegate #12220
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/12220
Note: Links to docs will display an error until the docs builds have been completed.
❌ 3 New Failures as of commit 365de21 with merge base a8070ec.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "release notes: none"
```python
    # The node requires nchw inputs
    for input_node in node.all_input_nodes:
        self.input_to_nchw(graph_module, input_node, node)
elif node.target == exir_ops.edge.aten._to_copy.default:
```
So the reason we still have `to_copy` even after we partition `to_dim_order_copy` is that we revert it back to `to_copy`. When we add a node visitor for `to_dim_order_copy` next, we should remove the revert pass.
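A minimal sketch of the revert mechanism described here, written as a plain `torch.fx` graph walk. The function name is hypothetical and based only on this discussion; the real revert pass lives in ExecuTorch's pass infrastructure.

```python
import torch
from executorch.exir.dialects._ops import ops as exir_ops


def revert_dim_order_copy(graph_module: torch.fx.GraphModule) -> torch.fx.GraphModule:
    """Sketch: swap every to_dim_order_copy node back to aten._to_copy.

    This is why the visitor above still only needs to match _to_copy.
    """
    for node in graph_module.graph.nodes:
        if (
            node.op == "call_function"
            and node.target == exir_ops.edge.dim_order_ops._to_dim_order_copy.default
        ):
            node.target = exir_ops.edge.aten._to_copy.default
            # Drop the dim_order kwarg, which _to_copy does not accept.
            node.kwargs = {k: v for k, v in node.kwargs.items() if k != "dim_order"}
    graph_module.recompile()
    return graph_module
```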
I see. Should I make those changes in a follow-up PR, or would it be better to keep them here?
```diff
@@ -0,0 +1,85 @@
+# Copyright (c) Meta Platforms, Inc. and affiliates.
```
As I said earlier, using `to_copy` is OK, but we can just as easily move to `to_dim_order_copy` and remove the dim-order ops revert pass.
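If that suggestion is adopted, the partitioner could match the dim-order op directly. A hypothetical helper sketch; the op identifier follows ExecuTorch's edge dialect, everything else is illustrative:

```python
import torch
from executorch.exir.dialects._ops import ops as exir_ops


def is_dim_order_copy(node: torch.fx.Node) -> bool:
    # Match the edge-dialect dim-order copy directly, so the revert
    # pass (and the _to_copy handling above) become unnecessary.
    return (
        node.op == "call_function"
        and node.target == exir_ops.edge.dim_order_ops._to_dim_order_copy.default
    )
```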
Force-pushed bbc194c to c26c56b
```python
        return True

    def supported_precision_types(self) -> List[ConfigPrecisionType]:
        return [ConfigPrecisionType.FP32]
```
Add `ConfigPrecisionType.STATIC_QUANT`.
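A minimal sketch of the suggested change, extending `supported_precision_types` from the diff above (the import path is an assumption about where `ConfigPrecisionType` lives):

```python
from typing import List

from executorch.backends.xnnpack.partition.config.xnnpack_config import (
    ConfigPrecisionType,
)


def supported_precision_types(self) -> List[ConfigPrecisionType]:
    # Per the review suggestion: advertise static-quant support in
    # addition to FP32 so quantized graphs can also delegate the copy.
    return [ConfigPrecisionType.FP32, ConfigPrecisionType.STATIC_QUANT]
```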
Let's wait for CI, but looking good!
…12220)" (#12542) This reverts commit dd6caa3. The imported diff is breaking an internal test: [D78368033](https://www.internalfb.com/diff/D78368033). Please see the diff for more details.
### Summary
This PR adds support for the `to_dim_order_copy` operation in the XNNPACK delegate partitioner, enabling direct handling of memory-format conversions that users initiate via `.to(memory_format=)` calls. Delegating these conversions produces more compact graphs that avoid unnecessary partition boundaries at memory-format conversion points, eliminating the overhead of switching between the runtime and the delegate and reducing both execution time and memory footprint. The implementation leverages XNNPACK's optimized memory-format conversion routines, which are designed for efficient tensor layout transformations across hardware targets.

### Test plan
Confirmed expected output for user-specified dim-order conversions, as well as appropriate partitioning. I did this by writing individual tests for the `to_copy` op, ensuring it changes dim order and dtype when appropriate. Also added a test module to confirm that the `to_copy` nodes are delegated rather than left in a separate partition.
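For context, a minimal example (module name hypothetical) of the kind of user code whose memory-format conversion this PR lets the partitioner delegate:

```python
import torch


class ConvertToChannelsLast(torch.nn.Module):
    """A user-initiated memory-format conversion.

    In the edge dialect this lowers to a dim-order copy, which the
    XNNPACK partitioner can now claim instead of splitting the graph
    around it.
    """

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x.to(memory_format=torch.channels_last)


module = ConvertToChannelsLast()
out = module(torch.randn(1, 3, 8, 8))
assert out.is_contiguous(memory_format=torch.channels_last)
```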