
Conversation


@leafs1 leafs1 commented Jul 3, 2025

Summary

This PR adds support for the `to_dim_order_copy` operation in the XNNPACK delegate partitioner, enabling direct handling of memory-format conversions initiated by users via `.to(memory_format=)` calls. This significantly improves performance by producing more compact graphs that avoid unnecessary partition boundaries at memory-format conversion points. By delegating these operations directly to XNNPACK, we eliminate the overhead of context switching between the runtime and the delegate, reducing both execution time and memory footprint. The implementation leverages XNNPACK's highly optimized memory-format conversion routines, which are designed for efficient tensor layout transformations on various hardware targets.
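
For illustration (this snippet is not from the PR; the module, shapes, and lowering calls are a sketch of the standard ExecuTorch export flow), this is the kind of user code whose conversion can now stay inside the delegated partition:

```python
import torch
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge_transform_and_lower

class ConvertAndConv(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 8, kernel_size=3)

    def forward(self, x):
        # A user-initiated memory-format conversion: previously this could
        # end the XNNPACK partition; with this PR it can be lowered as a
        # to_dim_order_copy inside the delegate.
        x = x.to(memory_format=torch.channels_last)
        return self.conv(x)

ep = torch.export.export(ConvertAndConv(), (torch.randn(1, 3, 16, 16),))
edge = to_edge_transform_and_lower(ep, partitioner=[XnnpackPartitioner()])
```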

Test plan

Confirmed expected output for user-specified dim-order conversions, as well as correct partitioning. I did this by writing individual tests for the `to_copy` op, ensuring it changes dim order and dtype when appropriate. Also added a test module to confirm that the `to_copy` nodes are partitioned with the surrounding graph rather than left in a separate partition.
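
A minimal sketch of the property those tests assert (illustrative only; not the actual test code from this PR):

```python
import torch

# The op under test should change dim order and dtype as requested.
x = torch.randn(1, 3, 8, 8)  # contiguous (NCHW) dim order
y = x.to(memory_format=torch.channels_last, dtype=torch.float16)

assert y.is_contiguous(memory_format=torch.channels_last)  # dim order changed
assert y.dtype == torch.float16                            # dtype changed
```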

@leafs1 requested review from digantdesai and mcr229 as code owners July 3, 2025 23:09

pytorch-bot bot commented Jul 3, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/12220

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures

As of commit 365de21 with merge base a8070ec:


This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Jul 3, 2025

leafs1 commented Jul 3, 2025

@pytorchbot label "release notes: none"

@pytorch-bot added the release notes: none label Jul 3, 2025
    # The node requires nchw inputs
    for input_node in node.all_input_nodes:
        self.input_to_nchw(graph_module, input_node, node)
elif node.target == exir_ops.edge.aten._to_copy.default:

So the reason we still have `to_copy` even after we partition `to_dim_order_copy` is that we revert it back to `to_copy`. When we add a node visitor for `to_dim_order_copy` next, we should remove the revert pass.
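
To make the distinction concrete, a sketch (the `dim_order_ops` target path is my assumption about the namespace, not code from this PR):

```python
from executorch.exir.dialects._ops import ops as exir_ops

def is_to_copy(node) -> bool:
    # Today: a revert pass turns _to_dim_order_copy back into aten._to_copy
    # before delegation, so passes match on the aten op.
    return node.target == exir_ops.edge.aten._to_copy.default

def is_to_dim_order_copy(node) -> bool:
    # Once a node visitor serializes the dim-order op directly, the revert
    # pass can be dropped and checks would target the dim_order_ops variant.
    return node.target == exir_ops.edge.dim_order_ops._to_dim_order_copy.default
```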


I see. Should I make those changes in a follow-up PR, or would it be better to keep them here?

@@ -0,0 +1,85 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.

As I said earlier, using `to_copy` is OK, but we can just as easily move to `to_dim_order_copy` and remove the dim-order ops revert pass.

@leafs1 force-pushed the milestone2.1 branch 3 times, most recently from bbc194c to c26c56b on July 11, 2025 21:26
        return True

    def supported_precision_types(self) -> List[ConfigPrecisionType]:
        return [ConfigPrecisionType.FP32]

Add `ConfigPrecisionType.STATIC_QUANT`.
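
That is, roughly (a sketch of the suggested edit; the import path for `ConfigPrecisionType` is assumed):

```python
from typing import List

from executorch.backends.xnnpack.partition.config.xnnpack_config import (
    ConfigPrecisionType,
)

def supported_precision_types(self) -> List[ConfigPrecisionType]:
    # Advertise static-quant support in addition to fp32 so statically
    # quantized graphs can also match this config.
    return [
        ConfigPrecisionType.FP32,
        ConfigPrecisionType.STATIC_QUANT,
    ]
```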


mcr229 commented Jul 14, 2025

Let's wait for CI, but looking good!

@leafs1 merged commit dd6caa3 into pytorch:main Jul 15, 2025
95 of 98 checks passed
SS-JIA added a commit that referenced this pull request Jul 16, 2025
…12220)"

This reverts commit dd6caa3.

ghstack-source-id: f7e75ad
ghstack-comment-id: 3079037022
Pull Request resolved: #12542
SS-JIA added a commit that referenced this pull request Jul 16, 2025
…12220)" (#12542)

This reverts commit dd6caa3.

The imported diff is breaking an internal test:
[D78368033](https://www.internalfb.com/diff/D78368033). Please see the
diff for more details.
lucylq pushed a commit that referenced this pull request Jul 17, 2025
lucylq pushed a commit that referenced this pull request Jul 17, 2025
…12220)" (#12542)

This reverts commit dd6caa3.

The imported diff is breaking an internal test:
[D78368033](https://www.internalfb.com/diff/D78368033). Please see the
diff for more details.