Milestone2.2: Optimize transposes in XNNPACK partition by removing redundant to_copy ops #11316
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/11316
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 3 Unrelated Failures
As of commit 4df2fd6 with merge base 9591978:
NEW FAILURE - The following job has failed:
BROKEN TRUNK - The following jobs failed but were present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
98f4027 to
173e41f
Compare
@pytorchbot label "release notes: none"
def input_dim_order(
    self, input_node: torch.fx.Node, input_order: InputDimOrder
) -> bool:
    if input_node.name == "x":
Can you replace this with checking if the input_node is a placeholder?
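A minimal sketch of the suggested check, assuming only that graph inputs carry `op == "placeholder"` as they do in `torch.fx`. `FakeNode` and `is_graph_input` are hypothetical stand-ins so the snippet runs without PyTorch; they are not part of the PR.

```python
from dataclasses import dataclass


@dataclass
class FakeNode:
    """Hypothetical stand-in for torch.fx.Node with only the fields used here."""

    name: str
    op: str


def is_graph_input(node) -> bool:
    # torch.fx tags graph inputs with op == "placeholder", so this check
    # does not depend on what the user happened to name the input.
    return node.op == "placeholder"


# A name-based check only matches an input called "x";
# the op-based check matches every graph input.
nodes = [
    FakeNode("x", "placeholder"),
    FakeNode("image", "placeholder"),
    FakeNode("conv", "call_function"),
]
graph_inputs = [n.name for n in nodes if is_graph_input(n)]
```

With a real `torch.fx.Node` the same comparison applies to `input_node.op`.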
from executorch.exir.passes.memory_format_ops_pass import DimOrderOpsRevertPass


class TestChannelsLastTaggedReshapePass(unittest.TestCase):
Can we add a test that includes implicitly created dim order conversions? This will check that both user-created and pass-created converts get optimized out correctly. I expect it will work, but it would be nice to cover it since this is a common use case.
Maybe something like:
to_channels_last
upsample_nearest2d (not partitioned)
to_channels_first
conv
# If we encounter a to_copy node, check if it is preceded by an opposite to_copy node
if node.target == exir_ops.edge.aten._to_copy.default:
    if prev and ChannelsLastTaggedReshapePass.is_nchw_node(
I think there may be cases where the previous node in iteration order is not actually the first arg, especially in more complex graphs. Can you try replacing prev with node.args[0]? That should be sound in all cases.
from executorch.exir.pass_base import PassResult


class RemoveRedundantOpsPass(XNNPACKPass):
Nit: rename this to RemoveRedundantCopyPass or something? The current name is too generic to infer what it's doing.
2b4643d to
85e1c4d
Compare
    continue

# If we encounter a to_copy node, check if its input is also a to_copy node with opposite format
if node.target == exir_ops.edge.aten._to_copy.default:
I thought of one more edge case while reading this code. We should probably check to make sure that the second copy is the only user of the first. It's possible to have two copies in a row, but something else could use the output of the first. It's unlikely, but would lead to an invalid graph in this case.
Added a check, thanks for this find
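The two review points above (look at the copy's actual input rather than the previous node in iteration order, and require that the second copy is the sole user of the first) can be sketched on a toy graph. This is an illustrative sketch only, not the PR's implementation: `Node`, `make`, `remove_cancelling_copies`, and the `to_nhwc`/`to_nchw` target names are hypothetical, and a real pass would operate on `torch.fx` nodes.

```python
from dataclasses import dataclass, field

# Hypothetical target names for the two opposite layout conversions.
OPPOSITE = {"to_nhwc": "to_nchw", "to_nchw": "to_nhwc"}


@dataclass(eq=False)  # eq=False: nodes compare by identity, like graph nodes
class Node:
    name: str
    target: str
    args: list = field(default_factory=list)
    users: list = field(default_factory=list)


def make(name, target, *inputs):
    """Create a node and register it as a user of each of its inputs."""
    node = Node(name, target, list(inputs))
    for inp in inputs:
        inp.users.append(node)
    return node


def remove_cancelling_copies(nodes):
    """Drop copy pairs that cancel; `nodes` must be in topological order."""
    removed = set()
    for node in nodes:
        if node.target not in OPPOSITE or not node.args:
            continue
        producer = node.args[0]  # the real input, not the previous list entry
        if producer.target != OPPOSITE[node.target]:
            continue
        if producer.users != [node]:
            continue  # something else reads the first copy, so keep both
        source = producer.args[0]
        # Rewire every consumer of the second copy to the original value.
        for user in node.users:
            user.args = [source if a is node else a for a in user.args]
            source.users.append(user)
        source.users.remove(producer)
        removed.update({id(producer), id(node)})
    return [n for n in nodes if id(n) not in removed]


x = make("x", "placeholder")
a = make("a", "to_nhwc", x)
b = make("b", "to_nchw", a)
conv = make("conv", "conv", b)
optimized = remove_cancelling_copies([x, a, b, conv])
```

Skipping the pair when the first copy has other users is what keeps the graph valid: removing it would leave those users pointing at a deleted node.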
mcr229
left a comment
Just a few extra test cases to make sure things look ok
    module.eval(),
    inputs,
)
tester.export().to_edge_transform_and_lower().to_executorch().serialize().run_method_and_compare_outputs()
for complicated paths, can we also try quantized models?
2f21614 to
b4d34d8
Compare
Summary
Optimize transposes in the XNNPACK partition by adding a new remove_redundant_ops_pass that checks for dim order conversion ops that cancel each other. The pass supports both non-quantized and quantized graphs. In the quantized case, the conversion nodes and their wrapping q/dq nodes are removed. I also refactored the channels_last_tagged_reshape_pass code by modularizing some functions and adding setter/getter functions.
This change will improve speed/memory at runtime by not executing redundant to_copy ops that would be there otherwise.
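As a rough illustration of why two opposite conversions cancel, each one can be viewed as an axis permutation of the physical layout. The sketch below assumes 4D tensors and the usual NCHW/NHWC permutations; the helper names `permute` and `compose` are hypothetical and not taken from the PR.

```python
NCHW_TO_NHWC = (0, 2, 3, 1)  # move channels to the last axis
NHWC_TO_NCHW = (0, 3, 1, 2)  # move channels back to axis 1


def permute(shape, perm):
    """Reorder a shape tuple the way a transpose with `perm` would."""
    return tuple(shape[i] for i in perm)


def compose(first, second):
    """Single permutation equivalent to applying `first`, then `second`."""
    return tuple(first[i] for i in second)


nchw = (1, 3, 224, 224)
nhwc = permute(nchw, NCHW_TO_NHWC)
round_trip = permute(nhwc, NHWC_TO_NCHW)
combined = compose(NCHW_TO_NHWC, NHWC_TO_NCHW)
```

Because `combined` is the identity permutation, the back-to-back pair does no useful work at runtime, which is the case the pass removes.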
Test plan
Created a TestChannelsLastTaggedReshapePass class which constructs graphs with multiple redundant to_copy ops in different positions and in quantized/non-quantized graphs. These redundant ops are either explicitly stated or generated via other passes. I asserted their removal after the passes finished.