# Choosing ops_to_preserve by delegating to quantizer #15047
base: main
Summary:

# Context

The `trace` function in `compiler.py` returns the static graph representation of the model. Because certain hardware is highly optimized for certain operations, we don't want to decompose those operations in our graph. Thus, the function currently passes `ops_to_keep` to `trace_fn`:

https://www.internalfb.com/code/fbsource/[ada71b37b46a898065851182cd48b48b321a5d12]/fbcode/executorch/backends/cadence/aot/compiler.py?lines=59-86

which then removes these operations before decomposing the graph:

https://www.internalfb.com/code/fbsource/[ada71b37b46a898065851182cd48b48b321a5d12]/fbcode/executorch/backends/cadence/aot/compiler_funcs.py?lines=33-36

# Issue

Right now, `ops_to_keep` is hardcoded, and there is no easy way to customize which ops to keep. See [this comment thread](https://www.internalfb.com/diff/D81703253?dst_version_fbid=1326590512174768&transaction_fbid=728299956948640) for more details. The tl;dr is that there should be a stricter separation of concerns between the compiler and the hardware: the compiler shouldn't need to know which ops to keep or not.

# Possible Solutions

| Solution | Pros | Cons |
| -- | -- | -- |
| Delegate `ops_to_keep` to `QuantizationPattern` | Clear separation of concerns, semantically correct, works well with the current composition structure of the classes, easily extensible | Changes existing behavior: e.g., `CadenceFusedConvReluQuantizer` doesn't have all patterns, but the current behavior is that the compiler keeps all default ops regardless. Also, `QuantizationPattern` isn't owned by us, so we might have to create a wrapper, which may not be worth the long-term maintenance. |
| Delegate `ops_to_keep` to `CadenceQuantizer` (**chosen solution**) | Keeps existing behavior; simple for users (just add additional ops to keep) | Doesn't work well with more complex compositions (none exist today); making inheritance additive is somewhat complicated; not sure if it makes sense semantically; harder to customize |

# This Diff

For now, we delegate `ops_to_keep` to `CadenceQuantizer` (the second solution). We:

1. Create a method `get_ops_to_preserve_from_decomposition` in `CadenceQuantizer` and fill it with the default ops currently set for everything; create a method `_collect_additional_ops` that walks the inheritance chain to gather all additional ops to keep; and create the class attribute `ADDITIONAL_OPS_TO_PRESERVE` for subclasses to override.
2. Port the current logic into the correct quantizer.
3. Update the `trace` function in `compiler.py` to use our method rather than hardcoding the list.
4. Add unit tests to validate the changes.

Now, if someone has ops they would like to keep for a quantizer, they just need to update that quantizer's `ADDITIONAL_OPS_TO_PRESERVE`.

Differential Revision: D84461403
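The additive-inheritance design described above can be sketched as follows. This is a minimal illustration, not the actual diff: the method and attribute names (`get_ops_to_preserve_from_decomposition`, `_collect_additional_ops`, `ADDITIONAL_OPS_TO_PRESERVE`) come from the PR description, but the class bodies, the default op list, and the `CadenceFusedConvReluQuantizer` override shown here are hypothetical placeholders.

```python
class CadenceQuantizer:
    """Sketch of a base quantizer that owns the ops-to-preserve policy."""

    # Subclasses override this to request extra ops; the base declares it empty.
    ADDITIONAL_OPS_TO_PRESERVE: tuple = ()

    # Default ops preserved for every quantizer (placeholder op names, not
    # the real defaults from the diff).
    _DEFAULT_OPS_TO_PRESERVE: tuple = ("aten::linear", "aten::conv2d")

    @classmethod
    def _collect_additional_ops(cls) -> set:
        # Walk the MRO so inheritance is additive: each class in the chain
        # contributes its own ADDITIONAL_OPS_TO_PRESERVE rather than the
        # most-derived override shadowing the others.
        ops: set = set()
        for klass in cls.__mro__:
            ops.update(getattr(klass, "ADDITIONAL_OPS_TO_PRESERVE", ()))
        return ops

    @classmethod
    def get_ops_to_preserve_from_decomposition(cls) -> set:
        # Defaults plus everything accumulated along the inheritance chain.
        return set(cls._DEFAULT_OPS_TO_PRESERVE) | cls._collect_additional_ops()


class CadenceFusedConvReluQuantizer(CadenceQuantizer):
    # Hypothetical: this quantizer additionally keeps relu out of decomposition.
    ADDITIONAL_OPS_TO_PRESERVE = ("aten::relu",)


# The compiler's trace() would then ask the quantizer instead of hardcoding:
ops_to_keep = CadenceFusedConvReluQuantizer.get_ops_to_preserve_from_decomposition()
```

With this shape, a new subclass only sets `ADDITIONAL_OPS_TO_PRESERVE`; it never needs to re-list the defaults or call `super()`, which is the "simple for users" property the chosen solution trades for.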