# Choosing ops_to_preserve by delegating to quantizer #15047
base: main
Summary:

# Context

The `trace` function in `compiler.py` returns the static graph representation of the model. Because certain hardware is highly optimized for certain operations, we don't want to decompose those operations in our graph. Thus, the function currently passes `ops_to_keep` to `trace_fn`:

https://www.internalfb.com/code/fbsource/[ada71b37b46a898065851182cd48b48b321a5d12]/fbcode/executorch/backends/cadence/aot/compiler.py?lines=59-86

which then removes these operations before decomposing the graph:

https://www.internalfb.com/code/fbsource/[ada71b37b46a898065851182cd48b48b321a5d12]/fbcode/executorch/backends/cadence/aot/compiler_funcs.py?lines=33-36

# Issue

Right now, `ops_to_keep` is hardcoded, and there is no easy way to customize which ops to keep. See [this comment thread](https://www.internalfb.com/diff/D81703253?dst_version_fbid=1326590512174768&transaction_fbid=728299956948640) for more details. The tl;dr is that there should be a stricter separation of concerns between the compiler and the hardware: the compiler shouldn't need to know which ops to keep or not.

# Possible Solutions

| Solution | Pros | Cons |
| -- | -- | -- |
| Delegate `ops_to_keep` to `QuantizationPattern` | Clear separation of concerns, semantically correct, works well with the current composition structure of the classes, easily extensible | Changes existing behavior: e.g., `CadenceFusedConvReluQuantizer` doesn't have all patterns, but the current behavior is that the compiler keeps all default ops regardless. Also, `QuantizationPattern` isn't owned by us, so we might have to create a wrapper, which may not be worth the long-term maintenance. |
| Delegate `ops_to_keep` to `CadenceQuantizer` (**chosen solution**) | Keeps existing behavior; simple for users (just add additional ops to keep) | Doesn't work well with more complex compositions (none exist today); making inheritance additive is somewhat complicated; not sure if it makes sense semantically; harder to customize |

# This Diff

For now, we delegate `ops_to_keep` to `CadenceQuantizer` (the second solution). We:

1. Create a method `get_ops_to_preserve_from_decomposition` in `CadenceQuantizer` and fill it with the default ops currently set for everything; create a method `_collect_additional_ops` that walks the inheritance chain to gather all additional ops to keep; and create the class attribute `ADDITIONAL_OPS_TO_PRESERVE` for subclasses to override.
2. Port the current logic into the correct quantizer.
3. Update the `trace` function in `compiler.py` to use our method rather than hardcoding the list.
4. Add unit tests to validate the changes.

Now, if someone has ops they would like to keep for a quantizer, they just need to update that quantizer's `ADDITIONAL_OPS_TO_PRESERVE`.

Differential Revision: D84461403
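The additive-inheritance design described above can be sketched as follows. This is a minimal illustration, not the actual diff: the method and attribute names (`get_ops_to_preserve_from_decomposition`, `_collect_additional_ops`, `ADDITIONAL_OPS_TO_PRESERVE`) come from the PR description, but the class bodies, the default op list, and the `CadenceFusedConvReluQuantizer` override shown here are hypothetical placeholders.

```python
class CadenceQuantizer:
    """Sketch of a base quantizer that owns the ops-to-preserve policy."""

    # Subclasses override this to request extra ops; the base declares it empty.
    ADDITIONAL_OPS_TO_PRESERVE: tuple = ()

    # Default ops preserved for every quantizer (placeholder op names, not
    # the real defaults from the diff).
    _DEFAULT_OPS_TO_PRESERVE: tuple = ("aten::linear", "aten::conv2d")

    @classmethod
    def _collect_additional_ops(cls) -> set:
        # Walk the MRO so inheritance is additive: each class in the chain
        # contributes its own ADDITIONAL_OPS_TO_PRESERVE rather than the
        # most-derived override shadowing the others.
        ops: set = set()
        for klass in cls.__mro__:
            ops.update(getattr(klass, "ADDITIONAL_OPS_TO_PRESERVE", ()))
        return ops

    @classmethod
    def get_ops_to_preserve_from_decomposition(cls) -> set:
        # Defaults plus everything accumulated along the inheritance chain.
        return set(cls._DEFAULT_OPS_TO_PRESERVE) | cls._collect_additional_ops()


class CadenceFusedConvReluQuantizer(CadenceQuantizer):
    # Hypothetical: this quantizer additionally keeps relu out of decomposition.
    ADDITIONAL_OPS_TO_PRESERVE = ("aten::relu",)


# The compiler's trace() would then ask the quantizer instead of hardcoding:
ops_to_keep = CadenceFusedConvReluQuantizer.get_ops_to_preserve_from_decomposition()
```

With this shape, a new subclass only sets `ADDITIONAL_OPS_TO_PRESERVE`; it never needs to re-list the defaults or call `super()`, which is the "simple for users" property the chosen solution trades for.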