Commit bbe9c2b
Don't constant fold Quantize/DequantizeLinear nodes by default (#2713)
I added support for exporting `QuantizeLinear`/`DequantizeLinear` nodes (from the `fake_quantize_per_*_affine` torch operators) in a previous PR. Unfortunately, the current default onnxscript optimizer settings tend to automatically remove any weight quantization, because the `Weight -> QDQ -> ...` pattern looks like it can simply be constant folded into a precomputed `QDQ(Weight) -> ...`.

I believe this behavior is not desirable: the presence of `QDQ` nodes in the graph is what allows inference engines to run the supported computations using quantized data types, so the purpose of `QDQ` nodes is to hold the relevant quantization "metadata". As such, they normally shouldn't be constant folded. I have extended the existing logic in `FoldConstantsPass` that was used to exclude `ConstantOfShape` from constant folding.

I haven't found any tests verifying this behavior for `ConstantOfShape`, and I'm not sure how to set up such a unit test, so I have left this code untested for now. If adding tests is mandatory, please give me a hint on where I should add such a test and what would be the best way to check/assert that the optimized graph matches the expectations (hopefully without reinventing the wheel or manually introspecting the `ir.Model` object).
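For reference, a rough, untested sketch of the kind of unit test discussed above: build a tiny model in which a constant (initializer) weight feeds a `DequantizeLinear` node, run the optimizer, and assert that the node survives. The entry point `onnxscript.optimizer.optimize`, the shapes, and the opset version are my assumptions, not something specified in this PR; adjust if your onnxscript version operates on `ir.Model` rather than `onnx.ModelProto`.

```python
import onnx
import onnx.helper as oh

import onnxscript.optimizer


def _make_qdq_weight_model() -> onnx.ModelProto:
    # Constant (initializer) weight followed by DequantizeLinear -- the
    # Weight -> QDQ pattern that should no longer be folded away.
    weight = oh.make_tensor("w_q", onnx.TensorProto.INT8, [4], [1, 2, 3, 4])
    scale = oh.make_tensor("w_scale", onnx.TensorProto.FLOAT, [], [0.1])
    zero_point = oh.make_tensor("w_zp", onnx.TensorProto.INT8, [], [0])
    dequantize = oh.make_node("DequantizeLinear", ["w_q", "w_scale", "w_zp"], ["w_dq"])
    add = oh.make_node("Add", ["x", "w_dq"], ["y"])
    graph = oh.make_graph(
        [dequantize, add],
        "qdq_weight",
        [oh.make_tensor_value_info("x", onnx.TensorProto.FLOAT, [4])],
        [oh.make_tensor_value_info("y", onnx.TensorProto.FLOAT, [4])],
        initializer=[weight, scale, zero_point],
    )
    return oh.make_model(graph, opset_imports=[oh.make_opsetid("", 21)])


def test_dequantize_linear_is_not_folded():
    # With the change in this commit, the DequantizeLinear over the constant
    # weight should be preserved rather than folded into a plain constant.
    optimized = onnxscript.optimizer.optimize(_make_qdq_weight_model())
    op_types = [node.op_type for node in optimized.graph.node]
    assert "DequantizeLinear" in op_types
```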
1 parent 3e7d9fb commit bbe9c2b

File tree

1 file changed: +18 −7 lines

onnxscript/optimizer/_constant_folding.py

Lines changed: 18 additions & 7 deletions
@@ -26,6 +26,14 @@
 import onnxscript.utils.utils as utils
 from onnxscript.ir import _tape

+DEFAULT_CONSTANT_FOLD_BLACKLIST = [
+    # ConstantOfShape is preserved to avoid increasing model size unnecessarily
+    "ConstantOfShape",
+    # Quantize/DequantizeLinear are preserved to keep the quantization info
+    "QuantizeLinear",
+    "DequantizeLinear",
+]
+
 DEFAULT_CONSTANT_FOLD_INPUT_SIZE_LIMIT = 8192

 DEFAULT_CONSTANT_FOLD_OUTPUT_SIZE_LIMIT = 512 * 512
@@ -1226,14 +1234,17 @@ def process_node(self, node: ir.Node, is_function: bool) -> Replacement | None:

         elif should_fold is None:
             # Use default rules to decide whether to fold the node:
-            # - ConstantOfShape is preserved to avoid increasing model size unnecessarily
+            # - Nodes in the DEFAULT_CONSTANT_FOLD_BLACKLIST list are not folded
             # - If the any tensor input size exceeds the input_size_limit, skip folding the node
-            if _is_onnx_op(node, "ConstantOfShape"):
-                logger.info(
-                    "Skipping constant folding for node %r because ConstantOfShape is preserved by default",
-                    node.name,
-                )
-                return None
+            for op_type in DEFAULT_CONSTANT_FOLD_BLACKLIST:
+                if _is_onnx_op(node, op_type):
+                    logger.info(
+                        "Skipping constant folding for node %r because "
+                        "%s is preserved by default",
+                        node.name,
+                        op_type,
+                    )
+                    return None

         input_tensors = [x.const_value if x is not None else None for x in node.inputs]
         large_inputs = [
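Not part of this diff, but since `DEFAULT_CONSTANT_FOLD_BLACKLIST` is introduced as a plain module-level list in the private `onnxscript.optimizer._constant_folding` module, a caller could in principle extend it before running the pass to preserve additional op types. This is a purely hypothetical sketch; whether the list is meant to be public API is a separate question.

```python
from onnxscript.optimizer import _constant_folding

# Hypothetical: also preserve DynamicQuantizeLinear nodes so that quantization
# of a constant input is not baked into the graph. The loop in process_node
# reads the module-level list at call time, so appending here takes effect for
# subsequent optimizer runs.
_constant_folding.DEFAULT_CONSTANT_FOLD_BLACKLIST.append("DynamicQuantizeLinear")
```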
