Commit bbe9c2b
Don't constant fold Quantize/DequantizeLinear nodes by default (#2713)
I added support for exporting `QuantizeLinear`/`DequantizeLinear` nodes (from the `fake_quantize_per_*_affine` torch operators) in a previous PR. Unfortunately, the current default onnxscript optimizer settings tend to automatically remove any weight quantization, because the `Weight -> QDQ -> ...` pattern looks like it can simply be constant folded into a precomputed `QDQ(Weight) -> ...`.

I believe this behavior is not desirable: the presence of `QDQ` nodes in the graph is what allows inference engines to run the supported computations using quantized data types, so the purpose of `QDQ` nodes is to hold the relevant quantization "metadata". As such, they normally shouldn't be constant folded. I have extended the existing logic in `FoldConstantsPass` that was used to exclude `ConstantOfShape` from constant folding.

I haven't found any tests verifying this behavior for `ConstantOfShape`, and I'm not sure how to set up such a unit test, so I have left this code untested for now. If adding tests is mandatory, please give me a hint on where I should add such a test and what would be the best way to check/assert that the optimized graph matches the expectations (hopefully without reinventing the wheel or manually introspecting the `ir.Model` object).
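For reference, a rough, untested sketch of the kind of unit test discussed above: build a tiny model in which a constant (initializer) weight feeds a `DequantizeLinear` node, run the optimizer, and assert that the node survives. The entry point `onnxscript.optimizer.optimize`, the shapes, and the opset version are my assumptions, not something specified in this PR; adjust if your onnxscript version operates on `ir.Model` rather than `onnx.ModelProto`.

```python
import onnx
import onnx.helper as oh

import onnxscript.optimizer


def _make_qdq_weight_model() -> onnx.ModelProto:
    # Constant (initializer) weight followed by DequantizeLinear -- the
    # Weight -> QDQ pattern that should no longer be folded away.
    weight = oh.make_tensor("w_q", onnx.TensorProto.INT8, [4], [1, 2, 3, 4])
    scale = oh.make_tensor("w_scale", onnx.TensorProto.FLOAT, [], [0.1])
    zero_point = oh.make_tensor("w_zp", onnx.TensorProto.INT8, [], [0])
    dequantize = oh.make_node("DequantizeLinear", ["w_q", "w_scale", "w_zp"], ["w_dq"])
    add = oh.make_node("Add", ["x", "w_dq"], ["y"])
    graph = oh.make_graph(
        [dequantize, add],
        "qdq_weight",
        [oh.make_tensor_value_info("x", onnx.TensorProto.FLOAT, [4])],
        [oh.make_tensor_value_info("y", onnx.TensorProto.FLOAT, [4])],
        initializer=[weight, scale, zero_point],
    )
    return oh.make_model(graph, opset_imports=[oh.make_opsetid("", 21)])


def test_dequantize_linear_is_not_folded():
    # With the change in this commit, the DequantizeLinear over the constant
    # weight should be preserved rather than folded into a plain constant.
    optimized = onnxscript.optimizer.optimize(_make_qdq_weight_model())
    op_types = [node.op_type for node in optimized.graph.node]
    assert "DequantizeLinear" in op_types
```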
1 parent 3e7d9fb commit bbe9c2b

File tree

1 file changed: +18 −7 lines

onnxscript/optimizer/_constant_folding.py

Lines changed: 18 additions & 7 deletions
@@ -26,6 +26,14 @@
 import onnxscript.utils.utils as utils
 from onnxscript.ir import _tape

+DEFAULT_CONSTANT_FOLD_BLACKLIST = [
+    # ConstantOfShape is preserved to avoid increasing model size unnecessarily
+    "ConstantOfShape",
+    # Quantize/DequantizeLinear are preserved to keep the quantization info
+    "QuantizeLinear",
+    "DequantizeLinear",
+]
+
 DEFAULT_CONSTANT_FOLD_INPUT_SIZE_LIMIT = 8192

 DEFAULT_CONSTANT_FOLD_OUTPUT_SIZE_LIMIT = 512 * 512
@@ -1226,14 +1234,17 @@ def process_node(self, node: ir.Node, is_function: bool) -> Replacement | None:

         elif should_fold is None:
             # Use default rules to decide whether to fold the node:
-            # - ConstantOfShape is preserved to avoid increasing model size unnecessarily
+            # - Nodes in the DEFAULT_CONSTANT_FOLD_BLACKLIST list are not folded
             # - If the any tensor input size exceeds the input_size_limit, skip folding the node
-            if _is_onnx_op(node, "ConstantOfShape"):
-                logger.info(
-                    "Skipping constant folding for node %r because ConstantOfShape is preserved by default",
-                    node.name,
-                )
-                return None
+            for op_type in DEFAULT_CONSTANT_FOLD_BLACKLIST:
+                if _is_onnx_op(node, op_type):
+                    logger.info(
+                        "Skipping constant folding for node %r because "
+                        "%s is preserved by default",
+                        node.name,
+                        op_type,
+                    )
+                    return None

         input_tensors = [x.const_value if x is not None else None for x in node.inputs]
         large_inputs = [
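Not part of this diff, but since `DEFAULT_CONSTANT_FOLD_BLACKLIST` is introduced as a plain module-level list in the private `onnxscript.optimizer._constant_folding` module, a caller could in principle extend it before running the pass to preserve additional op types. This is a purely hypothetical sketch; whether the list is meant to be public API is a separate question.

```python
from onnxscript.optimizer import _constant_folding

# Hypothetical: also preserve DynamicQuantizeLinear nodes so that quantization
# of a constant input is not baked into the graph. The loop in process_node
# reads the module-level list at call time, so appending here takes effect for
# subsequent optimizer runs.
_constant_folding.DEFAULT_CONSTANT_FOLD_BLACKLIST.append("DynamicQuantizeLinear")
```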
