[Autocast] Add low precision autocasting support for Resize op #436
Conversation
Walkthrough

Removes "Resize" from the global low-precision blacklist and introduces per-input skip mappings for Resize. Updates PrecisionConverter signatures and internal flows to track FP32 inputs for low-precision nodes, adjusts initializer and cast insertion logic, and adds unit tests covering Resize scenarios.
Sequence Diagram(s)

sequenceDiagram
autonumber
participant PC as PrecisionConverter
participant G as ONNX Graph
participant N as Node
participant R as Resize Node
participant I as Initializer
PC->>G: build node->initializers map (TensorProto lists)
PC->>G: call _get_tensors_to_cast()
G-->>PC: return (cast_to_fp16, cast_to_fp32, fp32_input_to_low_precision_node)
alt per-input check for low-precision nodes
PC->>R: for each input, call _should_skip_low_precision_input_conversion
alt skip input
PC->>PC: record input in fp32_input_to_low_precision_node
else
PC->>PC: mark input for low-precision cast
end
end
PC->>I: _convert_initializers (skip inputs flagged to remain FP32)
PC->>G: insert Cast nodes for cast_to_fp16/cast_to_fp32, exclude consumers per fp32_input mapping
PC->>G: return updated model
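A minimal Python sketch of the flow above (hedged: names follow the walkthrough and diagram, not necessarily the exact ModelOpt signatures):

```python
# Hedged sketch of the per-input cast decision described above; names mirror the
# walkthrough, not necessarily the real PrecisionConverter implementation.
def get_tensors_to_cast(graph, low_precision_nodes, should_skip_input):
    cast_to_fp16 = []                      # inputs safe to cast down for low-precision nodes
    cast_to_fp32 = []                      # tensors that must be (re)cast up to FP32
    fp32_input_to_low_precision_node = {}  # tensor name -> low-precision nodes needing it in FP32

    for node in graph.node:
        if node.name not in low_precision_nodes:
            continue
        for tensor_name in node.input:
            if not tensor_name:  # optional input left empty
                continue
            if should_skip_input(node, tensor_name):
                # e.g. Resize roi/scales: keep FP32 and remember which node needs it,
                # so cast insertion can exclude every other low-precision consumer
                fp32_input_to_low_precision_node.setdefault(tensor_name, []).append(node.name)
                cast_to_fp32.append(tensor_name)
            else:
                cast_to_fp16.append(tensor_name)
    return cast_to_fp16, cast_to_fp32, fp32_input_to_low_precision_node
```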
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Actionable comments posted: 3
🧹 Nitpick comments (1)
modelopt/onnx/autocast/precisionconverter.py (1)
1056-1076: Consider using AssertionError instead of KeyError.

At lines 1064-1066, the code raises KeyError if the initializer is not found in the node's inputs. Since this check validates an internal invariant (initializers in node_to_init_map[node.name] should always be in node.input), consider using AssertionError or ValueError instead, to better signal a programming error rather than a missing key. Example:

  if init.name not in node.input:
-     raise KeyError(
-         f"Initializer {init.name} not found in node {node.name} input, not expected!"
-     )
+     raise AssertionError(
+         f"Initializer {init.name} not found in node {node.name} input - internal invariant violated"
+     )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- modelopt/onnx/autocast/precisionconverter.py (3 hunks)
- tests/unit/onnx/autocast/test_precisionconverter.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/unit/onnx/autocast/test_precisionconverter.py (2)
modelopt/onnx/utils.py (1)
- check_model (557-569)
modelopt/onnx/autocast/precisionconverter.py (1)
- convert (113-205)
🔇 Additional comments (2)
modelopt/onnx/autocast/precisionconverter.py (1)
551-552: LGTM! The integration of the skip-check correctly prevents conversion of Resize initializers at specific indices before attempting the conversion.
tests/unit/onnx/autocast/test_precisionconverter.py (1)
1109-1159: LGTM! The fixture creates a well-structured model with a Resize operation and appropriate initializers to test the new conversion logic.
@pytest.mark.parametrize("keep_io_types", [True, False])
@pytest.mark.parametrize("low_precision_type", ["fp16", "bf16"])
def test_resize_op_initializer_conversion(
    create_model_with_resize_op, keep_io_types, low_precision_type
):
    model, value_info_map, initializer_map, node_to_init_map = create_model_with_resize_op

    converter = PrecisionConverter(
        model,
        value_info_map,
        initializer_map,
        node_to_init_map,
        keep_io_types=keep_io_types,
        low_precision_type=low_precision_type,
    )
    converted_model = converter.convert(
        high_precision_nodes=[], low_precision_nodes=[node.name for node in model.graph.node]
    )
    onnx.checker.check_model(converted_model)
🛠️ Refactor suggestion | 🟠 Major
Add assertions to verify initializer precision handling.
The test validates that the model is valid after conversion, but it doesn't verify the core functionality: that roi and scales initializers remain FP32 while add_const is converted to low precision. This leaves a gap in test coverage.
Consider adding assertions like:
converted_model = converter.convert(
high_precision_nodes=[], low_precision_nodes=[node.name for node in model.graph.node]
)
onnx.checker.check_model(converted_model)
+
+ # Verify roi and scales remain FP32 (indices 1, 2 of Resize)
+ roi_init = next(init for init in converted_model.graph.initializer if init.name == "roi")
+ scales_init = next(init for init in converted_model.graph.initializer if init.name == "scales")
+ assert roi_init.data_type == TensorProto.FLOAT
+ assert scales_init.data_type == TensorProto.FLOAT
+
+ # Verify add_const is converted to low precision
+ add_const_init = next(init for init in converted_model.graph.initializer if init.name == "add_const")
+ assert add_const_init.data_type == low_precision_onnx_type(low_precision_type)

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
@pytest.mark.parametrize("keep_io_types", [True, False])
@pytest.mark.parametrize("low_precision_type", ["fp16", "bf16"])
def test_resize_op_initializer_conversion(
    create_model_with_resize_op, keep_io_types, low_precision_type
):
    model, value_info_map, initializer_map, node_to_init_map = create_model_with_resize_op
    converter = PrecisionConverter(
        model,
        value_info_map,
        initializer_map,
        node_to_init_map,
        keep_io_types=keep_io_types,
        low_precision_type=low_precision_type,
    )
    converted_model = converter.convert(
        high_precision_nodes=[], low_precision_nodes=[node.name for node in model.graph.node]
    )
    onnx.checker.check_model(converted_model)

    # Verify roi and scales remain FP32 (indices 1, 2 of Resize)
    roi_init = next(init for init in converted_model.graph.initializer if init.name == "roi")
    scales_init = next(init for init in converted_model.graph.initializer if init.name == "scales")
    assert roi_init.data_type == TensorProto.FLOAT
    assert scales_init.data_type == TensorProto.FLOAT

    # Verify add_const is converted to low precision
    add_const_init = next(
        init for init in converted_model.graph.initializer if init.name == "add_const"
    )
    assert add_const_init.data_type == low_precision_onnx_type(low_precision_type)
for node in self.model.graph.node:
    if node.name in low_precision_nodes:
        for init in self.node_to_init_map[node.name]:
            if self._should_skip_low_precision_initializer_conversion(node, init):
@aboubezari This PR assumes that the scales input is an initializer, but it doesn't have to be.
We'll also need to add a check for dynamic input, maybe in _get_tensors_to_cast.
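To make the dynamic case concrete, here is a hedged illustration (node names and attributes are invented for the example, not code from the PR) of a Resize whose scales input is produced at runtime rather than stored as an initializer:

```python
from onnx import helper

# Hypothetical graph fragment: "scales_tensor" is computed by a Mul at runtime,
# so it never appears in graph.initializer and cannot be handled by initializer
# conversion. The converter must instead keep (or cast back) the tensor feeding
# Resize input index 2 to FP32.
scales_mul = helper.make_node(
    "Mul", ["scales_base", "scales_factor"], ["scales_tensor"], name="scales_mul"
)
resize = helper.make_node(
    "Resize",
    ["X", "", "scales_tensor"],  # inputs: data, roi (omitted), scales
    ["Y"],
    name="resize",
    mode="nearest",
)
```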
@galagam You're right. I've tweaked the logic in _get_tensors_to_cast to avoid adding those casts, and tracked some extra metadata to make sure the casts get applied properly to these special mixed-precision input nodes.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
modelopt/onnx/autocast/precisionconverter.py (1)
83-83: Update docstring to match type annotation.

The docstring states "lists of initializer names" but the type annotation shows list[onnx.TensorProto]. The implementation at line 569 confirms TensorProto objects are expected. Apply this diff:

- node_to_init_map: Map of node names to lists of initializer names.
+ node_to_init_map: Map of node names to lists of initializers.
♻️ Duplicate comments (2)
modelopt/onnx/autocast/precisionconverter.py (1)
49-49: LGTM: Resize correctly removed from unsupported ops. With the new per-input precision handling, Resize can now be supported in low precision while keeping its roi and scales inputs in FP32 as required.
Note: A past review comment already flagged that the comment at lines 395-398 needs updating to reflect this change.
tests/unit/onnx/autocast/test_precisionconverter.py (1)
1213-1231: Add assertions to verify precision handling. Both test functions validate model correctness but don't verify the core functionality: that roi/scales remain FP32 while other initializers/tensors are converted to low precision. A past review comment already identified this gap and provided specific assertion suggestions.
Also applies to: 1234-1254
🧹 Nitpick comments (1)
modelopt/onnx/autocast/precisionconverter.py (1)
1075-1097: LGTM: Resize-specific input handling is correctly implemented. The method properly identifies Resize inputs at indices 1 and 2 (roi and scales) that must remain in FP32. The conservative approach of keeping both in FP32 is safe, though the ONNX spec indicates roi could support low precision.
Consider clarifying the comment at line 1092-1093 to be more precise:
- # The second input does not support bfloat16, so leave it in FP32.
- # The third input of Resize must remain in FP32.
+ # The second input (roi) and third input (scales) of Resize should remain in FP32.
+ # While roi may support low precision per spec, scales must be FP32, so we keep both FP32 for safety.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- modelopt/onnx/autocast/precisionconverter.py (7 hunks)
- tests/unit/onnx/autocast/test_precisionconverter.py (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/unit/onnx/autocast/test_precisionconverter.py (2)
modelopt/onnx/autocast/precisionconverter.py (2)
- _get_tensors_to_cast (412-457)
- convert (113-214)
modelopt/onnx/utils.py (1)
- check_model (557-569)
🔇 Additional comments (7)
modelopt/onnx/autocast/precisionconverter.py (4)
150-167: LGTM: Per-input casting logic is well-implemented.

The logic correctly handles the case where low-precision nodes require specific FP32 inputs by adjusting the exclude_consumers list. The tensor is cast to FP32 for nodes that need it while remaining low-precision for others.
412-457: LGTM: Enhanced tensor casting logic with per-input precision control.

The method now correctly identifies which inputs to low-precision nodes must remain in FP32 (like Resize's roi and scales inputs) and tracks the associated nodes. The implementation properly separates casting decisions per input rather than per node.
570-571: LGTM: Initializer conversion correctly respects per-input requirements.

The guard properly skips converting initializers that must remain in FP32 (e.g., Resize's roi and scales).
71-71: Docstring update only: node_to_init_map holds lists of onnx.TensorProto. No external API break; all internal callers (including tests) already pass TensorProto lists. Update the constructor docstring to reflect the new type.

Likely an incorrect or invalid review comment.
tests/unit/onnx/autocast/test_precisionconverter.py (3)
180-180: LGTM: Test calls correctly updated for new API.

All calls to _get_tensors_to_cast properly unpack three values, discarding the new fp32_input_to_low_precision_node mapping with _ since these tests don't need it.

Also applies to: 189-189, 315-315, 321-321
1106-1153: LGTM: Well-structured test fixture for Resize with initializer inputs.

The fixture creates a realistic model with Resize in a computation pipeline, using initializer-based roi and scales. The empty roi array is a valid edge case per ONNX spec.
1155-1210: LGTM: Fixture addresses dynamic scales concern.

This fixture correctly tests the scenario raised by galagam where scales come from a tensor (computed at runtime) rather than an initializer. This ensures the implementation handles both static and dynamic Resize inputs.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- modelopt/onnx/autocast/precisionconverter.py (7 hunks)
🔇 Additional comments (5)
modelopt/onnx/autocast/precisionconverter.py (5)
49-56: LGTM! Clean removal of Resize from blacklist with proper skip mappings.

The changes correctly remove "Resize" from the unsupported ops list and introduce per-input skip mappings that preserve FP32 for specific Resize inputs (scales at index 1 for both FP16/BF16, and additionally index 2 for BF16). This is consistent with ONNX spec requirements and the PR objectives.
75-75: LGTM! Type signature now matches implementation.

The parameter type change from list[str] to list[onnx.TensorProto] accurately reflects the actual usage throughout the code, particularly in _get_node_initializers_map and _convert_initializers.
154-171: LGTM! Correct handling of selective FP32 inputs to low-precision nodes.

The updated logic properly manages the case where specific inputs to low-precision nodes must remain FP32 (e.g., Resize scales input). The exclude_consumers adjustment ensures that cast operations are correctly applied only where needed.
416-461: LGTM! _get_tensors_to_cast correctly implements per-input skip logic.

The method properly iterates through low-precision node inputs, checks which should remain FP32 via _should_skip_low_precision_input_conversion, and builds the appropriate cast lists. The triple return value and network IO handling are correctly implemented.
574-575: LGTM! Initializer conversion correctly respects skip logic.

The skip check properly prevents conversion of initializers that must remain FP32, maintaining consistency with the overall precision handling strategy.
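For readers skimming the thread, a hedged sketch of the kind of guard being approved here (parameter names are placeholders, not the verbatim ModelOpt code):

```python
def convert_initializers(model, low_precision_nodes, node_to_init_map,
                         should_skip_input, convert_init_to_low_precision):
    """Illustrative only: skip initializers that must stay FP32 for their node."""
    for node in model.graph.node:
        if node.name not in low_precision_nodes:
            continue
        for init in node_to_init_map[node.name]:
            # e.g. Resize roi/scales initializers are left in FP32
            if should_skip_input(node, init.name):
                continue
            convert_init_to_low_precision(init)
```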
def _should_skip_low_precision_input_conversion(
    self, node: onnx.NodeProto, input_name: str
) -> bool:
    """Check if the input should be skipped for low precision conversion.

    This is used for nodes that have inputs that MUST remain in FP32.
    """
    match self.low_precision_type:
        case "fp16":
            skip_inputs_map = SKIP_LOW_PRECISION_MAPPING_FP16
        case "bf16":
            skip_inputs_map = SKIP_LOW_PRECISION_MAPPING_BF16
        case _:
            raise ValueError(f"Unsupported low precision type: {self.low_precision_type}")

    if node.op_type in skip_inputs_map:
        # Figure out the index of the input in the node input
        inputs_lst = list(node.input)
        input_index = inputs_lst.index(input_name)
        # Check if we should skip this input for low precision conversion
        return input_index in skip_inputs_map[node.op_type]
    return False
Critical bug: Match statement compares PrecisionTypes object against string literals.
At line 1086, the match statement compares self.low_precision_type (a PrecisionTypes namedtuple set at line 106) against string cases "fp16" and "bf16". This will never match, causing the function to always raise ValueError at line 1092.
Apply this diff to fix the match statement:
-    match self.low_precision_type:
+    match self.low_precision_type.str_short:
         case "fp16":
             skip_inputs_map = SKIP_LOW_PRECISION_MAPPING_FP16
         case "bf16":
             skip_inputs_map = SKIP_LOW_PRECISION_MAPPING_BF16
         case _:
             raise ValueError(f"Unsupported low precision type: {self.low_precision_type}")

Additionally, at line 1097, inputs_lst.index(input_name) will raise ValueError if input_name is not in the node's inputs (e.g., an empty string for optional inputs). Consider adding a membership check:
  if node.op_type in skip_inputs_map:
      # Figure out the index of the input in the node input
      inputs_lst = list(node.input)
+     if input_name not in inputs_lst:
+         return False
      input_index = inputs_lst.index(input_name)
      # Check if we should skip this input for low precision conversion
      return input_index in skip_inputs_map[node.op_type]
  return False

🤖 Prompt for AI Agents
In modelopt/onnx/autocast/precisionconverter.py around lines 1079-1100, the
match on self.low_precision_type uses string literals ("fp16"/"bf16") but
self.low_precision_type is a PrecisionTypes object, causing the ValueError path
to always run; change the match/cases to compare against the PrecisionTypes
members (e.g., PrecisionTypes.fp16 and PrecisionTypes.bf16) or use the
enum's/value's string (e.g., self.low_precision_type.value == "fp16"), and
before calling inputs_lst.index(input_name) add a membership check (if
input_name in inputs_lst) and return False when not present to avoid raising
ValueError for optional/empty inputs.
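The root cause is easy to reproduce in isolation; a toy stand-in (not the real PrecisionTypes class) shows why matching the object against string literals never fires:

```python
from collections import namedtuple

# Toy stand-in for PrecisionTypes; the real definition in ModelOpt may differ.
PrecisionTypes = namedtuple("PrecisionTypes", ["onnx_type", "str_short"])
low_precision_type = PrecisionTypes(onnx_type=10, str_short="fp16")

match low_precision_type:
    case "fp16":
        print("matched")        # never reached: the namedtuple is not equal to "fp16"
    case _:
        print("fell through")   # always taken, so the original code always raised ValueError

match low_precision_type.str_short:
    case "fp16":
        print("matched on str_short")  # matching the string field works as intended
```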
@i-riyad @ajrasane @gcunhase please note that this could cause some accuracy regressions in the quantization workflow.

@i-riyad @ajrasane do you remember which model was having accuracy drops when converting Resize to FP16?
Actionable comments posted: 0
🧹 Nitpick comments (1)
modelopt/onnx/autocast/precisionconverter.py (1)
54-56: Add inline documentation for the skip mappings.

The constant mappings SKIP_LOW_PRECISION_MAPPING_FP16 and SKIP_LOW_PRECISION_MAPPING_BF16 would benefit from inline comments explaining which inputs each index represents for each operator. For example:

    # Mapping of op types to indices of inputs that should not be converted to low precision.
    # Resize inputs: 0=data, 1=roi/scales (must be FP32), 2=sizes (must be FP32 for bf16)
    SKIP_LOW_PRECISION_MAPPING_FP16 = {"Resize": {1}}
    SKIP_LOW_PRECISION_MAPPING_BF16 = {"Resize": {1, 2}}

This will make it easier to maintain and extend these mappings in the future.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- modelopt/onnx/autocast/precisionconverter.py (7 hunks)
🔇 Additional comments (6)
modelopt/onnx/autocast/precisionconverter.py (6)
75-75: LGTM: Type annotation correction.

The type correction from list[str] to list[onnx.TensorProto] accurately reflects the actual usage of this parameter throughout the codebase.
154-171: LGTM: Correct handling of mixed-precision inputs.

The logic correctly handles the case where a low-precision node requires specific inputs to remain in FP32. By selectively excluding consumers based on the fp32_input_to_low_precision_node mapping, the casting logic ensures that FP32 inputs reach the nodes that need them while still casting appropriately for other consumers.
416-461: LGTM: Per-input skip logic correctly implemented.

The updated _get_tensors_to_cast method correctly implements per-input precision handling. The logic properly categorizes inputs that must remain FP32 (via _should_skip_low_precision_input_conversion) while adding them to the appropriate cast lists and tracking the relationship in fp32_input_to_low_precision_node. Empty/optional inputs are correctly filtered by the _is_castable_tensor check.
574-575: LGTM: Initializer conversion correctly respects skip logic.

The check ensures that initializers feeding FP32-required inputs are not converted to low precision, maintaining consistency with the tensor casting logic.
1079-1102: LGTM: Skip logic correctly implemented.

The method correctly determines whether a specific input should remain in FP32 based on the operator type and input index. The match statement properly uses .str_short to select the appropriate mapping, and the membership check at line 1097 ensures the input belongs to the node before checking its index.

The ValueError at line 1098 is a good defensive check, though it should never occur in normal operation since callers iterate over node.input or use initializer names that are also in node.input.
49-49: Verify accuracy impact on vision models.

Based on the PR comments, enabling fp16/bfloat16 for Resize may cause accuracy regressions in certain quantization workflows. The reviewers specifically mentioned potential issues with vit_large and swin models.

While the implementation correctly handles the ONNX specification (keeping scales/sizes inputs in FP32), ensure comprehensive accuracy testing is performed on the models that previously showed regressions when Resize was converted to FP16. Users can still explicitly block Resize conversion if needed using op_block_list=["Resize"].

Based on PR objectives and review comments.
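For illustration, blocking the op from the caller side could look roughly like this (a hypothetical invocation; the entry-point name and parameters are assumptions and should be verified against the ModelOpt AutoCast documentation; only op_block_list=["Resize"] comes from the discussion above):

```python
# Hypothetical usage sketch - the import path and function signature are assumed,
# not confirmed API.
from modelopt.onnx.autocast import convert_to_mixed_precision  # assumed import path

converted_model = convert_to_mixed_precision(
    "model.onnx",
    low_precision_type="fp16",   # or "bf16"
    op_block_list=["Resize"],    # keep Resize in FP32 if accuracy regresses
)
```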
02f729c to b9a8e3d
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
modelopt/onnx/autocast/precisionconverter.py (1)
71-77: Docstring mismatch: node_to_init_map holds TensorProto objects, not names.

- node_to_init_map: Map of node names to lists of initializer names.
+ node_to_init_map: Map of node names to lists of onnx.TensorProto initializers.
♻️ Duplicate comments (2)
tests/unit/onnx/autocast/test_precisionconverter.py (1)
1284-1303: Add dtype assertions for roi/scales/add_const to validate core behavior.

Current test only checks model validity; add precise dtype checks to guard against regressions.

@@ def test_resize_op_initializer_conversion(
-    onnx.checker.check_model(converted_model)
+    onnx.checker.check_model(converted_model)
+
+    # Verify roi/scales dtypes per ONNX Resize spec
+    roi_init = next(init for init in converted_model.graph.initializer if init.name == "roi")
+    scales_init = next(init for init in converted_model.graph.initializer if init.name == "scales")
+    # scales must remain FP32
+    assert scales_init.data_type == TensorProto.FLOAT
+    # roi: BF16 unsupported; FP16 allowed by spec. Accept FP32/FP16 for fp16 path, require FP32 for bf16 path.
+    if low_precision_type == "bf16":
+        assert roi_init.data_type == TensorProto.FLOAT
+    else:
+        assert roi_init.data_type in (TensorProto.FLOAT, TensorProto.FLOAT16)
+
+    # Verify data initializer got converted to low precision
+    add_const_init = next(init for init in converted_model.graph.initializer if init.name == "add_const")
+    assert add_const_init.data_type == low_precision_onnx_type(low_precision_type)

Reference: ONNX Resize input types — scales: tensor(float), roi: tensor(float16|float|double). (onnx.ai)
modelopt/onnx/autocast/precisionconverter.py (1)
405-423: Update outdated comment: Resize no longer blanket-blocked.

Comment still claims both Resize and Upsample must be high precision. With per-input skipping, only Upsample remains globally blocked; Resize is supported with selective FP32 inputs.

- # NonMaxSuppression and Celu require FP32 inputs per ONNX standard
- # Resize and Upsample allow the data input (index 0) to be FP16/BF16 per ONNX standard, but require the scale
- # input (index 1) to be FP32. However, AutoCast requires a binary classification for each node: high/low
- # precision so we need to set Resize and Upsample to high precision
+ # NonMaxSuppression and Celu require FP32 inputs per ONNX standard.
+ # Upsample remains unsupported for low precision in AutoCast's binary classification.
+ # Resize is supported in low precision; inputs that must stay FP32 (e.g., scales, and roi for BF16)
+ # are handled via _should_skip_low_precision_input_conversion.
🧹 Nitpick comments (6)
tests/unit/onnx/autocast/test_precisionconverter.py (1)
1304-1324: Assert cast-up to FP32 for tensor-derived scales.

When scales comes from a low-precision tensor, a Cast(to=FLOAT) should feed Resize input[2].

@@ def test_resize_op_tensor_scales_conversion(
-    onnx.checker.check_model(converted_model)
+    onnx.checker.check_model(converted_model)
+
+    # Ensure Resize input[2] (scales) is FP32 via an inserted Cast
+    resize_node = next(n for n in converted_model.graph.node if n.name == "resize")
+    scales_in = resize_node.input[2]
+    cast_nodes = [
+        n for n in converted_model.graph.node
+        if n.op_type == "Cast" and n.output and n.output[0] == scales_in
+    ]
+    assert len(cast_nodes) == 1
+    cast_to_attr = next(a for a in cast_nodes[0].attribute if a.name == "to").i
+    assert cast_to_attr == TensorProto.FLOAT

This ensures compliance with ONNX: scales must be tensor(float). (onnx.ai)
modelopt/onnx/autocast/precisionconverter.py (5)
425-471: LGTM with a tiny naming nit.

Functionality looks correct. Consider renaming cast_to_fp16 → cast_to_low_precision for clarity.
674-675: Avoid mutable default argument.

- def _add_cast(
-     self, tensor_name: str, cast_to: PrecisionTypes, exclude_consumers: list[str] = []
+ def _add_cast(
+     self, tensor_name: str, cast_to: PrecisionTypes, exclude_consumers: list[str] | None = None
  ) -> None:
@@
- consumer_nodes = utils.get_consumer_nodes(self.model, tensor_name)
- consumer_nodes = [n for n in consumer_nodes if n.name not in exclude_consumers]
+ if exclude_consumers is None:
+     exclude_consumers = []
+ consumer_nodes = utils.get_consumer_nodes(self.model, tensor_name)
+ consumer_nodes = [n for n in consumer_nodes if n.name not in exclude_consumers]
1098-1121: Don't raise on missing/optional inputs; just skip.

Optional inputs (e.g., empty strings) shouldn't trigger ValueError; return False to keep flow robust.

  inputs_lst = list(node.input)
- if input_name not in inputs_lst:
-     raise ValueError(f"Input {input_name} not found in node {node.name}.")
+ if input_name not in inputs_lst:
+     return False
371-385: Docstring return type incorrect for _get_node_initializers_map.

  Returns:
-     dict[str, list[str]]: Mapping from node names to lists of initializer names.
+     dict[str, list[onnx.TensorProto]]: Mapping from node names to lists of initializer protos.
128-152: Suggestion: document op_block_list workaround in docstrings.

Given known accuracy sensitivities with Resize in some models, note in the convert() docstring that callers can block ops via the high/low node lists, or a higher-level op_block_list, to mitigate regressions. A possible wording is sketched below.
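The following is only an illustrative sketch of such a docstring note (the class shell and wording are suggestions, not the existing ModelOpt code):

```python
class PrecisionConverter:  # illustrative shell only
    def convert(self, high_precision_nodes, low_precision_nodes):
        """Convert the model to mixed precision.

        Note: ops that are accuracy-sensitive in low precision (e.g. Resize in some
        vision models) can be kept in FP32 by listing their nodes in
        high_precision_nodes, or by blocking the op type earlier via an
        op_block_list-style option, as discussed in this review.
        """
        ...
```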
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- modelopt/onnx/autocast/precisionconverter.py (7 hunks)
- tests/unit/onnx/autocast/test_precisionconverter.py (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/unit/onnx/autocast/test_precisionconverter.py (1)
modelopt/onnx/autocast/precisionconverter.py (2)
- _get_tensors_to_cast (425-470)
- convert (124-227)
🔇 Additional comments (7)
tests/unit/onnx/autocast/test_precisionconverter.py (4)
181-189: LGTM: updated tuple-unpack aligns with new _get_tensors_to_cast API.

Also applies to: 191-195
317-327: LGTM: multiple-consumer test updated correctly for triple return.
1178-1224: Resize fixture (initializer scales): good coverage setup.
1226-1282: Resize fixture (tensor-derived scales): good scenario.

modelopt/onnx/autocast/precisionconverter.py (3)
163-180: LGTM: cast-up exclusion logic correctly special-cases FP32 inputs of low-precision nodes.
50-54: LGTM: Resize removed from global low‑precision blocklist.
583-585: LGTM: Initializer conversion respects per-input FP32 skip.
/ok to test b9a8e3d

@aboubezari can you look into the CI failures?
@galagam My bad, it was a typo introduced when switching to a more general approach. Fixed now! |
Actionable comments posted: 0
🧹 Nitpick comments (1)
modelopt/onnx/autocast/precisionconverter.py (1)
1098-1121: Consider filtering empty input strings before calling this method.

The implementation correctly checks input indices against skip mappings. However, for optional inputs (represented as empty strings in node.input), the current flow processes them here, then filters them downstream in _is_castable_tensor (lines 465-466) with a warning.

For better efficiency, consider adding an early check in _get_tensors_to_cast at line 437:

  for input in node.input:
+     if not input:  # Skip optional inputs not provided
+         continue
      if self._should_skip_low_precision_input_conversion(node, input):

This avoids unnecessary processing and warning messages for empty inputs.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- modelopt/onnx/autocast/precisionconverter.py (7 hunks)
🔇 Additional comments (4)
modelopt/onnx/autocast/precisionconverter.py (4)
50-58: LGTM! Skip mappings correctly implement ONNX Resize spec constraints.

The mappings correctly preserve FP32 precision for Resize inputs that don't support lower precision formats:

- FP16 mode: only scales (input 2) must remain FP32
- BF16 mode: both roi (input 1) and scales (input 2) must remain FP32

This aligns with the ONNX spec, where roi supports float16 but not bfloat16, and scales only supports float.
425-470: Excellent design for mixed-precision node support.

The three-tuple return and fp32_input_to_low_precision_node tracking elegantly handle low-precision nodes (like Resize) that require specific FP32 inputs while operating in low precision. The logic correctly distinguishes between:
- FP32 inputs TO low-precision nodes (e.g., Resize scales) → tracked for special handling
- FP32 outputs FROM low-precision nodes → cast up for high-precision consumers
This enables per-input precision control as required by ONNX operator specs.
170-180: Correct handling of FP32 inputs to low-precision nodes.

The exclusion logic properly ensures that when a low-precision node requires a specific FP32 input (e.g., Resize scales), that node is not excluded from consuming the cast-up tensor, while other low-precision nodes remain excluded. This maintains the correct mixed-precision flow.
583-585: Correct preservation of FP32 initializers for low-precision nodes.

The skip logic ensures initializers that must remain FP32 (e.g., Resize scales) are not converted, even when the consuming node is classified as low-precision. The ONNX operator handles the mixed-precision inputs natively per its spec.
Codecov Report

❌ Patch coverage is

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #436      +/-   ##
==========================================
+ Coverage   73.38%   73.41%   +0.03%
==========================================
  Files         180      180
  Lines       17986    18011      +25
==========================================
+ Hits        13199    13223      +24
- Misses       4787     4788       +1

☔ View full report in Codecov by Sentry.
Signed-off-by: Ali Boubezari <[email protected]>

cleanup
Signed-off-by: Ali Boubezari <[email protected]>

Update modelopt/onnx/autocast/precisionconverter.py
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: aboubezari <[email protected]>

[5571471] Fix quantization logic for residual branches with different backbones (NVIDIA#425)
Signed-off-by: gcunhase <[email protected]>

modify casts; testing

Add support for tensor scales
Signed-off-by: Ali Boubezari <[email protected]>

Generalize & automate skipping inputs; only skip index 2 for bfloat16
Signed-off-by: Ali Boubezari <[email protected]>

bugfixes
Signed-off-by: Ali Boubezari <[email protected]>
Signed-off-by: Ali Boubezari <[email protected]>
81d53ba to 779f134
Signed-off-by: Gal Hubara Agam <[email protected]>
779f134 to 2c32af8
Actionable comments posted: 1
♻️ Duplicate comments (1)
tests/unit/onnx/autocast/test_precisionconverter.py (1)
1285-1303: Add assertions to verify precision handling.

The test only validates model structural integrity but doesn't verify the core functionality introduced in this PR: that roi and scales inputs to Resize remain FP32 while the data path is converted to low precision.

Apply this diff to add the missing assertions:

  converted_model = converter.convert(
      high_precision_nodes=[], low_precision_nodes=[node.name for node in model.graph.node]
  )
  onnx.checker.check_model(converted_model)
+
+ # Verify roi and scales remain FP32 (indices 1, 2 of Resize)
+ roi_init = next(init for init in converted_model.graph.initializer if init.name == "roi")
+ scales_init = next(init for init in converted_model.graph.initializer if init.name == "scales")
+ assert roi_init.data_type == TensorProto.FLOAT, "roi should remain FP32"
+ assert scales_init.data_type == TensorProto.FLOAT, "scales should remain FP32"
+
+ # Verify add_const is converted to low precision
+ add_const_init = next(init for init in converted_model.graph.initializer if init.name == "add_const")
+ assert add_const_init.data_type == low_precision_onnx_type(low_precision_type), \
+     f"add_const should be converted to {low_precision_type}"

Based on learnings
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- modelopt/onnx/autocast/precisionconverter.py (7 hunks)
- tests/unit/onnx/autocast/test_precisionconverter.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- modelopt/onnx/autocast/precisionconverter.py
🧰 Additional context used
🧬 Code graph analysis (1)
tests/unit/onnx/autocast/test_precisionconverter.py (2)
modelopt/onnx/autocast/precisionconverter.py (2)
- _get_tensors_to_cast (425-470)
- convert (124-227)
modelopt/onnx/utils.py (1)
- check_model (557-569)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: linux
- GitHub Check: code-quality
- GitHub Check: build-docs
🔇 Additional comments (4)
tests/unit/onnx/autocast/test_precisionconverter.py (4)
182-182: LGTM! The unpacking changes correctly reflect the updated _get_tensors_to_cast signature that now returns a triple instead of a pair.

Also applies to: 191-191, 317-317, 323-323
1178-1224: LGTM! The fixture properly creates a test model with Resize using initializer-based scales, suitable for validating the autocast behavior.
1227-1282: LGTM! The fixture provides good test coverage by creating a scenario where Resize scales are computed dynamically from a network input rather than using a constant initializer.
830-856: Resize is not in the unsupported operations list.

The OP_TYPES_NOT_SUPPORTED_IN_LOW_PRECISION at line 50 of precisionconverter.py contains only ["Upsample", "NonMaxSuppression", "Celu"] (extended with ["Scan", "If", "Loop", "LSTM"]). The test correctly allows "resize" to remain in low_precision_nodes while filtering out "celu" and "nms". The SKIP_LOW_PRECISION_MAPPING (lines 56-57) provides special input-index handling for Resize, but the operation itself is supported in low precision. No test changes are needed.

Likely an incorrect or invalid review comment.
@pytest.mark.parametrize("keep_io_types", [True, False])
@pytest.mark.parametrize("low_precision_type", ["fp16", "bf16"])
def test_resize_op_tensor_scales_conversion(
    create_model_with_resize_op_tensor_scales, keep_io_types, low_precision_type
):
    model, value_info_map, initializer_map, node_to_init_map = (
        create_model_with_resize_op_tensor_scales
    )

    converter = PrecisionConverter(
        model,
        value_info_map,
        initializer_map,
        node_to_init_map,
        keep_io_types=keep_io_types,
        low_precision_type=low_precision_type,
    )
    converted_model = converter.convert(
        high_precision_nodes=[], low_precision_nodes=[node.name for node in model.graph.node]
    )
    onnx.checker.check_model(converted_model)
Add assertions to verify tensor scales precision handling.
Similar to the initializer test, this test lacks assertions for the core functionality. With tensor-based scales, you should verify that the scales_tensor output feeding into Resize remains FP32, and that appropriate Cast nodes are inserted.
Apply this diff to add the missing assertions:
converted_model = converter.convert(
high_precision_nodes=[], low_precision_nodes=[node.name for node in model.graph.node]
)
onnx.checker.check_model(converted_model)
+
+ # Verify roi remains FP32
+ roi_init = next(init for init in converted_model.graph.initializer if init.name == "roi")
+ assert roi_init.data_type == TensorProto.FLOAT, "roi should remain FP32"
+
+ # Verify scales_tensor (input to Resize) has Cast to FP32
+ # Since scales_add is in low_precision_nodes but feeds Resize which needs FP32 scales,
+ # there should be a Cast node inserted
+ cast_nodes = [n for n in converted_model.graph.node if n.op_type == "Cast"]
+ scales_cast = [c for c in cast_nodes if c.input[0] == "scales_tensor"]
+ assert len(scales_cast) > 0, "Cast node should be inserted for scales_tensor to Resize"
+ assert scales_cast[0].attribute[0].i == TensorProto.FLOAT, "scales_tensor should be cast to FP32"
/ok to test 2c32af8
Approved
What does this PR do?
Type of change: new feature
Overview: Add support for autocasting Resize ONNX nodes to fp16 or bfloat16.
Usage
Autocast with either fp16 or bfloat16
Testing
Added a unittest