
Conversation

@aboubezari
Contributor

@aboubezari aboubezari commented Oct 14, 2025

What does this PR do?

Type of change: new feature

Overview: Add support for autocasting Resize ONNX nodes to fp16 or bfloat16.

Usage

Autocast with either fp16 or bfloat16
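
A minimal usage sketch, following the new unit tests in this PR (the import path is inferred from the file layout, and the helper-map construction below is an illustration; the real AutoCast pipeline builds these maps itself):

import onnx
from modelopt.onnx.autocast.precisionconverter import PrecisionConverter

model = onnx.load("model_with_resize.onnx")  # hypothetical model path

# Illustrative map construction (assumption): node_to_init_map maps node names to
# lists of onnx.TensorProto, per the updated signature described in this PR.
value_info_map = {vi.name: vi for vi in model.graph.value_info}
initializer_map = {init.name: init for init in model.graph.initializer}
node_to_init_map = {
    node.name: [initializer_map[i] for i in node.input if i in initializer_map]
    for node in model.graph.node
}

converter = PrecisionConverter(
    model,
    value_info_map,
    initializer_map,
    node_to_init_map,
    keep_io_types=True,
    low_precision_type="fp16",  # or "bf16"
)
converted_model = converter.convert(
    high_precision_nodes=[],
    low_precision_nodes=[node.name for node in model.graph.node],
)
onnx.checker.check_model(converted_model)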

Testing

Added unit tests covering initializer-based and tensor-based Resize scales

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: Yes
  • Did you add or update any necessary documentation?: Yes
  • Did you update Changelog?: No

Summary by CodeRabbit

  • Improvements

    • Low-precision conversion now supports per-input skip rules so specific inputs (for example, Resize scales) can remain FP32; network I/O and initializer handling have been improved to avoid unsafe casts.
    • Resize is no longer globally excluded from low-precision conversion, enabling broader model support.
  • Bug Fixes

    • Prevents incorrect casting around Resize inputs, reducing failures and accuracy regressions when converting models to low precision.
  • Tests

    • New unit tests cover Resize scenarios (initializer- and tensor-based scales) and related conversion behaviors.

@aboubezari aboubezari requested a review from a team as a code owner October 14, 2025 20:16
@aboubezari aboubezari requested a review from galagam October 14, 2025 20:16
@copy-pr-bot

copy-pr-bot bot commented Oct 14, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai bot commented Oct 14, 2025

Walkthrough

Removes "Resize" from the global low-precision blacklist and introduces per-input skip mappings for Resize. Updates PrecisionConverter signatures and internal flows to track FP32 inputs for low-precision nodes, adjust initializer and cast insertion logic, and adds unit tests covering Resize scenarios.

Changes

  • Precision converter logic (modelopt/onnx/autocast/precisionconverter.py): Removes "Resize" from OP_TYPES_NOT_SUPPORTED_IN_LOW_PRECISION. Adds SKIP_LOW_PRECISION_MAPPING_FP16 and SKIP_LOW_PRECISION_MAPPING_BF16. Changes PrecisionConverter.__init__ node_to_init_map type to dict[str, list[onnx.TensorProto]]. _get_tensors_to_cast now returns (cast_to_fp16, cast_to_fp32, fp32_input_to_low_precision_node), handles network IO, excludes initializers from cast lists, and filters castable tensors. Adds _should_skip_low_precision_input_conversion(node, input_name) and uses it in initializer conversion and cast insertion (adjusting exclude_consumers for cast-up tensors).
  • Unit tests for Resize handling (tests/unit/onnx/autocast/test_precisionconverter.py): Updates callers to unpack three values from _get_tensors_to_cast. Adds fixtures create_model_with_resize_op and create_model_with_resize_op_tensor_scales. Adds tests test_resize_op_initializer_conversion and test_resize_op_tensor_scales_conversion exercising initializer-based and tensor-based Resize scale cases across keep_io_types and low_precision_type permutations.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant PC as PrecisionConverter
  participant G as ONNX Graph
  participant N as Node
  participant R as Resize Node
  participant I as Initializer

  PC->>G: build node->initializers map (TensorProto lists)
  PC->>G: call _get_tensors_to_cast()
  G-->>PC: return (cast_to_fp16, cast_to_fp32, fp32_input_to_low_precision_node)
  alt per-input check for low-precision nodes
    PC->>R: for each input, call _should_skip_low_precision_input_conversion
    alt skip input
      PC->>PC: record input in fp32_input_to_low_precision_node
    else
      PC->>PC: mark input for low-precision cast
    end
  end
  PC->>I: _convert_initializers (skip inputs flagged to remain FP32)
  PC->>G: insert Cast nodes for cast_to_fp16/cast_to_fp32, exclude consumers per fp32_input mapping
  PC->>G: return updated model

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

I hop through graphs with careful care,
I skip some Resize inputs with flair.
Some tensors stay bright FP32 light,
Others shrink to low-precision night.
A rabbit cheers the casts done right. 🐰

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 35.71%, which is below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped - CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The PR title "[Autocast] Add low precision autocasting support for Resize op" directly and accurately describes the main objective of the changeset. The core changes involve enabling Resize operations to be autocasted to low-precision formats (fp16 and bfloat16) by removing Resize from the unsupported operations list, adding selective input skip mappings, and updating the PrecisionConverter logic accordingly. The title is clear, concise, and specific—it identifies both what is being added (low precision autocasting support) and what it applies to (Resize op), allowing teammates to quickly understand the primary purpose of the PR.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (1)
modelopt/onnx/autocast/precisionconverter.py (1)

1056-1076: Consider using AssertionError instead of KeyError.

At lines 1064-1066, the code raises KeyError if the initializer is not found in the node's inputs. Since this check validates an internal invariant (initializers in node_to_init_map[node.name] should always be in node.input), consider using AssertionError or ValueError instead to better signal a programming error rather than a missing key.

Example:

         if init.name not in node.input:
-            raise KeyError(
-                f"Initializer {init.name} not found in node {node.name} input, not expected!"
-            )
+            raise AssertionError(
+                f"Initializer {init.name} not found in node {node.name} input - internal invariant violated"
+            )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4df4091 and ff2dbf2.

📒 Files selected for processing (2)
  • modelopt/onnx/autocast/precisionconverter.py (3 hunks)
  • tests/unit/onnx/autocast/test_precisionconverter.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/unit/onnx/autocast/test_precisionconverter.py (2)
modelopt/onnx/utils.py (1)
  • check_model (557-569)
modelopt/onnx/autocast/precisionconverter.py (1)
  • convert (113-205)
🔇 Additional comments (2)
modelopt/onnx/autocast/precisionconverter.py (1)

551-552: LGTM!

The integration of the skip-check correctly prevents conversion of Resize initializers at specific indices before attempting the conversion.

tests/unit/onnx/autocast/test_precisionconverter.py (1)

1109-1159: LGTM!

The fixture creates a well-structured model with a Resize operation and appropriate initializers to test the new conversion logic.

Comment on lines +1162 to +1303
@pytest.mark.parametrize("keep_io_types", [True, False])
@pytest.mark.parametrize("low_precision_type", ["fp16", "bf16"])
def test_resize_op_initializer_conversion(
create_model_with_resize_op, keep_io_types, low_precision_type
):
model, value_info_map, initializer_map, node_to_init_map = create_model_with_resize_op

converter = PrecisionConverter(
model,
value_info_map,
initializer_map,
node_to_init_map,
keep_io_types=keep_io_types,
low_precision_type=low_precision_type,
)
converted_model = converter.convert(
high_precision_nodes=[], low_precision_nodes=[node.name for node in model.graph.node]
)
onnx.checker.check_model(converted_model)
Contributor

🛠️ Refactor suggestion | 🟠 Major

Add assertions to verify initializer precision handling.

The test validates that the model is valid after conversion, but it doesn't verify the core functionality: that roi and scales initializers remain FP32 while add_const is converted to low precision. This leaves a gap in test coverage.

Consider adding assertions like:

     converted_model = converter.convert(
         high_precision_nodes=[], low_precision_nodes=[node.name for node in model.graph.node]
     )
     onnx.checker.check_model(converted_model)
+
+    # Verify roi and scales remain FP32 (indices 1, 2 of Resize)
+    roi_init = next(init for init in converted_model.graph.initializer if init.name == "roi")
+    scales_init = next(init for init in converted_model.graph.initializer if init.name == "scales")
+    assert roi_init.data_type == TensorProto.FLOAT
+    assert scales_init.data_type == TensorProto.FLOAT
+
+    # Verify add_const is converted to low precision
+    add_const_init = next(init for init in converted_model.graph.initializer if init.name == "add_const")
+    assert add_const_init.data_type == low_precision_onnx_type(low_precision_type)

for node in self.model.graph.node:
    if node.name in low_precision_nodes:
        for init in self.node_to_init_map[node.name]:
            if self._should_skip_low_precision_initializer_conversion(node, init):
Contributor

@aboubezari This PR assumes that the scales input is an initializer, but it doesn't have to be.
We'll also need to add a check for dynamic input, maybe in _get_tensors_to_cast.

Contributor Author

@galagam You're right. I've tweaked the logic in _get_tensors_to_cast to avoid adding those casts, and tracked some extra metadata to make sure the casts get applied properly to these special mixed-precision input nodes.
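
Roughly, the extra bookkeeping described above could look like the following (an illustrative sketch of the triple return reported in the walkthrough, not the merged implementation; the real method also handles network I/O, initializers, and castability filtering):

from collections.abc import Callable, Iterable, Mapping

def get_tensors_to_cast(
    low_precision_nodes: Iterable[str],
    node_inputs: Mapping[str, list[str]],
    should_skip: Callable[[str, str], bool],
) -> tuple[list[str], list[str], dict[str, list[str]]]:
    """Split inputs of low-precision nodes into cast-down candidates and
    FP32 inputs that must still reach specific low-precision consumers."""
    cast_to_fp16: list[str] = []
    cast_to_fp32: list[str] = []
    fp32_input_to_low_precision_node: dict[str, list[str]] = {}

    for node_name in low_precision_nodes:
        for input_name in node_inputs[node_name]:
            if should_skip(node_name, input_name):
                # e.g. Resize roi/scales: keep FP32 and remember which consumer needs it
                cast_to_fp32.append(input_name)
                fp32_input_to_low_precision_node.setdefault(input_name, []).append(node_name)
            else:
                cast_to_fp16.append(input_name)

    return cast_to_fp16, cast_to_fp32, fp32_input_to_low_precision_node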

@aboubezari aboubezari requested a review from galagam October 15, 2025 20:52
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
modelopt/onnx/autocast/precisionconverter.py (1)

83-83: Update docstring to match type annotation.

The docstring states "lists of initializer names" but the type annotation shows list[onnx.TensorProto]. The implementation at line 569 confirms TensorProto objects are expected.

Apply this diff:

-            node_to_init_map: Map of node names to lists of initializer names.
+            node_to_init_map: Map of node names to lists of initializers.
♻️ Duplicate comments (2)
modelopt/onnx/autocast/precisionconverter.py (1)

49-49: LGTM: Resize correctly removed from unsupported ops.

With the new per-input precision handling, Resize can now be supported in low precision while keeping its roi and scales inputs in FP32 as required.

Note: A past review comment already flagged that the comment at lines 395-398 needs updating to reflect this change.

tests/unit/onnx/autocast/test_precisionconverter.py (1)

1213-1231: Add assertions to verify precision handling.

Both test functions validate model correctness but don't verify the core functionality: that roi/scales remain FP32 while other initializers/tensors are converted to low precision. A past review comment already identified this gap and provided specific assertion suggestions.

Also applies to: 1234-1254

🧹 Nitpick comments (1)
modelopt/onnx/autocast/precisionconverter.py (1)

1075-1097: LGTM: Resize-specific input handling is correctly implemented.

The method properly identifies Resize inputs at indices 1 and 2 (roi and scales) that must remain in FP32. The conservative approach of keeping both in FP32 is safe, though the ONNX spec indicates roi could support low precision.

Consider clarifying the comment at line 1092-1093 to be more precise:

-            # The second input does not support bfloat16, so leave it in FP32.
-            # The third input of Resize must remain in FP32.
+            # The second input (roi) and third input (scales) of Resize should remain in FP32.
+            # While roi may support low precision per spec, scales must be FP32, so we keep both FP32 for safety.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 53ebcaa and f3bcfd1.

📒 Files selected for processing (2)
  • modelopt/onnx/autocast/precisionconverter.py (7 hunks)
  • tests/unit/onnx/autocast/test_precisionconverter.py (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/unit/onnx/autocast/test_precisionconverter.py (2)
modelopt/onnx/autocast/precisionconverter.py (2)
  • _get_tensors_to_cast (412-457)
  • convert (113-214)
modelopt/onnx/utils.py (1)
  • check_model (557-569)
🔇 Additional comments (7)
modelopt/onnx/autocast/precisionconverter.py (4)

150-167: LGTM: Per-input casting logic is well-implemented.

The logic correctly handles the case where low-precision nodes require specific FP32 inputs by adjusting the exclude_consumers list. The tensor is cast to FP32 for nodes that need it while remaining low-precision for others.
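
In other words (a hypothetical sketch of the consumer-exclusion step, not the exact implementation):

def plan_cast_up_exclusions(
    cast_to_fp32: list[str],
    low_precision_nodes: list[str],
    fp32_input_to_low_precision_node: dict[str, list[str]],
) -> dict[str, list[str]]:
    """For each tensor cast up to FP32, list the consumers to exclude from the Cast
    rewiring: all low-precision nodes except those recorded as needing this input in FP32."""
    return {
        tensor: [
            n for n in low_precision_nodes
            if n not in fp32_input_to_low_precision_node.get(tensor, [])
        ]
        for tensor in cast_to_fp32
    }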


412-457: LGTM: Enhanced tensor casting logic with per-input precision control.

The method now correctly identifies which inputs to low-precision nodes must remain in FP32 (like Resize's roi and scales inputs) and tracks the associated nodes. The implementation properly separates casting decisions per input rather than per node.


570-571: LGTM: Initializer conversion correctly respects per-input requirements.

The guard properly skips converting initializers that must remain in FP32 (e.g., Resize's roi and scales).


71-71: Docstring update only: node_to_init_map holds lists of onnx.TensorProto. No external API break—all internal callers (including tests) already pass TensorProto lists. Update the constructor docstring to reflect the new type.

Likely an incorrect or invalid review comment.

tests/unit/onnx/autocast/test_precisionconverter.py (3)

180-180: LGTM: Test calls correctly updated for new API.

All calls to _get_tensors_to_cast properly unpack three values, discarding the new fp32_input_to_low_precision_node mapping with _ since these tests don't need it.

Also applies to: 189-189, 315-315, 321-321


1106-1153: LGTM: Well-structured test fixture for Resize with initializer inputs.

The fixture creates a realistic model with Resize in a computation pipeline, using initializer-based roi and scales. The empty roi array is a valid edge case per ONNX spec.


1155-1210: LGTM: Fixture addresses dynamic scales concern.

This fixture correctly tests the scenario raised by galagam where scales come from a tensor (computed at runtime) rather than an initializer. This ensures the implementation handles both static and dynamic Resize inputs.

@aboubezari aboubezari requested a review from galagam October 16, 2025 17:14
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f3bcfd1 and 6262096.

📒 Files selected for processing (1)
  • modelopt/onnx/autocast/precisionconverter.py (7 hunks)
🔇 Additional comments (5)
modelopt/onnx/autocast/precisionconverter.py (5)

49-56: LGTM! Clean removal of Resize from blacklist with proper skip mappings.

The changes correctly remove "Resize" from the unsupported ops list and introduce per-input skip mappings that preserve FP32 for specific Resize inputs (scales at index 1 for both FP16/BF16, and additionally index 2 for BF16). This is consistent with ONNX spec requirements and the PR objectives.


75-75: LGTM! Type signature now matches implementation.

The parameter type change from list[str] to list[onnx.TensorProto] accurately reflects the actual usage throughout the code, particularly in _get_node_initializers_map and _convert_initializers.


154-171: LGTM! Correct handling of selective FP32 inputs to low-precision nodes.

The updated logic properly manages the case where specific inputs to low-precision nodes must remain FP32 (e.g., Resize scales input). The exclude_consumers adjustment ensures that cast operations are correctly applied only where needed.


416-461: LGTM! _get_tensors_to_cast correctly implements per-input skip logic.

The method properly iterates through low-precision node inputs, checks which should remain FP32 via _should_skip_low_precision_input_conversion, and builds the appropriate cast lists. The triple return value and network IO handling are correctly implemented.


574-575: LGTM! Initializer conversion correctly respects skip logic.

The skip check properly prevents conversion of initializers that must remain FP32, maintaining consistency with the overall precision handling strategy.

Comment on lines 1079 to 1121
    def _should_skip_low_precision_input_conversion(
        self, node: onnx.NodeProto, input_name: str
    ) -> bool:
        """Check if the input should be skipped for low precision conversion.
        This is used for nodes that have inputs that MUST remain in FP32.
        """
        match self.low_precision_type:
            case "fp16":
                skip_inputs_map = SKIP_LOW_PRECISION_MAPPING_FP16
            case "bf16":
                skip_inputs_map = SKIP_LOW_PRECISION_MAPPING_BF16
            case _:
                raise ValueError(f"Unsupported low precision type: {self.low_precision_type}")

        if node.op_type in skip_inputs_map:
            # Figure out the index of the input in the node input
            inputs_lst = list(node.input)
            input_index = inputs_lst.index(input_name)
            # Check if we should skip this input for low precision conversion
            return input_index in skip_inputs_map[node.op_type]
        return False
Contributor

⚠️ Potential issue | 🔴 Critical

Critical bug: Match statement compares PrecisionTypes object against string literals.

At line 1086, the match statement compares self.low_precision_type (a PrecisionTypes namedtuple set at line 106) against string cases "fp16" and "bf16". This will never match, causing the function to always raise ValueError at line 1092.

Apply this diff to fix the match statement:

-        match self.low_precision_type:
-            case "fp16":
+        match self.low_precision_type.str_short:
+            case "fp16":
                 skip_inputs_map = SKIP_LOW_PRECISION_MAPPING_FP16
-            case "bf16":
+            case "bf16":
                 skip_inputs_map = SKIP_LOW_PRECISION_MAPPING_BF16
             case _:
                 raise ValueError(f"Unsupported low precision type: {self.low_precision_type}")

Additionally, at line 1097, inputs_lst.index(input_name) will raise ValueError if input_name is not in the node's inputs (e.g., empty string for optional inputs). Consider adding a membership check:

         if node.op_type in skip_inputs_map:
             # Figure out the index of the input in the node input
             inputs_lst = list(node.input)
+            if input_name not in inputs_lst:
+                return False
             input_index = inputs_lst.index(input_name)
             # Check if we should skip this input for low precision conversion
             return input_index in skip_inputs_map[node.op_type]
         return False
🤖 Prompt for AI Agents
In modelopt/onnx/autocast/precisionconverter.py around lines 1079-1100, the
match on self.low_precision_type uses string literals ("fp16"/"bf16") but
self.low_precision_type is a PrecisionTypes object, causing the ValueError path
to always run; change the match/cases to compare against the PrecisionTypes
members (e.g., PrecisionTypes.fp16 and PrecisionTypes.bf16) or use the
enum's/value's string (e.g., self.low_precision_type.value == "fp16"), and
before calling inputs_lst.index(input_name) add a membership check (if
input_name in inputs_lst) and return False when not present to avoid raising
ValueError for optional/empty inputs.

@galagam
Contributor

galagam commented Oct 19, 2025

@i-riyad @ajrasane @gcunhase please note that this could cause some accuracy regressions in the quantization workflow.
Previously, Resize was explicitly excluded from fp16 with the onnxconverter-common flow, and until now implicitly excluded within PrecisionConverter.
With this change, we may start seeing fp16 Resize nodes, and I suspect this will impact accuracy.
If it causes an accuracy degradation in quantization workflows we can explicitly exclude it again, like this:

onnx_model = convert_to_f16(
    ...
    op_block_list=["Resize"],
    ...
)

@gcunhase
Contributor

@i-riyad @ajrasane @gcunhase please note that this could cause some accuracy regressions in the quantization workflow. Previously, Resize was explicitly excluded from fp16 with the onnxconverter-common flow, and until now implicitly excluded within PrecisionConverter. With this change, we may start seeing fp16 Resize nodes, and I suspect this will impact accuracy. If it causes an accuracy degradation in quantization workflows we can explicitly exclude it again, like this:

onnx_model = convert_to_f16(
    ...
    op_block_list=["Resize"],
    ...
)

@i-riyad @ajrasane do you remember which model was having accuracy drops when converting Resize to FP16?

@i-riyad
Contributor

i-riyad commented Oct 20, 2025

@i-riyad @ajrasane do you remember which model was having accuracy drops when converting Resize to FP16?

I do not remember particularly, probably vit_large and/or swin.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
modelopt/onnx/autocast/precisionconverter.py (1)

54-56: Add inline documentation for the skip mappings.

The constant mappings SKIP_LOW_PRECISION_MAPPING_FP16 and SKIP_LOW_PRECISION_MAPPING_BF16 would benefit from inline comments explaining which inputs each index represents for each operator. For example:

# Mapping of op types to indices of inputs that should not be converted to low precision.
# Resize inputs: 0=data, 1=roi/scales (must be FP32), 2=sizes (must be FP32 for bf16)
SKIP_LOW_PRECISION_MAPPING_FP16 = {"Resize": {1}}
SKIP_LOW_PRECISION_MAPPING_BF16 = {"Resize": {1, 2}}

This will make it easier to maintain and extend these mappings in the future.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6262096 and 02f729c.

📒 Files selected for processing (1)
  • modelopt/onnx/autocast/precisionconverter.py (7 hunks)
🔇 Additional comments (6)
modelopt/onnx/autocast/precisionconverter.py (6)

75-75: LGTM: Type annotation correction.

The type correction from list[str] to list[onnx.TensorProto] accurately reflects the actual usage of this parameter throughout the codebase.


154-171: LGTM: Correct handling of mixed-precision inputs.

The logic correctly handles the case where a low-precision node requires specific inputs to remain in FP32. By selectively excluding consumers based on the fp32_input_to_low_precision_node mapping, the casting logic ensures that FP32 inputs reach the nodes that need them while still casting appropriately for other consumers.


416-461: LGTM: Per-input skip logic correctly implemented.

The updated _get_tensors_to_cast method correctly implements per-input precision handling. The logic properly categorizes inputs that must remain FP32 (via _should_skip_low_precision_input_conversion) while adding them to the appropriate cast lists and tracking the relationship in fp32_input_to_low_precision_node. Empty/optional inputs are correctly filtered by the _is_castable_tensor check.


574-575: LGTM: Initializer conversion correctly respects skip logic.

The check ensures that initializers feeding FP32-required inputs are not converted to low precision, maintaining consistency with the tensor casting logic.


1079-1102: LGTM: Skip logic correctly implemented.

The method correctly determines whether a specific input should remain in FP32 based on the operator type and input index. The match statement properly uses .str_short to select the appropriate mapping, and the membership check at line 1097 ensures the input belongs to the node before checking its index.

The ValueError at line 1098 is a good defensive check, though it should never occur in normal operation since callers iterate over node.input or use initializer names that are also in node.input.


49-49: Verify accuracy impact on vision models.

Based on the PR comments, enabling fp16/bfloat16 for Resize may cause accuracy regressions in certain quantization workflows. The reviewers specifically mentioned potential issues with vit_large and swin models.

While the implementation correctly handles the ONNX specification (keeping scales/sizes inputs in FP32), ensure comprehensive accuracy testing is performed on the models that previously showed regressions when Resize was converted to FP16. Users can still explicitly block Resize conversion if needed using op_block_list=["Resize"].

Based on PR objectives and review comments.

@aboubezari aboubezari force-pushed the autocast_resize_fp16_support branch from 02f729c to b9a8e3d on October 21, 2025 01:22
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
modelopt/onnx/autocast/precisionconverter.py (1)

71-77: Docstring mismatch: node_to_init_map holds TensorProto objects, not names.

-            node_to_init_map: Map of node names to lists of initializer names.
+            node_to_init_map: Map of node names to lists of onnx.TensorProto initializers.
♻️ Duplicate comments (2)
tests/unit/onnx/autocast/test_precisionconverter.py (1)

1284-1303: Add dtype assertions for roi/scales/add_const to validate core behavior.

Current test only checks model validity; add precise dtype checks to guard against regressions.

@@ def test_resize_op_initializer_conversion(
-    onnx.checker.check_model(converted_model)
+    onnx.checker.check_model(converted_model)
+
+    # Verify roi/scales dtypes per ONNX Resize spec
+    roi_init = next(init for init in converted_model.graph.initializer if init.name == "roi")
+    scales_init = next(init for init in converted_model.graph.initializer if init.name == "scales")
+    # scales must remain FP32
+    assert scales_init.data_type == TensorProto.FLOAT
+    # roi: BF16 unsupported; FP16 allowed by spec. Accept FP32/FP16 for fp16 path, require FP32 for bf16 path.
+    if low_precision_type == "bf16":
+        assert roi_init.data_type == TensorProto.FLOAT
+    else:
+        assert roi_init.data_type in (TensorProto.FLOAT, TensorProto.FLOAT16)
+
+    # Verify data initializer got converted to low precision
+    add_const_init = next(init for init in converted_model.graph.initializer if init.name == "add_const")
+    assert add_const_init.data_type == low_precision_onnx_type(low_precision_type)

Reference: ONNX Resize input types — scales: tensor(float), roi: tensor(float16|float|double). (onnx.ai)

modelopt/onnx/autocast/precisionconverter.py (1)

405-423: Update outdated comment: Resize no longer blanket-blocked.

Comment still claims both Resize and Upsample must be high precision. With per-input skipping, only Upsample remains globally blocked; Resize is supported with selective FP32 inputs.

-        # NonMaxSuppression and Celu require FP32 inputs per ONNX standard
-        # Resize and Upsample allow the data input (index 0) to be FP16/BF16 per ONNX standard, but require the scale
-        # input (index 1) to be FP32. However, AutoCast requires a binary classification for each node: high/low
-        # precision so we need to set Resize and Upsample to high precision
+        # NonMaxSuppression and Celu require FP32 inputs per ONNX standard.
+        # Upsample remains unsupported for low precision in AutoCast's binary classification.
+        # Resize is supported in low precision; inputs that must stay FP32 (e.g., scales, and roi for BF16)
+        # are handled via _should_skip_low_precision_input_conversion.
🧹 Nitpick comments (6)
tests/unit/onnx/autocast/test_precisionconverter.py (1)

1304-1324: Assert cast-up to FP32 for tensor-derived scales.

When scales comes from a low-precision tensor, a Cast(to=FLOAT) should feed Resize input[2].

@@ def test_resize_op_tensor_scales_conversion(
-    onnx.checker.check_model(converted_model)
+    onnx.checker.check_model(converted_model)
+
+    # Ensure Resize input[2] (scales) is FP32 via an inserted Cast
+    resize_node = next(n for n in converted_model.graph.node if n.name == "resize")
+    scales_in = resize_node.input[2]
+    cast_nodes = [
+        n for n in converted_model.graph.node
+        if n.op_type == "Cast" and n.output and n.output[0] == scales_in
+    ]
+    assert len(cast_nodes) == 1
+    cast_to_attr = next(a for a in cast_nodes[0].attribute if a.name == "to").i
+    assert cast_to_attr == TensorProto.FLOAT

This ensures compliance with ONNX: scales must be tensor(float). (onnx.ai)

modelopt/onnx/autocast/precisionconverter.py (5)

425-471: LGTM with a tiny naming nit.

Functionality looks correct. Consider renaming cast_to_fp16 → cast_to_low_precision for clarity.


674-675: Avoid mutable default argument.

-    def _add_cast(
-        self, tensor_name: str, cast_to: PrecisionTypes, exclude_consumers: list[str] = []
+    def _add_cast(
+        self, tensor_name: str, cast_to: PrecisionTypes, exclude_consumers: list[str] | None = None
     ) -> None:
@@
-        consumer_nodes = utils.get_consumer_nodes(self.model, tensor_name)
-        consumer_nodes = [n for n in consumer_nodes if n.name not in exclude_consumers]
+        if exclude_consumers is None:
+            exclude_consumers = []
+        consumer_nodes = utils.get_consumer_nodes(self.model, tensor_name)
+        consumer_nodes = [n for n in consumer_nodes if n.name not in exclude_consumers]

1098-1121: Don’t raise on missing/optional inputs; just skip.

Optional inputs (e.g., empty strings) shouldn’t trigger ValueError; return False to keep flow robust.

-            inputs_lst = list(node.input)
-            if input_name not in inputs_lst:
-                raise ValueError(f"Input {input_name} not found in node {node.name}.")
+            inputs_lst = list(node.input)
+            if input_name not in inputs_lst:
+                return False

371-385: Docstring return type incorrect for _get_node_initializers_map.

-        Returns:
-            dict[str, list[str]]: Mapping from node names to lists of initializer names.
+        Returns:
+            dict[str, list[onnx.TensorProto]]: Mapping from node names to lists of initializer protos.

128-152: Suggestion: document op_block_list workaround in docstrings.

Given known accuracy sensitivities with Resize in some models, note in convert() docstring that callers can block ops via the high/low node lists, or a higher-level op_block_list, to mitigate regressions.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 02f729c and b9a8e3d.

📒 Files selected for processing (2)
  • modelopt/onnx/autocast/precisionconverter.py (7 hunks)
  • tests/unit/onnx/autocast/test_precisionconverter.py (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/unit/onnx/autocast/test_precisionconverter.py (1)
modelopt/onnx/autocast/precisionconverter.py (2)
  • _get_tensors_to_cast (425-470)
  • convert (124-227)
🔇 Additional comments (7)
tests/unit/onnx/autocast/test_precisionconverter.py (4)

181-189: LGTM: updated tuple-unpack aligns with new _get_tensors_to_cast API.

Also applies to: 191-195


317-327: LGTM: multiple-consumer test updated correctly for triple return.


1178-1224: Resize fixture (initializer scales): good coverage setup.


1226-1282: Resize fixture (tensor-derived scales): good scenario.

modelopt/onnx/autocast/precisionconverter.py (3)

163-180: LGTM: cast-up exclusion logic correctly special-cases FP32 inputs of low-precision nodes.


50-54: LGTM: Resize removed from global low‑precision blocklist.


583-585: LGTM: Initializer conversion respects per-input FP32 skip.

@galagam
Contributor

galagam commented Oct 21, 2025

/ok to test b9a8e3d

@galagam
Contributor

galagam commented Oct 21, 2025

@aboubezari can you look into the CI failures?
https://github.com/NVIDIA/TensorRT-Model-Optimizer/actions/runs/18669789421/job/53241035008?pr=436

FAILED tests/unit/onnx/autocast/test_precisionconverter.py::test_unsupported_op_types_model
FAILED tests/unit/onnx/autocast/test_precisionconverter.py::test_resize_op_initializer_conversion[fp16-True]
FAILED tests/unit/onnx/autocast/test_precisionconverter.py::test_resize_op_initializer_conversion[fp16-False]
FAILED tests/unit/onnx/autocast/test_precisionconverter.py::test_resize_op_tensor_scales_conversion[fp16-True]
FAILED tests/unit/onnx/autocast/test_precisionconverter.py::test_resize_op_tensor_scales_conversion[fp16-False]

@aboubezari
Contributor Author

@aboubezari can you look into the CI failures? https://github.com/NVIDIA/TensorRT-Model-Optimizer/actions/runs/18669789421/job/53241035008?pr=436

FAILED tests/unit/onnx/autocast/test_precisionconverter.py::test_unsupported_op_types_model
FAILED tests/unit/onnx/autocast/test_precisionconverter.py::test_resize_op_initializer_conversion[fp16-True]
FAILED tests/unit/onnx/autocast/test_precisionconverter.py::test_resize_op_initializer_conversion[fp16-False]
FAILED tests/unit/onnx/autocast/test_precisionconverter.py::test_resize_op_tensor_scales_conversion[fp16-True]
FAILED tests/unit/onnx/autocast/test_precisionconverter.py::test_resize_op_tensor_scales_conversion[fp16-False]

@galagam My bad, it was a typo introduced when switching to a more general approach. Fixed now!

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
modelopt/onnx/autocast/precisionconverter.py (1)

1098-1121: Consider filtering empty input strings before calling this method.

The implementation correctly checks input indices against skip mappings. However, for optional inputs (represented as empty strings in node.input), the current flow processes them here, then filters them downstream in _is_castable_tensor (line 465-466) with a warning.

For better efficiency, consider adding an early check in _get_tensors_to_cast at line 437:

 for input in node.input:
+    if not input:  # Skip optional inputs not provided
+        continue
     if self._should_skip_low_precision_input_conversion(node, input):

This avoids unnecessary processing and warning messages for empty inputs.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b9a8e3d and 81d53ba.

📒 Files selected for processing (1)
  • modelopt/onnx/autocast/precisionconverter.py (7 hunks)
🔇 Additional comments (4)
modelopt/onnx/autocast/precisionconverter.py (4)

50-58: LGTM! Skip mappings correctly implement ONNX Resize spec constraints.

The mappings correctly preserve FP32 precision for Resize inputs that don't support lower precision formats:

  • FP16 mode: Only scales (input 2) must remain FP32
  • BF16 mode: Both roi (input 1) and scales (input 2) must remain FP32

This aligns with the ONNX spec where roi supports float16 but not bfloat16, and scales only supports float.


425-470: Excellent design for mixed-precision node support.

The three-tuple return and fp32_input_to_low_precision_node tracking elegantly handle low-precision nodes (like Resize) that require specific FP32 inputs while operating in low precision. The logic correctly distinguishes between:

  • FP32 inputs TO low-precision nodes (e.g., Resize scales) → tracked for special handling
  • FP32 outputs FROM low-precision nodes → cast up for high-precision consumers

This enables per-input precision control as required by ONNX operator specs.


170-180: Correct handling of FP32 inputs to low-precision nodes.

The exclusion logic properly ensures that when a low-precision node requires a specific FP32 input (e.g., Resize scales), that node is not excluded from consuming the cast-up tensor, while other low-precision nodes remain excluded. This maintains the correct mixed-precision flow.


583-585: Correct preservation of FP32 initializers for low-precision nodes.

The skip logic ensures initializers that must remain FP32 (e.g., Resize scales) are not converted, even when the consuming node is classified as low-precision. The ONNX operator handles the mixed-precision inputs natively per its spec.

@codecov

codecov bot commented Oct 22, 2025

Codecov Report

❌ Patch coverage is 96.66667% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 73.41%. Comparing base (7ccaa53) to head (2c32af8).
⚠️ Report is 1 commits behind head on main.

Files with missing lines: modelopt/onnx/autocast/precisionconverter.py (patch coverage 96.66%, 1 line missing ⚠️)
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #436      +/-   ##
==========================================
+ Coverage   73.38%   73.41%   +0.03%     
==========================================
  Files         180      180              
  Lines       17986    18011      +25     
==========================================
+ Hits        13199    13223      +24     
- Misses       4787     4788       +1     

☔ View full report in Codecov by Sentry.

Signed-off-by: Ali Boubezari <[email protected]>

cleanup

Signed-off-by: Ali Boubezari <[email protected]>

Update modelopt/onnx/autocast/precisionconverter.py

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: aboubezari <[email protected]>

[5571471] Fix quantization logic for residual branches with different backbones (NVIDIA#425)

Signed-off-by: gcunhase <[email protected]>

modify casts; testing

Add support for tensor scales

Signed-off-by: Ali Boubezari <[email protected]>

Generalize & automate skipping inputs; only skip index 2 for bfloat16

Signed-off-by: Ali Boubezari <[email protected]>

bugfixes

Signed-off-by: Ali Boubezari <[email protected]>
@galagam galagam force-pushed the autocast_resize_fp16_support branch from 81d53ba to 779f134 on October 22, 2025 11:43
Signed-off-by: Gal Hubara Agam <[email protected]>
@galagam galagam force-pushed the autocast_resize_fp16_support branch from 779f134 to 2c32af8 on October 22, 2025 11:44
@galagam galagam enabled auto-merge (squash) October 22, 2025 11:45
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
tests/unit/onnx/autocast/test_precisionconverter.py (1)

1285-1303: Add assertions to verify precision handling.

The test only validates model structural integrity but doesn't verify the core functionality introduced in this PR: that roi and scales inputs to Resize remain FP32 while the data path is converted to low precision.

Apply this diff to add the missing assertions:

     converted_model = converter.convert(
         high_precision_nodes=[], low_precision_nodes=[node.name for node in model.graph.node]
     )
     onnx.checker.check_model(converted_model)
+
+    # Verify roi and scales remain FP32 (indices 1, 2 of Resize)
+    roi_init = next(init for init in converted_model.graph.initializer if init.name == "roi")
+    scales_init = next(init for init in converted_model.graph.initializer if init.name == "scales")
+    assert roi_init.data_type == TensorProto.FLOAT, "roi should remain FP32"
+    assert scales_init.data_type == TensorProto.FLOAT, "scales should remain FP32"
+
+    # Verify add_const is converted to low precision
+    add_const_init = next(init for init in converted_model.graph.initializer if init.name == "add_const")
+    assert add_const_init.data_type == low_precision_onnx_type(low_precision_type), \
+        f"add_const should be converted to {low_precision_type}"

Based on learnings

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 81d53ba and 2c32af8.

📒 Files selected for processing (2)
  • modelopt/onnx/autocast/precisionconverter.py (7 hunks)
  • tests/unit/onnx/autocast/test_precisionconverter.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • modelopt/onnx/autocast/precisionconverter.py
🧰 Additional context used
🧬 Code graph analysis (1)
tests/unit/onnx/autocast/test_precisionconverter.py (2)
modelopt/onnx/autocast/precisionconverter.py (2)
  • _get_tensors_to_cast (425-470)
  • convert (124-227)
modelopt/onnx/utils.py (1)
  • check_model (557-569)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: linux
  • GitHub Check: code-quality
  • GitHub Check: build-docs
🔇 Additional comments (4)
tests/unit/onnx/autocast/test_precisionconverter.py (4)

182-182: LGTM!

The unpacking changes correctly reflect the updated _get_tensors_to_cast signature that now returns a triple instead of a pair.

Also applies to: 191-191, 317-317, 323-323


1178-1224: LGTM!

The fixture properly creates a test model with Resize using initializer-based scales, suitable for validating the autocast behavior.


1227-1282: LGTM!

The fixture provides good test coverage by creating a scenario where Resize scales are computed dynamically from a network input rather than using a constant initializer.


830-856:

Resize is not in the unsupported operations list. The OP_TYPES_NOT_SUPPORTED_IN_LOW_PRECISION at line 50 of precisionconverter.py contains only ["Upsample", "NonMaxSuppression", "Celu"] (extended with ["Scan", "If", "Loop", "LSTM"]).

The test correctly allows "resize" to remain in low_precision_nodes while filtering out "celu" and "nms". The SKIP_LOW_PRECISION_MAPPING (lines 56-57) provides special input-index handling for Resize, but the operation itself is supported in low precision. No test changes are needed.

Likely an incorrect or invalid review comment.

Comment on lines +1306 to +1326
@pytest.mark.parametrize("keep_io_types", [True, False])
@pytest.mark.parametrize("low_precision_type", ["fp16", "bf16"])
def test_resize_op_tensor_scales_conversion(
create_model_with_resize_op_tensor_scales, keep_io_types, low_precision_type
):
model, value_info_map, initializer_map, node_to_init_map = (
create_model_with_resize_op_tensor_scales
)

converter = PrecisionConverter(
model,
value_info_map,
initializer_map,
node_to_init_map,
keep_io_types=keep_io_types,
low_precision_type=low_precision_type,
)
converted_model = converter.convert(
high_precision_nodes=[], low_precision_nodes=[node.name for node in model.graph.node]
)
onnx.checker.check_model(converted_model)
Contributor

⚠️ Potential issue | 🟠 Major

Add assertions to verify tensor scales precision handling.

Similar to the initializer test, this test lacks assertions for the core functionality. With tensor-based scales, you should verify that the scales_tensor output feeding into Resize remains FP32, and that appropriate Cast nodes are inserted.

Apply this diff to add the missing assertions:

     converted_model = converter.convert(
         high_precision_nodes=[], low_precision_nodes=[node.name for node in model.graph.node]
     )
     onnx.checker.check_model(converted_model)
+
+    # Verify roi remains FP32
+    roi_init = next(init for init in converted_model.graph.initializer if init.name == "roi")
+    assert roi_init.data_type == TensorProto.FLOAT, "roi should remain FP32"
+
+    # Verify scales_tensor (input to Resize) has Cast to FP32
+    # Since scales_add is in low_precision_nodes but feeds Resize which needs FP32 scales,
+    # there should be a Cast node inserted
+    cast_nodes = [n for n in converted_model.graph.node if n.op_type == "Cast"]
+    scales_cast = [c for c in cast_nodes if c.input[0] == "scales_tensor"]
+    assert len(scales_cast) > 0, "Cast node should be inserted for scales_tensor to Resize"
+    assert scales_cast[0].attribute[0].i == TensorProto.FLOAT, "scales_tensor should be cast to FP32"

@galagam
Contributor

galagam commented Oct 22, 2025

/ok to test 2c32af8

Contributor

@galagam galagam left a comment

Approved

@galagam galagam merged commit d0e83ed into NVIDIA:main Oct 22, 2025
27 checks passed