
Conversation

ajrasane
Contributor

@ajrasane ajrasane commented Sep 24, 2025

What does this PR do?

Type of change:
New Feature

Overview:

  • Implemented convert_fp64_to_fp32() in graph sanitizer.
  • This method is invoked from GraphSanitizer.sanitize() and from convert_to_f16(); an illustrative sketch of the pass follows below.
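For context, a minimal sketch of what such an FP64→FP32 normalization pass could look like with the onnx Python API (illustrative only; the function name and structure are assumptions, not the actual GraphSanitizer code):

```python
# Illustrative sketch only: not the actual ModelOpt implementation.
import numpy as np
import onnx
from onnx import TensorProto, numpy_helper


def convert_fp64_to_fp32_sketch(model: onnx.ModelProto) -> onnx.ModelProto:
    """Downcast FP64 initializers, graph I/O, and dtype-carrying nodes to FP32."""
    graph = model.graph

    # Initializers: re-serialize DOUBLE tensors as FLOAT.
    for init in graph.initializer:
        if init.data_type == TensorProto.DOUBLE:
            arr = numpy_helper.to_array(init).astype(np.float32)
            init.CopyFrom(numpy_helper.from_array(arr, init.name))

    # Graph inputs/outputs: rewrite the declared element type.
    for vi in list(graph.input) + list(graph.output):
        if vi.type.tensor_type.elem_type == TensorProto.DOUBLE:
            vi.type.tensor_type.elem_type = TensorProto.FLOAT

    # Cast "to" attributes and any DOUBLE tensor attributes
    # (e.g. Constant / ConstantOfShape "value").
    for node in graph.node:
        for attr in node.attribute:
            if node.op_type == "Cast" and attr.name == "to" and attr.i == TensorProto.DOUBLE:
                attr.i = TensorProto.FLOAT
            elif attr.type == onnx.AttributeProto.TENSOR and attr.t.data_type == TensorProto.DOUBLE:
                arr = numpy_helper.to_array(attr.t).astype(np.float32)
                attr.t.CopyFrom(numpy_helper.from_array(arr, attr.t.name))

    # Refresh shapes/types after the dtype changes.
    return onnx.shape_inference.infer_shapes(model)
```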

Testing

python -m modelopt.onnx.quantization --onnx_path=far3d_opset17.onnx  --calibration_eps cuda:0
  • I/O sanity check passes for this model
  • All unit tests pass

Summary by CodeRabbit

  • Documentation

    • Updated user instructions to include environment variable setup for CUDA/cuDNN libraries when using the TensorRT Docker image.
  • New Features

    • Added automatic conversion of 64-bit floating point (FP64) tensors to 32-bit (FP32) in ONNX model graphs, enhancing compatibility and standardization during processing.
  • Tests

    • Introduced new tests to validate correct conversion of FP64 values to FP32 across various graph components and scenarios.

@ajrasane ajrasane requested a review from cjluo-nv September 24, 2025 21:00
@ajrasane ajrasane self-assigned this Sep 24, 2025
@ajrasane ajrasane requested a review from a team as a code owner September 24, 2025 21:00
@ajrasane ajrasane requested a review from i-riyad September 24, 2025 21:00

coderabbitai bot commented Sep 24, 2025

Walkthrough

Adds an FP64-to-FP32 normalization pass to the ONNX GraphSanitizer and invokes it during autocast conversion. Updates unit tests to cover the new conversions across initializers, IO types, and node kinds. Documentation for ONNX PTQ examples adds environment variable exports for CUDA/cuDNN in TensorRT Docker usage.

Changes

  • ONNX Autocast Sanitizer core (modelopt/onnx/autocast/graphsanitizer.py): Adds an FP64→FP32 conversion pass covering initializers, I/O types, and node handling (Cast, Constant, ConstantOfShape); integrates it into sanitize; triggers shape inference post-conversion.
  • ONNX Autocast conversion entrypoint (modelopt/onnx/autocast/convert.py): Calls sanitizer.convert_fp64_to_fp32() in convert_to_f16 after the sanitization step (a sketch of this call site follows the list).
  • Unit tests for sanitizer (tests/unit/onnx/autocast/test_graphsanitizer.py): New tests verifying FP64→FP32 conversions across initializers, I/O, nodes, the integrated workflow, and a no-op scenario.
  • Docs: TensorRT ONNX PTQ example (examples/onnx_ptq/README.md): Adds instructions to set CUDNN_LIB_DIR and update LD_LIBRARY_PATH within the TensorRT Docker context.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant Convert as convert_to_f16
  participant Sanitizer as GraphSanitizer
  participant ONNX as ONNX Model

  User->>Convert: Invoke convert_to_f16(model)
  Convert->>Sanitizer: sanitize(model)
  rect rgba(200,240,255,0.3)
    note over Sanitizer: Existing sanitize steps (custom nodes, opset, LN patterns, names, cleanup)
    Sanitizer->>Sanitizer: convert_fp64_to_fp32() [NEW]
    note right of Sanitizer: Convert initializers, IO types,<br/>Cast/Constant/ConstantOfShape to FP32
    Sanitizer->>ONNX: Infer shapes after conversion
  end
  Sanitizer-->>Convert: Sanitized model
  Convert->>ONNX: Proceed with precision conversion to FP16
  ONNX-->>User: Returns converted model

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

Thump-thump! I tweak the graph with care,
Turning heavy 64s to lighter air.
Casts hop, Constants shed their weight—
FP32 is our new gate.
In docs, a path for cuDNN’s lair—
Now onward to F16, swift as a hare! 🐇✨

Pre-merge checks

✅ Passed checks (3 passed)
  • Docstring Coverage: Passed. Docstring coverage is 93.75%, which is sufficient; the required threshold is 80.00%.
  • Description Check: Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check: Passed. The title succinctly and accurately highlights the main change in the PR, namely the enhancement of the GraphSanitizer to convert Double (FP64) values to Float32, matching the implemented additions in the code and tests.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
modelopt/onnx/autocast/precisionconverter.py (3)

184-189: Keep I/O types but still clear output shape dims to let inference refresh shapes

When keep_io_types is True, you skip resetting output elem_type (good), but you also skip clearing output dims. That can leave stale shapes on outputs and prevent shape inference from updating them. Recommend: only guard elem_type, not the dim reset.

Apply this diff:

-            if not self.keep_io_types:
-                for out in self.model.graph.output:
-                    out.type.tensor_type.elem_type = onnx.TensorProto.UNDEFINED
-                    for idx, d in enumerate(out.type.tensor_type.shape.dim):
-                        if d.dim_value:
-                            out.type.tensor_type.shape.dim[idx].dim_param = "unk"
+            if not self.keep_io_types:
+                for out in self.model.graph.output:
+                    out.type.tensor_type.elem_type = onnx.TensorProto.UNDEFINED
+            # Always clear output dims so shape inference can refresh them
+            for out in self.model.graph.output:
+                for idx, d in enumerate(out.type.tensor_type.shape.dim):
+                    if d.dim_value:
+                        out.type.tensor_type.shape.dim[idx].dim_param = "unk"

217-242: Deduplicate and broaden restoration (cover value_info entries too)

Current logic duplicates input/output loops and only updates value_info_map, not graph.value_info entries that might mirror I/O. DRY the code and also update any matching entries in model.graph.value_info for completeness.

Apply this diff:

-    def _restore_original_io_types(self):
-        """Restore original I/O types."""
-        # Restore input types
-        for input_tensor in self.model.graph.input:
-            if input_tensor.name in self.original_network_io:
-                original_type = self.original_network_io[input_tensor.name]
-                if input_tensor.type.tensor_type.elem_type != original_type:
-                    input_tensor.type.tensor_type.elem_type = original_type
-                    # Update value_info_map if tensor exists there
-                    if input_tensor.name in self.value_info_map:
-                        self.value_info_map[
-                            input_tensor.name
-                        ].type.tensor_type.elem_type = original_type
-
-        # Restore output types
-        for output_tensor in self.model.graph.output:
-            if output_tensor.name in self.original_network_io:
-                original_type = self.original_network_io[output_tensor.name]
-                if output_tensor.type.tensor_type.elem_type != original_type:
-                    output_tensor.type.tensor_type.elem_type = original_type
-                    # Update value_info_map if tensor exists there
-                    if output_tensor.name in self.value_info_map:
-                        self.value_info_map[
-                            output_tensor.name
-                        ].type.tensor_type.elem_type = original_type
+    def _restore_original_io_types(self):
+        """Restore original I/O types."""
+        for vi in list(self.model.graph.input) + list(self.model.graph.output):
+            name = vi.name
+            if name not in self.original_network_io:
+                continue
+            original_type = self.original_network_io[name]
+            if vi.type.tensor_type.elem_type != original_type:
+                vi.type.tensor_type.elem_type = original_type
+                # Sync value_info_map entry if present
+                if name in self.value_info_map:
+                    self.value_info_map[name].type.tensor_type.elem_type = original_type
+                # Also sync a matching entry in graph.value_info if it exists
+                for extra_vi in self.model.graph.value_info:
+                    if extra_vi.name == name:
+                        extra_vi.type.tensor_type.elem_type = original_type
+                        break

455-456: Skip generating no‑op casts at model I/O boundaries

Even though redundant casts are cleaned up later, you can avoid inserting them by checking the current tensor type. This reduces graph churn and log noise.

Apply this diff:

-            # Add cast down for network outputs (only if not keeping I/O types)
-            if not self.keep_io_types:
-                cast_to_fp16.extend([output for output in node.output if output in network_outputs])
+            # Add cast down for network outputs (only if not keeping I/O types), and only if needed
+            if not self.keep_io_types:
+                for output in node.output:
+                    if output in network_outputs and self._get_tensor_type(output) != self.low_precision_type.onnx_type:
+                        cast_to_fp16.append(output)

Optionally do the same for inputs:

-            cast_to_fp32.extend([input for input in node.input if input in network_inputs])
+            for inp in node.input:
+                if inp in network_inputs and self._get_tensor_type(inp) != self.high_precision_type.onnx_type:
+                    cast_to_fp32.append(inp)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 26c203a and 1b85355.

📒 Files selected for processing (1)
  • modelopt/onnx/autocast/precisionconverter.py (4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: code-quality
  • GitHub Check: build-docs
🔇 Additional comments (1)
modelopt/onnx/autocast/precisionconverter.py (1)

204-206: LGTM: Restore original I/O types after cleanup

Restoring I/O types right after cleanup and before the sanity check is the correct spot.

Please confirm we don’t need an extra infer_shapes pass after restoring I/O elem_type for models where outputs were previously UNDEFINED. If checker ever complains on some models, running a final infer_shapes(..., check_type=True) here would be a safe follow‑up.
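
A hedged sketch of that optional follow-up, using the public onnx API (here `model` is the ModelProto being converted; this is a suggestion, not code from the PR):

```python
import onnx

# Optional final pass: type-checked, strict shape inference after restoring I/O types.
model = onnx.shape_inference.infer_shapes(model, check_type=True, strict_mode=True)
```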


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
modelopt/onnx/autocast/precisionconverter.py (2)

216-241: Also sync graph.value_info to avoid map/graph divergence

If value_info_map doesn’t alias graph.value_info entries, types can drift. Update graph.value_info alongside the map for robustness.

Apply this diff inside the existing blocks where you update value_info_map:

@@
-                    if input_tensor.name in self.value_info_map:
+                    if input_tensor.name in self.value_info_map:
                         self.value_info_map[
                             input_tensor.name
                         ].type.tensor_type.elem_type = original_type
+                        # Also update corresponding entry in graph.value_info if present
+                        for vi in self.model.graph.value_info:
+                            if vi.name == input_tensor.name:
+                                vi.type.tensor_type.elem_type = original_type
+                                break
@@
-                    if output_tensor.name in self.value_info_map:
+                    if output_tensor.name in self.value_info_map:
                         self.value_info_map[
                             output_tensor.name
                         ].type.tensor_type.elem_type = original_type
+                        # Also update corresponding entry in graph.value_info if present
+                        for vi in self.model.graph.value_info:
+                            if vi.name == output_tensor.name:
+                                vi.type.tensor_type.elem_type = original_type
+                                break

451-455: Honor the “only if not keeping I/O types” comment with a guard

You’re still collecting output tensors for cast-down when keep_io_types=True; cleanup will remove them, but we can avoid churn.

-            # Add cast down for network outputs (only if not keeping I/O types)
-            cast_to_fp16.extend([output for output in node.output if output in network_outputs])
+            # Add cast down for network outputs (only if not keeping I/O types)
+            if not self.keep_io_types:
+                cast_to_fp16.extend([output for output in node.output if output in network_outputs])
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1b85355 and 7ccc35e.

📒 Files selected for processing (1)
  • modelopt/onnx/autocast/precisionconverter.py (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: build-docs
  • GitHub Check: code-quality
🔇 Additional comments (1)
modelopt/onnx/autocast/precisionconverter.py (1)

203-205: LGTM: restoring original I/O types post-cleanup is well-placed

Calling restoration after cleanup and before sanity-check keeps graph names stable and fixes I/O dtypes as intended.


codecov bot commented Sep 24, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.53%. Comparing base (8b1cedf) to head (36fc519).
⚠️ Report is 8 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #364      +/-   ##
==========================================
+ Coverage   73.48%   73.53%   +0.05%     
==========================================
  Files         172      172              
  Lines       17636    17700      +64     
==========================================
+ Hits        12960    13016      +56     
- Misses       4676     4684       +8     


@ajrasane ajrasane force-pushed the ajrasane/onnx_precision_bug branch from 7ccc35e to 1e4d697 Compare September 24, 2025 22:27
@ajrasane ajrasane requested a review from galagam September 25, 2025 17:14
@ajrasane ajrasane enabled auto-merge (squash) September 25, 2025 17:23
Comment on lines 222 to 223
if tensor.type.tensor_type.elem_type != original_type:
tensor.type.tensor_type.elem_type = original_type

You're just updating the tensor's metadata here. Shouldn't we also inject a cast here?

Looking at the bigger picture, the keep_io_types flag is covered in the unit tests and is also set as the default in MOQ, so we generally know it works, but there is an edge case we're first seeing in https://nvbugspro.nvidia.com/bug/5532019.
Inputs are clearly handled in lines 145-148, and outputs are presumably handled in _cleanup_pre_output_same_type_cast.
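
For illustration only, injecting such a boundary cast for a graph output could look like the hypothetical helper below; this is not code from this PR, and the fix that was eventually adopted (see the reply that follows) moved the conversion logic into the GraphSanitizer instead.

```python
import onnx
from onnx import helper


def cast_output_back(graph: onnx.GraphProto, output_name: str, original_type: int) -> None:
    """Reroute the producer through an internal tensor and cast back to the original dtype."""
    internal_name = output_name + "_pre_cast"
    for node in graph.node:
        # Redirect whichever node currently produces the graph output.
        node.output[:] = [internal_name if o == output_name else o for o in node.output]
    graph.node.append(
        helper.make_node(
            "Cast",
            inputs=[internal_name],
            outputs=[output_name],
            to=original_type,
            name=f"Cast_{output_name}_to_original",
        )
    )
```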

Contributor Author


I have removed this function and created conversion logic in the GraphSanitizer.

@ajrasane ajrasane disabled auto-merge September 25, 2025 17:49
@ajrasane ajrasane requested a review from a team as a code owner September 25, 2025 20:25
@ajrasane ajrasane requested a review from gcunhase September 25, 2025 20:25
@ajrasane ajrasane changed the title [AutoCast] restore outputs if keep_io_types is set [AutoCast] Update GraphSanitizer to convert Double to Float32 Sep 25, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
modelopt/onnx/autocast/convert.py (1)

100-103: Don’t mutate default providers list.

providers has a list default and is mutated via insert, causing sticky state across calls.

Apply this diff:

-def convert_to_mixed_precision(
+def convert_to_mixed_precision(
@@
-    providers: list[str] = ["cpu"],
+    providers: list[str] | None = None,
@@
-) -> onnx.ModelProto:
+) -> onnx.ModelProto:
@@
-    # Load and process model
+    # Load and process model
     model = onnx.load(onnx_path, load_external_data=True)
     assert low_precision_type in ["fp16", "bf16"], "low_precision_type must be either fp16 or bf16"
+    providers = (providers or ["cpu"]).copy()
@@
-    if "trt" not in providers and graph_sanitizer.custom_ops:
-        providers.insert(0, "trt")
+    if "trt" not in providers and graph_sanitizer.custom_ops:
+        providers = ["trt"] + providers
modelopt/onnx/autocast/graphsanitizer.py (1)

320-329: Bug: scale_dimension may be referenced before assignment.

If the pattern matches “bias only, no scale,” scale remains None and scale_dimension is never set before the None check, causing an UnboundLocalError.

Minimal fix:

-            if scale is not None:
-                scale_dimension = scale.shape
-
-            # Skip pattern if we can't determine the scale dimension
-            if scale_dimension is None:
+            scale_dimension = None
+            if scale is not None:
+                scale_dimension = scale.shape
+            # Skip pattern if we can't determine the scale dimension
+            if scale_dimension is None:
                 logger.debug(
                     f"Could not determine scale dimension for LayerNorm pattern at {mean_node.name}"
                 )
                 return None

Optional improvement: derive scale_dimension from the input shape and axis when scale is absent.

🧹 Nitpick comments (8)
modelopt/onnx/autocast/convert.py (2)

151-157: Avoid mutable default args (op_block_list, trt_plugins).

Using [] as a default can leak state across calls. Prefer None and initialize inside.

Apply this diff:

-def convert_to_f16(
-    model: onnx.ModelProto,
-    low_precision_type: str = "fp16",
-    keep_io_types: bool = True,
-    op_block_list: list[str] = [],
-    trt_plugins: list[str] | None = [],
-) -> onnx.ModelProto:
+def convert_to_f16(
+    model: onnx.ModelProto,
+    low_precision_type: str = "fp16",
+    keep_io_types: bool = True,
+    op_block_list: list[str] | None = None,
+    trt_plugins: list[str] | None = None,
+) -> onnx.ModelProto:
@@
-    sanitizer = GraphSanitizer(
+    op_block_list = op_block_list or []
+    trt_plugins = trt_plugins or []
+    sanitizer = GraphSanitizer(

185-187: Avoid redundant shape inference.

convert_fp64_to_fp32 already calls infer_shapes(strict_mode=True) when it modifies the graph. Consider gating the second inference to reduce overhead (e.g., have convert_fp64_to_fp32 return a modified flag).

modelopt/onnx/autocast/graphsanitizer.py (4)

70-89: Consider returning a flag to signal modifications.

Returning modified (bool) can let callers skip redundant work (like extra infer_shapes).

Apply this diff:

-    def convert_fp64_to_fp32(self) -> None:
+    def convert_fp64_to_fp32(self) -> bool:
@@
-        if modified:
+        if modified:
             logger.info("Converted FP64 initializers, I/O types, and nodes to FP32")
             self.model = onnx_utils.infer_shapes(self.model, strict_mode=True)
+        return modified

And in sanitize():

-        self.convert_fp64_to_fp32()
+        _ = self.convert_fp64_to_fp32()

454-472: Guard against non-tensor value_info entries.

Accessing tensor.type.tensor_type assumes tensor_type is set; sequence/map types would error. Add a defensive check.

Apply this diff:

-        def convert_tensor_list(tensors, tensor_type):
+        def convert_tensor_list(tensors, tensor_type):
             nonlocal modified
             for tensor in tensors:
-                if tensor.type.tensor_type.elem_type == onnx.TensorProto.DOUBLE:
+                # Only process TensorProto entries
+                tt = getattr(tensor.type, "tensor_type", None)
+                if tt is not None and tt.elem_type == onnx.TensorProto.DOUBLE:
-                    tensor.type.tensor_type.elem_type = onnx.TensorProto.FLOAT
+                    tt.elem_type = onnx.TensorProto.FLOAT
                     modified = True
                     logger.debug(f"Converted {tensor_type} {tensor.name} from FP64 to FP32")

485-506: Broaden FP64 normalization coverage (optional).

Some ops encode dtypes via attributes (e.g., RandomNormal, RandomUniform, EyeLike, CastLike) or use value_float/double forms. Consider handling those for completeness.
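
For reference, a hedged sketch of that broader handling for the attribute-based ops (the op set and helper name are illustrative assumptions, not proposed PR code; CastLike takes its target type from an input, so it is covered once initializers and I/O are converted):

```python
import onnx
from onnx import TensorProto

# Ops whose output dtype is declared via a "dtype" attribute rather than inferred from inputs.
DTYPE_ATTR_OPS = {"RandomNormal", "RandomUniform", "RandomNormalLike", "RandomUniformLike", "EyeLike"}


def downcast_dtype_attributes(model: onnx.ModelProto) -> None:
    """Rewrite DOUBLE 'dtype' attributes to FLOAT on the ops above (in place)."""
    for node in model.graph.node:
        if node.op_type in DTYPE_ATTR_OPS:
            for attr in node.attribute:
                if attr.name == "dtype" and attr.i == TensorProto.DOUBLE:
                    attr.i = TensorProto.FLOAT
```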


31-38: Avoid mutable default arg in init (trt_plugins).

Use None and normalize inside to prevent shared state across instances.

Apply this diff:

-    def __init__(
+    def __init__(
         self,
         model: onnx.ModelProto,
         min_opset: int = 13,
         max_ir_version: int | None = None,
-        trt_plugins: list[str] | None = [],
+        trt_plugins: list[str] | None = None,
     ) -> None:
@@
-        self.trt_plugins = trt_plugins
+        self.trt_plugins = trt_plugins or []
tests/unit/onnx/autocast/test_graphsanitizer.py (2)

328-373: Add tests for keep_io_types on outputs (PR objective).

Please add coverage ensuring outputs are preserved when keep_io_types=True (and cast when False).

Here’s a minimal addition to append at the end of this file:

+def _simple_fp32_model_with_fp16able_body():
+    x = helper.make_tensor_value_info("X", TensorProto.FLOAT, [1, 2])
+    y = helper.make_tensor_value_info("Y", TensorProto.FLOAT, [1, 2])
+    w = numpy_helper.from_array(np.ones((1, 2), dtype=np.float32), name="W")
+    add = helper.make_node("Add", ["X", "W"], ["Y"])
+    graph = helper.make_graph([add], "mp_test", [x], [y], initializer=[w])
+    model = helper.make_model(graph)
+    return model
+
+def test_keep_io_types_preserves_outputs_fp32_in_convert_to_f16():
+    from modelopt.onnx.autocast.convert import convert_to_f16
+    model = _simple_fp32_model_with_fp16able_body()
+    mod = convert_to_f16(model, low_precision_type="fp16", keep_io_types=True)
+    # Output should remain FLOAT
+    assert mod.graph.output[0].type.tensor_type.elem_type == TensorProto.FLOAT
+
+def test_keep_io_types_false_allows_output_downcast_in_convert_to_f16():
+    from modelopt.onnx.autocast.convert import convert_to_f16
+    model = _simple_fp32_model_with_fp16able_body()
+    mod = convert_to_f16(model, low_precision_type="fp16", keep_io_types=False)
+    # Output is allowed to be cast down; exact check depends on implementation.
+    # If outputs are explicitly set, expect FLOAT16; otherwise cast nodes at boundary.
+    # At minimum, ensure graph contains a Cast to FLOAT16 feeding Y or output dtype is FLOAT16.
+    output_dtype = mod.graph.output[0].type.tensor_type.elem_type
+    has_cast_to_fp16 = any(
+        n.op_type == "Cast" and any(a.name == "to" and a.i == TensorProto.FLOAT16 for a in n.attribute)
+        for n in mod.graph.node
+    )
+    assert output_dtype == TensorProto.FLOAT16 or has_cast_to_fp16

91-111: Cover “bias-only” LayerNorm case.

Add a test with add_scale=False, add_bias=True to catch the scale_dimension bug and validate the fix.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 01822ec and 36fc519.

📒 Files selected for processing (4)
  • examples/onnx_ptq/README.md (1 hunks)
  • modelopt/onnx/autocast/convert.py (1 hunks)
  • modelopt/onnx/autocast/graphsanitizer.py (2 hunks)
  • tests/unit/onnx/autocast/test_graphsanitizer.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • examples/onnx_ptq/README.md
🧰 Additional context used
🧬 Code graph analysis (2)
modelopt/onnx/autocast/convert.py (1)
modelopt/onnx/autocast/graphsanitizer.py (1)
  • convert_fp64_to_fp32 (70-88)
tests/unit/onnx/autocast/test_graphsanitizer.py (1)
modelopt/onnx/autocast/graphsanitizer.py (4)
  • _convert_fp64_initializers (429-451)
  • _convert_fp64_io_types (453-473)
  • _convert_fp64_nodes (475-506)
  • convert_fp64_to_fp32 (70-88)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: code-quality
  • GitHub Check: build-docs
🔇 Additional comments (4)
modelopt/onnx/autocast/convert.py (2)

172-184: Good call: normalize FP64 early in convert_to_f16.

Adding sanitizer.convert_fp64_to_fp32() here aligns convert_to_f16 with sanitize() flow and prevents downstream dtype surprises.


117-126: keep_io_types semantics validated end-to-end
PrecisionConverter.convert skips I/O casting when keep_io_types=True and restores original output dtypes when False; unit tests in test_precisionconverter.py (I/O dtype assertions, lines 93–98 & 151–156) and test_autocast.py (lines 144–147) already cover both inputs and outputs.

modelopt/onnx/autocast/graphsanitizer.py (1)

68-69: LGTM: integrate FP64→FP32 into sanitize().

This centralizes normalization and keeps downstream passes consistent.

tests/unit/onnx/autocast/test_graphsanitizer.py (1)

188-410: Solid coverage for FP64→FP32 conversions.

Tests exercise initializers, IO, nodes, and integration paths well.

Comment on lines +29 to +34
Set the following environment variables inside the TensorRT docker.

```bash
export CUDNN_LIB_DIR=/usr/lib/x86_64-linux-gnu/
export LD_LIBRARY_PATH="${CUDNN_LIB_DIR}:${LD_LIBRARY_PATH}"
```
Collaborator

@kevalmorabia97 kevalmorabia97 Sep 25, 2025


This is what we do in the CICD: https://nvidia.github.io/TensorRT-Model-Optimizer/getting_started/_installation_for_Linux.html#environment-setup

If that is sufficient, we can also just link this doc here; otherwise, we can update that file too.

Contributor Author

@ajrasane ajrasane Sep 25, 2025


The steps mentioned above are a bit different for the TRT container:
  • There is no /usr/local/tensorrt/ in the TRT container.
  • trtexec is stored under /opt/tensorrt/bin, which is already in PATH.
  • Adding /usr/lib/x86_64-linux-gnu is sufficient for locating libcudnn.so, which is required for the TRTExecutionProvider.

I would suggest we keep this as it is, since it will be more convenient for users to run these steps as part of the setup right after they run their container. The steps you shared above should also work for the CI, but they are not required for the TRT container.

@gcunhase gcunhase changed the title [AutoCast] Update GraphSanitizer to convert Double to Float32 [5532019][AutoCast] Update GraphSanitizer to convert Double to Float32 Sep 25, 2025
Contributor

@gcunhase gcunhase left a comment


LGTM, tested with 2 CLIs:

$ python -m modelopt.onnx.quantization --onnx_path=far3d_opset17.onnx  --calibration_eps trt cuda:0 cpu
$ python -m modelopt.onnx.quantization --onnx_path=far3d_opset17.onnx  --calibration_eps trt cuda:0 cpu --direct_io_types

@ajrasane ajrasane merged commit a041bbe into main Sep 25, 2025
27 checks passed
@ajrasane ajrasane deleted the ajrasane/onnx_precision_bug branch September 25, 2025 22:30
yeyu-nvidia pushed a commit that referenced this pull request Oct 1, 2025