[Autocast] Fix edge case with cast producing network output #302

aboubezari · 2025-09-08T23:17:51Z

What does this PR do?

Type of change: Bug Fix

Overview:
If there is a cast producing a network output as well as some other nodes connected in the graph, then autocast will create a disconnected graph.

Autocast will update the node producing the cast to instead produce the network output. However, it will not update the nodes that consume this node, leaving a disconnected graph. To fix, update the consumers with the new, updated output

Usage

Autocast precision converter

Testing

Added a unittest that fails before my change, and passes after my fix.

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed.
Is this change backward compatible?: Yes
Did you write any new necessary tests?: Yes
Did you add or update any necessary documentation?: Yes
Did you update Changelog?: No

Summary by CodeRabbit

Bug Fixes
- Corrected handling of models whose outputs are produced by cast operations, ensuring downstream connections are rewired properly and output dtypes follow the keep I/O types setting.
- Prevents graph inconsistencies and unexpected dtype changes in converted models.
Tests
- Added parameterized tests covering models with cast-produced outputs across multiple low-precision types and I/O type retention settings to verify correct behavior.

Signed-off-by: Ali Boubezari <[email protected]>

copy-pr-bot · 2025-09-08T23:17:54Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2025-09-08T23:17:58Z

Walkthrough

Extends PrecisionConverter._bypass_cast_node to also rewire consumers of the pre-cast input when the removed Cast produced a model output. Adds unit tests covering models whose outputs are Cast nodes, parameterized over low-precision type and keep-io-types. Test file includes duplicated fixture and test blocks.

Changes

Cohort / File(s)	Summary
PrecisionConverter bypass rewiring `modelopt/onnx/autocast/precisionconverter.py`	In the is_output_producer branch of `_bypass_cast_node`, after rerouting producers, now also replaces occurrences of `input_tensor` with `output_tensor` across all consumers of `input_tensor`. Non-output-producer logic unchanged.
ONNX autocast tests `tests/unit/onnx/autocast/test_precisionconverter.py`	Adds fixture `model_with_casted_output` and test `test_casted_output_model`, parameterized by `low_precision_type` and `keep_io_types`, verifying output dtypes when outputs are produced by Cast nodes. Note: fixture and test are duplicated in the file.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant PC as PrecisionConverter
  participant G as Graph
  participant P as Producer(s)
  participant C as Cast (to be removed)
  participant DC as Downstream Consumer(s)
  participant IC as InputTensor Consumer(s)

  Note over PC,G: _bypass_cast_node (is_output_producer == true)

  PC->>G: Identify C, its input_tensor and output_tensor
  PC->>G: Rewire P to feed around C (bypass)
  PC->>G: For DC consuming C.output_tensor<br/>update inputs to bypassed tensor
  rect rgba(230,245,255,0.6)
  Note over PC,G: New step in this branch
  PC->>G: Find IC consuming input_tensor
  PC->>G: Replace input_tensor with output_tensor in IC inputs
  end
  PC->>G: Remove C
  G-->>PC: Graph updated

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

A hop, a skip, a wire re-spun,
Casts step aside—connections run.
Outputs now glide on tidy trails,
While tests in pairs set sturdy rails.
Nose twitch, ears perk—job neatly done! 🐇✨

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
		  - name: "Undocumented Breaking Changes"
			  mode: "warning"
			  instructions: |
				  Flag potential breaking changes that are not documented:
				  1. Identify changes to public APIs/exports, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints (including removed/renamed items and changes to types, required params, return values, defaults, or behavior).
				  2. Ignore purely internal/private changes (e.g., code not exported from package entry points or marked internal).
				  3. Verify documentation exists: a "Breaking Change" section in the PR description and updates to CHANGELOG.md.

Pre-merge checks (2 passed, 1 warning)

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The title clearly summarizes the main fix in the Autocast component by stating the edge case involving a cast producing a network output, and it is concise and specific without extraneous detail.
Description Check	✅ Passed	The pull request description directly addresses the edge case with Cast nodes producing network outputs and explains how the fix updates downstream consumer nodes to prevent disconnected graphs, and it notes the added unit test that validates the change, which aligns with the code modifications.

✨ Finishing Touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

modelopt/onnx/autocast/precisionconverter.py (1)
579-583: Avoid rewiring the Cast node itself (small safety nit).

When rewriting consumers of the pre-cast tensor, skip the Cast node we're about to remove to avoid a transient self-loop (input==output). It’s harmless since the node is deleted, but skipping reduces churn.
-                for consumer in consumers:
+                for consumer in consumers:
+                    if consumer is node:
+                        # Skip the Cast being removed; it will be deleted shortly.
+                        continue
                     for i, input_name in enumerate(consumer.input):
                         if input_name == input_tensor:
                             consumer.input[i] = output_tensor
tests/unit/onnx/autocast/test_precisionconverter.py (1)
1071-1071: Remove unnecessary filesystem write in fixture.

Writing the model to /tmp during tests is unnecessary and can break in sandboxed CI. Please drop it.
-    onnx.save(model, "/tmp/model_with_casted_output.onnx")

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cf6f1d4 and b95458b.

📒 Files selected for processing (2)

modelopt/onnx/autocast/precisionconverter.py (1 hunks)
tests/unit/onnx/autocast/test_precisionconverter.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

tests/unit/onnx/autocast/test_precisionconverter.py (2)

modelopt/onnx/utils.py (1)

check_model (557-569)

modelopt/onnx/autocast/precisionconverter.py (2)

PrecisionConverter (55-1037)

convert (113-202)

🔇 Additional comments (1)

tests/unit/onnx/autocast/test_precisionconverter.py (1)

1028-1106: No duplicate definitions found for model_with_casted_output or test_casted_output_model. Verified exactly one fixture and one test definition in this file.

aboubezari · 2025-09-08T23:25:55Z

@coderabbitai generate docstrings

coderabbitai · 2025-09-08T23:26:02Z

✅ Actions performed

Docstrings generation started.

@aboubezari

Docstrings generation was requested by @aboubezari. * #302 (comment) The following files were modified: * `modelopt/onnx/autocast/precisionconverter.py` * `tests/unit/onnx/autocast/test_precisionconverter.py`

coderabbitai · 2025-09-08T23:27:12Z

Note

Generated docstrings for this pull request at #303

gcunhase · 2025-09-19T18:06:00Z

Fixed by #308, closing.

[Autocast] Fix edge case with cast producing network output

b95458b

Signed-off-by: Ali Boubezari <[email protected]>

aboubezari requested a review from a team as a code owner September 8, 2025 23:17

aboubezari requested a review from i-riyad September 8, 2025 23:17

coderabbitai bot reviewed Sep 8, 2025

View reviewed changes

coderabbitai bot mentioned this pull request Sep 8, 2025

📝 Add docstrings to aboubezari/fix_autocast_cast_output_producer_bug #303

Open

gcunhase mentioned this pull request Sep 9, 2025

[5452146] Fix: 'Invalid tensor data type 0' #308

Merged

gcunhase closed this Sep 19, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Autocast] Fix edge case with cast producing network output #302

[Autocast] Fix edge case with cast producing network output #302

Uh oh!

aboubezari commented Sep 8, 2025 •

edited

Loading

Uh oh!

copy-pr-bot bot commented Sep 8, 2025

Uh oh!

coderabbitai bot commented Sep 8, 2025 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

aboubezari commented Sep 8, 2025

Uh oh!

coderabbitai bot commented Sep 8, 2025

Uh oh!

coderabbitai bot commented Sep 8, 2025

Uh oh!

gcunhase commented Sep 19, 2025

Uh oh!

Uh oh!

[Autocast] Fix edge case with cast producing network output #302

[Autocast] Fix edge case with cast producing network output #302

Uh oh!

Conversation

aboubezari commented Sep 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Usage

Testing

Before your PR is "Ready for review"

Summary by CodeRabbit

Uh oh!

copy-pr-bot bot commented Sep 8, 2025

Uh oh!

coderabbitai bot commented Sep 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Pre-merge checks (2 passed, 1 warning)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

aboubezari commented Sep 8, 2025

Uh oh!

coderabbitai bot commented Sep 8, 2025

Uh oh!

coderabbitai bot commented Sep 8, 2025

Uh oh!

gcunhase commented Sep 19, 2025

Uh oh!

Uh oh!

aboubezari commented Sep 8, 2025 •

edited

Loading

coderabbitai bot commented Sep 8, 2025 •

edited

Loading