[5452146] Fix: 'Invalid tensor data type 0' #308

gcunhase · 2025-09-09T14:04:01Z

What does this PR do?

Type of change: Bug fix

Overview: Fixed 'Invalid tensor data type 0' in ONNX Autocast by ensuring that intermediate tensors have a defined data type.

Usage

$ python -m modelopt.onnx.quantization --onnx_path=$MODEL_NAME.onnx --high_precision_dtype=fp16

Testing

Internal model.

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed.
Is this change backward compatible?: Yes
Did you write any new necessary tests?: N/A
Did you add or update any necessary documentation?: N/A
Did you update Changelog?: No

Additional Information

Related: #302

Summary by CodeRabbit

Bug Fixes
- Preserve known static dimensions during conversion to avoid unintended shape changes.
- Fill undefined tensor element types after initial inference to prevent downstream errors.
Improvements
- Align shape handling between intermediate values and outputs for consistent behavior.
- Improve robustness of shape/type inference while keeping public interfaces unchanged.

coderabbitai · 2025-09-09T14:04:11Z

Walkthrough

Internal updates to PrecisionConverter.convert() refine shape sanitization to preserve known static dims, add a post-inference step to replace UNDEFINED tensor element types with the target low-precision type, and align handling of graph value_info and outputs. A new private helper _ensure_types_are_defined() is added. Public API unchanged.

Changes

Cohort / File(s)	Summary
ONNX autocast precision handling `modelopt/onnx/autocast/precisionconverter.py`	Adjust conditional `dim_param` updates to avoid overwriting known `dim_value`; call `_ensure_types_are_defined()` after an initial inference pass to replace `UNDEFINED` `elem_type` with the target low-precision ONNX type; apply the conservative sanitization to both `value_info` and `outputs`; add private helper `_ensure_types_are_defined()`; no public API changes.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant U as User
  participant PC as PrecisionConverter
  participant SI as ShapeInference
  participant GI as Graph(value_info/outputs)

  U->>PC: convert(model, target_dtype)
  rect rgb(245,248,255)
    note right of PC: Initial inference without strict type checks
    PC->>SI: infer_shapes(check_type=false)
    SI-->>PC: shapes/types (may include UNDEFINED)
  end

  rect rgb(245,255,245)
    note right of PC: New step: ensure types defined
    PC->>GI: _ensure_types_are_defined()<br/>Replace UNDEFINED elem_type -> target low-precision
    GI-->>PC: value_info updated
  end

  rect rgb(255,248,240)
    note right of PC: Conservative shape sanitization
    PC->>GI: For each dim: set dim_param="unk" only when dim_value is concrete
    GI-->>PC: value_info & outputs aligned
  end

  PC-->>U: Converted model

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Pre-merge checks (3 passed)

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The title "[5452146] Fix: 'Invalid tensor data type 0'" succinctly captures the primary change, clearly referencing the specific error being addressed and aligning with the pull request’s focus on defining intermediate tensor data types to resolve it.
Description Check	✅ Passed	The description directly addresses the bug fix by explaining that the PR ensures intermediate tensors have defined data types to resolve the “Invalid tensor data type 0” error and includes relevant usage and testing information consistent with the changeset.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

Poem

I twitch my whiskers at dims that hide,
I hop through graphs where types collide.
I set the unknowns to tidy the trail,
Fill UNDEFINEDs so checks won't fail.
A tiny rabbit, precise and spry—🥕

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d0001ee and 5bd42c5.

📒 Files selected for processing (1)

modelopt/onnx/autocast/precisionconverter.py (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

modelopt/onnx/autocast/precisionconverter.py

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: linux
GitHub Check: wait-checks / wait
GitHub Check: build-docs
GitHub Check: code-quality

✨ Finishing Touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Signed-off-by: gcunhase <[email protected]>

codecov · 2025-09-09T14:25:17Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.86%. Comparing base (a6fa34c) to head (5bd42c5).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #308      +/-   ##
==========================================
- Coverage   73.93%   73.86%   -0.07%     
==========================================
  Files         172      172              
  Lines       17408    17415       +7     
==========================================
- Hits        12870    12864       -6     
- Misses       4538     4551      +13

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Signed-off-by: gcunhase <[email protected]>

gcunhase requested a review from a team as a code owner September 9, 2025 14:04

gcunhase requested a review from i-riyad September 9, 2025 14:04

gcunhase force-pushed the dev/gcunhasergio/fix_invalid_type_0_5452146 branch from 87ca671 to d0001ee Compare September 9, 2025 14:06

Ensure types are defined

5bd42c5

Signed-off-by: gcunhase <[email protected]>

gcunhase force-pushed the dev/gcunhasergio/fix_invalid_type_0_5452146 branch from d0001ee to 5bd42c5 Compare September 9, 2025 14:12

gcunhase self-assigned this Sep 9, 2025

gcunhase requested a review from galagam September 9, 2025 14:15

i-riyad approved these changes Sep 9, 2025

View reviewed changes

gcunhase merged commit d6d2e75 into NVIDIA:main Sep 9, 2025
22 checks passed

benchislett pushed a commit that referenced this pull request Sep 15, 2025

[5452146] Fix: 'Invalid tensor data type 0' (#308)

65ee28c

Signed-off-by: gcunhase <[email protected]>

gcunhase mentioned this pull request Sep 19, 2025

[Autocast] Fix edge case with cast producing network output #302

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[5452146] Fix: 'Invalid tensor data type 0' #308

[5452146] Fix: 'Invalid tensor data type 0' #308

Uh oh!

gcunhase commented Sep 9, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Sep 9, 2025 •

edited

Loading

Uh oh!

codecov bot commented Sep 9, 2025 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

[5452146] Fix: 'Invalid tensor data type 0' #308

[5452146] Fix: 'Invalid tensor data type 0' #308

Uh oh!

Conversation

gcunhase commented Sep 9, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Usage

Testing

Before your PR is "Ready for review"

Additional Information

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Sep 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Pre-merge checks (3 passed)

Poem

Uh oh!

codecov bot commented Sep 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Uh oh!

gcunhase commented Sep 9, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Sep 9, 2025 •

edited

Loading

codecov bot commented Sep 9, 2025 •

edited

Loading