
Conversation

ajrasane
Contributor

@ajrasane ajrasane commented Sep 30, 2025

What does this PR do?

Type of change: documentation

Overview:

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Did you add or update any necessary documentation?: Yes

Summary by CodeRabbit

  • Documentation
    • Clarified TensorRT limitations in the ONNX PTQ example README.
    • Noted FP8 convolution restrictions (kernel sizes, channel multiples) and lack of NVFP4 convolution kernels.
  • Stated that NVFP4 export is effectively limited to GEMM-heavy Transformer-style models (e.g., ViT); convolution-heavy CNNs may fail with nvfp4/int4_awq (a rough heuristic for spotting such models is sketched below).
    • Documentation-only guidance; no behavioral changes.
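A rough way to gauge whether a model falls on the convolution-heavy side of this distinction is to count Conv versus MatMul/Gemm nodes in the exported ONNX graph. The sketch below is illustrative only and not part of this PR; the node types inspected and the 0.2 threshold are assumptions, not constraints enforced by TensorRT or Model Optimizer.

```python
# Illustrative heuristic only: count Conv vs. GEMM-style nodes in an ONNX graph
# to gauge whether a model is convolution-heavy (a poor fit for nvfp4) or
# GEMM-heavy (Transformer-style, e.g. ViT). The 0.2 threshold is an assumption.
import onnx


def is_convolution_heavy(onnx_path: str, conv_ratio_threshold: float = 0.2) -> bool:
    model = onnx.load(onnx_path)
    conv_ops = {"Conv", "ConvTranspose"}
    gemm_ops = {"Gemm", "MatMul"}
    conv_count = sum(1 for node in model.graph.node if node.op_type in conv_ops)
    gemm_count = sum(1 for node in model.graph.node if node.op_type in gemm_ops)
    total = conv_count + gemm_count
    return total > 0 and conv_count / total > conv_ratio_threshold


# Example usage; "model.onnx" is a placeholder path.
if is_convolution_heavy("model.onnx"):
    print("Model looks convolution-heavy; nvfp4/int4_awq export may fail.")
```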

@ajrasane ajrasane requested a review from cjluo-nv September 30, 2025 22:22
@ajrasane ajrasane self-assigned this Sep 30, 2025
@ajrasane ajrasane requested a review from a team as a code owner September 30, 2025 22:22
@ajrasane ajrasane requested a review from gcunhase September 30, 2025 22:22

coderabbitai bot commented Sep 30, 2025

Walkthrough

Added documentation notes to examples/onnx_ptq/README.md clarifying TensorRT FP8 convolution limitations (kernel sizes and channel multiples) and that ONNX PTQ examples are intended for Transformer-style models (e.g., ViT), not convolution-heavy CNNs. No code or control-flow changes.

Changes

  • Docs: ONNX PTQ README notes (examples/onnx_ptq/README.md): Added two informational note blocks describing TensorRT FP8 convolution restrictions (allowed kernel sizes and channel multiples, absence of NVFP4 conv kernels) and clarifying that NVFP4 export is effectively limited to GEMM-heavy Transformer-style models (e.g., ViT); placed after the Torch quantize-to-ONNX example and after the MXFP8/INT4/NVFP4 section (see the sketch below).
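To make the kernel-size and channel-multiple restrictions concrete, here is a minimal sketch that scans Conv nodes in an ONNX graph and flags any that fall outside an allow-list. The allowed kernel sizes and the channel multiple below are placeholders for illustration, not the actual TensorRT FP8 constraints; consult the TensorRT documentation for the real values.

```python
# Illustrative check only: flag Conv nodes whose kernel size or channel counts
# fall outside an assumed allow-list. ALLOWED_KERNELS and CHANNEL_MULTIPLE are
# placeholders, not the real TensorRT FP8 convolution constraints.
import onnx

ALLOWED_KERNELS = {(1, 1), (3, 3)}  # placeholder values
CHANNEL_MULTIPLE = 16               # placeholder value


def flag_unsupported_convs(onnx_path: str) -> list[str]:
    model = onnx.load(onnx_path)
    weights = {init.name: init for init in model.graph.initializer}
    flagged = []
    for node in model.graph.node:
        if node.op_type != "Conv" or len(node.input) < 2:
            continue
        kernel = next(
            (tuple(attr.ints) for attr in node.attribute if attr.name == "kernel_shape"),
            None,
        )
        weight = weights.get(node.input[1])
        if weight is None:
            continue  # weight is not a graph initializer (e.g., produced by another node)
        out_ch, in_ch = weight.dims[0], weight.dims[1]  # weight layout: (M, C/group, kH, kW)
        if kernel not in ALLOWED_KERNELS or out_ch % CHANNEL_MULTIPLE or in_ch % CHANNEL_MULTIPLE:
            flagged.append(node.name or node.output[0])
    return flagged


print(flag_unsupported_convs("model.onnx"))  # "model.onnx" is a placeholder path
```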

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Poem

A rabbit nudges lines of text,
“ViTs hop well, CNNs perplexed.”
FP8 kernels mind their size,
Channels counted, nothing lies.
Notes now snug in README's light—🥕

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
  • Description Check ✅ Passed: Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check ✅ Passed: The title concisely and accurately describes the change, a README update for the torch_quant_to_onnx.py example, without extraneous detail.
  • Linked Issues Check ✅ Passed: The pull request fulfills issue #362 by adding a README note detailing the current limitations of NVFP4 quantization on convolutional CNN models: only GEMM-heavy Transformer models are supported, and common vision CNNs will fail under quantize_mode=nvfp4.
  • Out of Scope Changes Check ✅ Passed: All modifications are confined to the README for the torch_quant_to_onnx.py example and directly document the NVFP4 limitations; no unrelated code or files are altered.
  • Docstring Coverage ✅ Passed: No functions found in the changes; docstring coverage check skipped.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 13c4939 and 608e636.

📒 Files selected for processing (1)
  • examples/onnx_ptq/README.md (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • examples/onnx_ptq/README.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: linux
  • GitHub Check: code-quality
  • GitHub Check: build-docs


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 17439e6 and 13c4939.

📒 Files selected for processing (1)
  • examples/onnx_ptq/README.md (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: linux
  • GitHub Check: build-docs
  • GitHub Check: code-quality


codecov bot commented Sep 30, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.79%. Comparing base (17439e6) to head (608e636).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #395   +/-   ##
=======================================
  Coverage   73.79%   73.79%           
=======================================
  Files         171      171           
  Lines       17583    17583           
=======================================
  Hits        12975    12975           
  Misses       4608     4608           

☔ View full report in Codecov by Sentry.

@ajrasane ajrasane force-pushed the ajrasane/readme_update branch from 13c4939 to 608e636 Compare October 1, 2025 07:20
From examples/onnx_ptq/README.md:

```
--onnx_save_path=<path to save the exported ONNX model>
```

> *Note: TensorRT has limited support for Convolution layers with certain precision formats. FP8 Convolution layers remain restricted to specific kernel sizes and channel multiples, and there are no NVFP4 convolution kernels today—NVFP4 export is effectively limited to GEMM-heavy Transformer-style models (e.g., ViT). Convolution-centric CNNs such as ResNet, ConvNeXt, or MobileNet will fail when exported with `quantize_mode=nvfp4|int4_awq`.*
Contributor

Will they fail to export or just export in FP16?
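One way to answer this empirically is sketched below: inspect the exported graph (when export succeeds at all) and check whether each Conv input is fed by a DequantizeLinear node or was left unquantized. This is a simplification that only looks for the standard ONNX Q/DQ op and ignores any vendor-specific custom quantization nodes the exporter may emit, so treat it as a rough diagnostic rather than a definitive answer.

```python
# Illustrative post-export check: report which Conv nodes have quantized
# (DequantizeLinear-fed) inputs and which do not. Only the standard ONNX
# DequantizeLinear op is considered; custom quantization ops are ignored.
import onnx


def conv_quantization_report(onnx_path: str) -> dict[str, bool]:
    model = onnx.load(onnx_path)
    producers = {out: node for node in model.graph.node for out in node.output}
    report = {}
    for node in model.graph.node:
        if node.op_type != "Conv":
            continue
        quantized = any(
            inp in producers and producers[inp].op_type == "DequantizeLinear"
            for inp in node.input
        )
        report[node.name or node.output[0]] = quantized
    return report


# Example: distinguish "export failed" from "exported, but Conv stayed in high precision".
# "quantized_model.onnx" is a placeholder path.
for conv, quantized in conv_quantization_report("quantized_model.onnx").items():
    print(f"{conv}: {'quantized' if quantized else 'not quantized'}")
```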

Successfully merging this pull request may close these issues.

ONNX PTQ with NVFP4 fails on Vision CNN models