Update ReadMe for torch_quant_to_onnx.py example #395
base: main
Conversation
Walkthrough: Added documentation notes to examples/onnx_ptq/README.md clarifying TensorRT FP8 convolution limitations (kernel sizes and channel multiples) and noting that the ONNX PTQ examples are intended for Transformer-style models (e.g., ViT), not convolution-heavy CNNs. No code or control-flow changes.
Actionable comments posted: 1
📜 Review details
📒 Files selected for processing (1)
- examples/onnx_ptq/README.md (1 hunk)
Codecov Report: ✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff           @@
##             main     #395   +/-   ##
=======================================
  Coverage   73.79%   73.79%
=======================================
  Files         171      171
  Lines       17583    17583
=======================================
  Hits        12975    12975
  Misses       4608     4608
```
Signed-off-by: ajrasane <[email protected]>
Force-pushed from 13c4939 to 608e636.
```
--onnx_save_path=<path to save the exported ONNX model>
```

> *Note: TensorRT has limited support for convolution layers in certain precision formats. FP8 convolutions are restricted to specific kernel sizes and channel multiples, and there are no NVFP4 convolution kernels today, so NVFP4 export is effectively limited to GEMM-heavy Transformer-style models (e.g., ViT). Convolution-centric CNNs such as ResNet, ConvNeXt, or MobileNet will fail when exported with `quantize_mode=nvfp4|int4_awq`.*
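To make the note's scope restriction concrete, here is a minimal screening sketch; it is not part of the PR. The `mtq.quantize(model, config, forward_loop)` call reflects the ModelOpt Python API, but the config name `NVFP4_DEFAULT_CFG` may differ across versions, the `timm` model name is purely illustrative, and the 10% cutoff is an arbitrary heuristic.

```python
# Hedged sketch (not from the PR): screen a model for conv-heavy architecture
# before attempting NVFP4 PTQ. mtq.NVFP4_DEFAULT_CFG is an assumed config name
# that may differ across ModelOpt versions; timm supplies example models only.
import timm
import torch
import torch.nn as nn
import modelopt.torch.quantization as mtq


def conv_fraction(model: nn.Module) -> float:
    """Fraction of weighted layers that are convolutions.

    Even ViT contains one Conv2d (the patch embedding), so test the ratio
    rather than the mere presence of a convolution.
    """
    convs = sum(isinstance(m, (nn.Conv1d, nn.Conv2d, nn.Conv3d)) for m in model.modules())
    linears = sum(isinstance(m, nn.Linear) for m in model.modules())
    return convs / max(convs + linears, 1)


model = timm.create_model("vit_base_patch16_224", pretrained=False).eval()

# Rough 10% cutoff: ViT scores ~0.02, ConvNeXt ~0.35, ResNet/MobileNet near 1.0.
if conv_fraction(model) > 0.1:
    raise SystemExit("Conv-heavy model; NVFP4/INT4-AWQ export is expected to fail.")

# Fake-quantize with a tiny calibration pass (random data shown for brevity).
calib_batch = torch.randn(2, 3, 224, 224)
mtq.quantize(model, mtq.NVFP4_DEFAULT_CFG, forward_loop=lambda m: m(calib_batch))
```

A screen like this only flags likely placement trouble; whether a given model hard-fails at export or silently falls back to FP16 is exactly the open question raised in review below.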
Will they fail to export or just export in FP16?
What does this PR do?
Type of change: documentation
Overview: Adds notes to examples/onnx_ptq/README.md documenting TensorRT's FP8 convolution limitations and clarifying that the ONNX PTQ examples target Transformer-style models rather than convolution-heavy CNNs.