[OMNIML-2932] Fusing pre_quant_scale for NVFP4 AWQ #421
base: main
Codecov Report: ✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main     #421      +/-   ##
==========================================
+ Coverage   73.38%   73.44%   +0.06%
==========================================
  Files         180      180
  Lines       17934    18147     +213
==========================================
+ Hits        13160    13328     +168
- Misses       4774     4819      +45
Signed-off-by: weimingc <[email protected]>
What does this PR do?
Type of change: ?
Overview:
This PR, together with NVIDIA/TensorRT-LLM#8698, enables NVFP4 AWQ deployment in TRT-LLM. Specifically, this PR fuses pre_quant_scale in the following two cases:
Usage
# Add a code snippet demonstrating how to use this
Testing
Unit tests and end-to-end tests for Qwen3 dense and MoE models.
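Fusing a per-channel pre_quant_scale into existing weights removes the elementwise multiply that would otherwise run on the activation before it is quantized. The sketch below is a generic illustration of the identity such fusion relies on, not the implementation in this PR or in TensorRT-LLM: when the scaled activation is the output of a preceding linear layer, scaling that layer's output rows (weight and bias) by the same per-channel factors gives an identical result. The helpers `linear` and `fuse_pre_quant_scale` are hypothetical, plain Python lists stand in for tensors, and the final assertion shows the kind of equivalence check a unit test might perform.

```python
# Hypothetical sketch: fold a per-channel pre_quant_scale into the linear
# layer that produces the scaled activation. Not ModelOpt or TRT-LLM code.


def linear(weight, bias, x):
    """Compute y = W x + b, with W stored as a list of output rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def fuse_pre_quant_scale(prev_weight, prev_bias, scale):
    """Multiply output row i (and bias i) of the preceding layer by scale[i].

    Since diag(s) (W z + b) == (diag(s) W) z + diag(s) b, the consuming
    layer no longer needs to multiply its input by `scale` at runtime.
    """
    fused_weight = [[s * w for w in row] for row, s in zip(prev_weight, scale)]
    fused_bias = [s * b for b, s in zip(prev_bias, scale)]
    return fused_weight, fused_bias


# Example values (made up for illustration).
z = [1.0, -2.0, 0.5]
prev_w = [[0.2, 0.1, -0.3], [0.5, -0.4, 0.6]]
prev_b = [0.1, -0.2]
scale = [2.0, 0.25]  # hypothetical per-channel pre_quant_scale

# Unfused: run the preceding layer, then apply the scale elementwise.
unfused = [s * v for s, v in zip(scale, linear(prev_w, prev_b, z))]

# Fused: the scale is folded into the preceding layer's parameters.
fused_w, fused_b = fuse_pre_quant_scale(prev_w, prev_b, scale)
fused = linear(fused_w, fused_b, z)

# Both paths agree up to floating-point rounding.
assert all(abs(a - b) < 1e-9 for a, b in zip(unfused, fused))
```

The identity only holds when nothing non-linear sits between the scaled channels and the layer that absorbs the scale; that is why such fusion is restricted to specific layer pairings rather than applied everywhere.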
Before your PR is "Ready for review"
Additional Information