fix: add norm calibration context for unit-offset RMSNorm (Gemma/Qwen3Next) #2500
Conversation
Summary of Changes (Gemini Code Assist): This pull request resolves a critical issue in the AWQ quantization process that specifically affected Gemma models. It introduces specialized handling for Gemma's unique RMSNorm implementation, which computes `output * (1 + weight)` rather than the standard `output * weight`.
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review. Note: this is required to complete the testing suite; please only add the label once the PR is code complete and local testing has been performed.
Code Review
This pull request introduces a well-implemented fix for handling unit-offset RMSNorm in Gemma models during AWQ smoothing. The logic to detect and apply the correct scaling transformation for Gemma-family norms is sound and cleanly encapsulated. The addition of unit tests is also a valuable contribution. I've included one suggestion to enhance the test coverage for the new functionality.
brian-dellabetta
left a comment
Hi @Yatimai, thanks for taking an initial stab at this. It has been on the docket, but I haven't had a chance to open a PR yet. I've left some comments; my main concern is that this fix is specific to AWQ, whereas this custom norm forward implementation would break other modifiers as well. A general solution would be preferred.
Update: reworked the fix based on @brian-dellabetta's review. Instead of patching AWQ only, it's now a `norm_calibration_context`.
Force-pushed a72180e to c25f428
brian-dellabetta
left a comment
Hi @Yatimai, thanks for this, things are looking really good. I have one nit; if we can patch that, I'll run it on my side before approving and adding reviewers.
Force-pushed c25f428 to ebde718
brian-dellabetta
left a comment
Hi @Yatimai, sorry, a few more nits after a second pass. Will try this out now.
…3Next) Signed-off-by: Gilles Turpin <turpingilles15@gmail.com>
Force-pushed ebde718 to c9eb09d
Thanks @Yatimai for the contribution! This looks great to me. We may ultimately want to merge the MoE and norm calibration contexts, as they share the same interface, but we can leave that for now. Approving and will ping the team for a second reviewer.
I was able to run AWQ on google/gemma-2-2b-it on this branch and get reasonable output, whereas on main it's gibberish
Thanks @brian-dellabetta for your help!
Hey, I tested google/medgemma-27b-text-it.
I used W4A16 for both GPTQ and AWQ: AWQ results are from the fix/awq-gemma-rmsnorm branch, and GPTQ results are from the main branch.
@Yatimai Please let me know if you want me to do more testing, but it's producing coherent output on fix/awq-gemma-rmsnorm.
Thanks @Jeevi10 for testing, appreciate the validation!
SUMMARY
Some architectures (Gemma, Gemma2, Gemma3, Qwen3Next) use an offset normalization where the forward computes `output * (1 + weight)` instead of `output * weight`. This breaks any modifier that smooths norm weights (AWQ, SmoothQuant, SpinQuant, QuIP) because dividing a `(1 + weight)` parameter by scales produces `1 + weight/scales` instead of `(1 + weight)/scales`.

Following @brian-dellabetta's suggestion, this adds a `norm_calibration_context` that temporarily replaces offset-norm modules with standard-norm equivalents during calibration, following the same pattern as `moe_calibration_context`. On entry, offset norms are replaced with `CalibrationOffsetNorm` modules (weight = 1 + original). On exit, modules are restored with updated weights (weight = smoothed - 1). Only norms operating on
`hidden_size` are converted. Norms operating on `head_dim` (e.g. `q_norm`/`k_norm` in Gemma3 attention) are skipped since no modifier smooths them.

TEST PLAN
Unit tests (8/8 passing):
- `hidden_size` filter: `q_norm`/`k_norm` correctly skipped

E2E validation:
- google/gemma-2-2b-it
- google/medgemma-27b-text-it

Qwen3-Next architecture verified structurally: `hidden_size=2048`, `head_dim=256`, and `Qwen3NextRMSNorm` uses the same `(1 + weight)` pattern. No smaller Qwen3-Next model exists for e2e testing (80B MoE only).

Fixes #2365
Fixes #2102
Related to #2202
Related to #2059
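To illustrate the mechanism described in the summary, here is a minimal, dependency-free sketch of the enter/exit weight transform. The `OffsetNorm` class and this simplified `norm_calibration_context` are illustrative stand-ins only, not the actual llm-compressor implementation (which operates on real `torch.nn.Module` norms):

```python
from contextlib import contextmanager

class OffsetNorm:
    """Gemma-style norm: the forward pass scales by (1 + weight)."""
    def __init__(self, weight):
        self.weight = weight

    def forward(self, x):
        return x * (1.0 + self.weight)

@contextmanager
def norm_calibration_context(norm):
    # On entry, fold the +1 offset into the stored weight so the module
    # behaves like a standard norm (forward scales by weight alone).
    original_forward = norm.forward
    norm.weight = 1.0 + norm.weight
    norm.forward = lambda x: x * norm.weight
    try:
        yield norm
    finally:
        # On exit, restore the offset convention: weight = smoothed - 1.
        norm.weight = norm.weight - 1.0
        norm.forward = original_forward

norm = OffsetNorm(weight=3.0)  # effective scale: 1 + 3 = 4
with norm_calibration_context(norm):
    # A smoothing modifier divides the (now standard) weight by a scale:
    norm.weight /= 2.0  # (1 + 3) / 2 = 2.0

# The stored weight is now 1.0, so forward applies 1 + 1 = 2.0 -- the
# correct (1 + weight)/scales. Dividing the raw offset weight directly
# would have produced the broken 1 + weight/scales = 1 + 3/2 = 2.5.
print(norm.forward(1.0))  # prints 2.0
```

With both the entry and exit transforms in place, any modifier that divides `weight` by its smoothing scales gets the correct `(1 + weight)/scales` behavior without knowing about the offset convention at all.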