[AWQ] Add option to consider smooth layer quantization in scale search#2323
Ramshankar07 wants to merge 16 commits into vllm-project:main
Conversation
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review. Note: this is required to complete the testing suite; please only add the label once the PR is code complete and local testing has been performed.
Summary of Changes: Hello @Ramshankar07, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request enhances the AWQ (Activation-aware Weight Quantization) algorithm by introducing an option to account for the quantization of "smooth layers" alongside "balance layers." Previously, AWQ optimized scales only for balance layers, assuming smooth layers were unquantized. This change addresses scenarios where both types of layers are quantized, preventing degradation of the smooth layer's quantization quality by including its error in the scale-search objective. The modification ensures that the scale factors are chosen to minimize quantization error across all targeted layers, leading to more robust and accurate quantized models.
Code Review
This pull request introduces an option to account for smooth layer quantization during the AWQ scale search, which is a valuable addition for scenarios where both smooth and balance layers are quantized. The implementation is well-structured, particularly with the extraction of logic into the _apply_balance_layer_quantization_in_grid_search helper function. I've identified a few areas for improvement: a performance optimization in _apply_smoothing, a simplification of conditional logic in _compute_best_scale, and a potential bug fix related to in-place tensor modification during the grid search. Overall, this is a solid contribution.
The quality checks have failed. Please run […]
I'll do the lm_eval and update by tomorrow.
I'll take a look, but we're trying to get a release out right now, so it may be a day or two. Also, can you reach out to me on vLLM Slack under the same username?
Sure.
Signed-off-by: Ramshankar07 <picographer0214@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com> Signed-off-by: Ramshankar07 <picographer0214@gmail.com>
Summary Signed-off-by: HDCharles <charlesdavidhernandez@gmail.com>
SUMMARY
This PR adds an option to take smooth layer quantization into account when computing AWQ scales, e.g. for the up_proj → down_proj mapping in transformer FFN blocks.
This PR addresses #2296.
Problem
AWQ picks scale factors by minimizing quantization error only for the balance layer (e.g. down_proj) and applies an inverse rescale to the smooth layer (e.g. up_proj), which is usually assumed unquantized. When both are quantization targets, the scale chosen for the balance layer can worsen quantization of the smooth layer because the smooth layer’s quantization error is not included in the objective.
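For context, the standard AWQ scale search can be sketched as follows. This is a simplified toy version, not the llm-compressor implementation; `pseudo_quantize` and `best_scale_balance_only` are illustrative names, and the real code searches over activation/weight-magnitude ratios per smoothing group.

```python
import torch

def pseudo_quantize(w: torch.Tensor, n_bits: int = 4, group_size: int = 32) -> torch.Tensor:
    """Fake-quantize Q(W): asymmetric group quantization followed by dequantization."""
    shape = w.shape
    g = w.reshape(-1, group_size)
    g_min, g_max = g.amin(dim=1, keepdim=True), g.amax(dim=1, keepdim=True)
    scale = (g_max - g_min).clamp(min=1e-5) / (2**n_bits - 1)
    zero = (-g_min / scale).round()
    q = ((g / scale).round() + zero).clamp(0, 2**n_bits - 1)
    return ((q - zero) * scale).reshape(shape)

def best_scale_balance_only(x: torch.Tensor, w_balance: torch.Tensor,
                            n_grid: int = 20) -> torch.Tensor:
    """Grid-search s minimizing || X W^T - (X/s) Q(W*s)^T ||^2 for the balance layer only."""
    x_mean = x.abs().mean(dim=0)              # per-input-channel activation magnitude
    org_out = x @ w_balance.T                 # unquantized reference output
    best_err, best_s = float("inf"), None
    for i in range(n_grid):
        s = x_mean.pow(i / n_grid).clamp(min=1e-4)
        s = s / (s.max() * s.min()).sqrt()    # normalize the scale vector
        err = ((x / s) @ pseudo_quantize(w_balance * s).T - org_out).pow(2).mean().item()
        if err < best_err:
            best_err, best_s = err, s
    return best_s
```

Note the objective only measures the balance layer's output error; the inverse scale 1/s is folded into the smooth layer's weights afterwards, with no penalty for what that does to the smooth layer's own quantization.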
Solution
- Adds a `smooth_layer_quantization` flag (default: `False`) on `AWQModifier`. When enabled and the smooth layer is in the quantization target list:
  - The parent module is resolved with `get_lowest_common_ancestor_with_avoid` over both the balance layer names and the smooth layer name.
  - `_compute_best_scale` is called with `smooth_layer_targeted=True`. The smooth layer is added to `orig_layer_weights`, and during the grid search we rescale it by 1/s and quantize to Q(W/s).
  - In the `_smooth` logic, when the smooth layer is in `orig_layer_weights`, we apply rescaling from `orig_layer_weights[smooth_layer]` so the stored weights are W/s for later calibration, avoiding double rescaling of the quantized grid-search weights.
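With the flag enabled, the grid-search objective also penalizes the quantization error of the smooth layer's rescaled weights W/s. A toy sketch of the combined objective (illustrative names and a simplified loss, not the actual llm-compressor code):

```python
import torch

def pseudo_quantize(w: torch.Tensor, n_bits: int = 4, group_size: int = 32) -> torch.Tensor:
    """Fake-quantize Q(W): asymmetric group quantization followed by dequantization."""
    shape = w.shape
    g = w.reshape(-1, group_size)
    g_min, g_max = g.amin(dim=1, keepdim=True), g.amax(dim=1, keepdim=True)
    scale = (g_max - g_min).clamp(min=1e-5) / (2**n_bits - 1)
    zero = (-g_min / scale).round()
    q = ((g / scale).round() + zero).clamp(0, 2**n_bits - 1)
    return ((q - zero) * scale).reshape(shape)

def best_scale_joint(x: torch.Tensor, w_balance: torch.Tensor,
                     w_smooth: torch.Tensor, n_grid: int = 20) -> torch.Tensor:
    """Grid-search s minimizing the balance layer's output error plus the
    smooth layer's own quantization error on its rescaled weights W/s."""
    x_mean = x.abs().mean(dim=0)
    org_out = x @ w_balance.T
    best_err, best_s = float("inf"), None
    for i in range(n_grid):
        s = x_mean.pow(i / n_grid).clamp(min=1e-4)
        s = s / (s.max() * s.min()).sqrt()
        balance_err = ((x / s) @ pseudo_quantize(w_balance * s).T - org_out).pow(2).mean()
        # the smooth layer's output channels (rows of W_smooth) are divided by s,
        # so its quantization error Q(W/s) - W/s now enters the objective
        w_s = w_smooth / s[:, None]
        smooth_err = (pseudo_quantize(w_s) - w_s).pow(2).mean()
        err = (balance_err + smooth_err).item()
        if err < best_err:
            best_err, best_s = err, s
    return best_s
```

A scale that sharpens the balance layer's error but blows up the smooth layer's weight range is now rejected, which is the failure mode the PR description identifies when both layers are quantization targets.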