[WIP] AWQ refactor#2428

Closed
kylesayrs wants to merge 2 commits intomainfrom
kylesayrs/awq-refactor-base
Conversation

@kylesayrs
Collaborator

SUMMARY:
"please provide a brief summary"

TEST PLAN:
"please outline how the changes were tested"

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
@github-actions

github-actions bot commented Mar 2, 2026

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request undertakes a significant refactoring of the AWQ (Activation-aware Weight Quantization) modifier. The core logic and various helper functions previously residing within the base.py file have been extracted and modularized into a new helpers.py module. This change aims to enhance the clarity, maintainability, and overall structure of the AWQ implementation by separating concerns and reducing the complexity of the main modifier class.

Highlights

  • Code Reorganization: Numerous helper functions and complex logic were extracted from the AWQModifier class in base.py into a new helpers.py module, significantly improving modularity.
  • Improved Readability and Maintainability: The AWQModifier class in base.py is now cleaner and easier to understand, as core methods delegate to well-named, encapsulated helper functions.
  • Modularization of AWQ Logic: Specific functionalities such as mapping resolution, activation caching, smoothing application, and best scale computation are now encapsulated in dedicated helper functions within the new helpers.py file.
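As a rough illustration of what "best scale computation" involves, the AWQ-style grid search over the smoothing exponent can be sketched as follows. All names here are hypothetical and do not reflect the actual helpers.py API; this is a minimal sketch of the technique, not the implementation in this PR.

```python
import math

def compute_best_scale_sketch(x_mean, w_mean, loss_fn, n_grid=20):
    """Grid-search the AWQ smoothing exponent alpha.

    x_mean / w_mean are per-channel magnitudes of activations and weights;
    loss_fn measures reconstruction error for a candidate scale vector.
    Illustrative only -- not the helpers.py API.
    """
    best_loss, best_ratio, best_scales = math.inf, None, None
    for grid_idx in range(n_grid):
        ratio = grid_idx / n_grid  # alpha in [0, 1)
        # s_j = x_mean_j**alpha / w_mean_j**(1 - alpha)
        scales = [
            max(x, 1e-4) ** ratio / max(w, 1e-4) ** (1 - ratio)
            for x, w in zip(x_mean, w_mean)
        ]
        # normalize so the geometric spread of the scales is centered at 1
        norm = math.sqrt(max(scales) * min(scales))
        scales = [s / norm for s in scales]
        loss = loss_fn(scales)
        if math.isfinite(loss) and loss < best_loss:
            best_loss, best_ratio, best_scales = loss, ratio, scales
    if best_scales is None:
        raise RuntimeError("no finite loss found in best-scales grid search")
    return best_ratio, best_scales
```

The real modifier additionally quantizes the balance-layer weights with memoryless observers before measuring the loss; the sketch above only shows the outer grid-search shape.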


Changelog
  • src/llmcompressor/modifiers/awq/base.py
    • Removed various utility and quantization-related imports, replacing them with imports from the new helpers.py module.
    • Refactored _set_resolved_mappings to delegate mapping resolution logic to new helper functions.
    • Refactored _setup_activation_cache_hooks to utilize new private registration methods and helper functions for activation caching.
    • Refactored _apply_smoothing to delegate the smoothing application logic to new private methods and helper functions.
    • Refactored _compute_best_scale to delegate the grid search and scale selection logic to new helper functions.
    • Introduced new private methods such as _get_targeted_names, _resolve_mapping_instances, _register_parent_cache_hook, _register_smooth_activation_hook, _cache_parent_kwargs_hook, _create_cache_smooth_activations_hook_fn, _get_mappings_with_cached_activations, _apply_smoothing_to_mapping, _apply_scales_to_mapping, _get_grid_search_means, and _get_grid_configuration to encapsulate specific functionalities.
    • Removed several standalone helper functions that were moved to helpers.py.
  • src/llmcompressor/modifiers/awq/helpers.py
    • Added a new file to house numerous helper functions extracted from base.py.
    • Included functions like validate_and_get_smooth_layer, flatten_balance_layers, get_mapping_skip_reason, resolve_activation_hook_target, extract_masked_activations, should_skip_smoothing_for_outputs, apply_scale_to_module, get_balance_layers_with_weight_quantization, create_memoryless_weight_observers, compute_scales_for_ratio, apply_quantized_balance_weights, apply_tensor_group_fusion_if_needed, compute_scale_losses, select_best_scales_from_losses, _check_layers_are_compatible, get_lowest_common_ancestor_with_avoid, and accumulate_mean.
Activity
  • The pull request is currently marked as 'Work In Progress' (WIP).
  • The summary and test plan sections in the pull request description are placeholders, indicating they are yet to be filled out by the author.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Supported commands:
  • /gemini review: Performs a code review for the current pull request in its current state.
  • /gemini summary: Provides a summary of the current pull request in its current state.
  • @gemini-code-assist: Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • /gemini help: Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@mergify
Contributor

mergify bot commented Mar 2, 2026

The quality checks have failed. Please run make style and make quality under
the root directory to address the lint failures. You will need to install the
dev optional install to get the required linting packages:
https://github.com/vllm-project/llm-compressor/blob/main/CONTRIBUTING.md

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a significant and well-executed refactoring of the AWQModifier. The logic has been broken down into smaller, more manageable methods, and many helper functions have been extracted into a new src/llmcompressor/modifiers/awq/helpers.py file. This greatly improves the readability and maintainability of the code. My review includes a couple of suggestions for improvement in the new helper functions.

leave=False,
)
for grid_idx, use_duo_scaling in pbar:
ratio = grid_idx / n_grid
Contributor


medium

The AWQ paper states that the grid search for the ratio α spans the range [0, 1]. The current implementation generates ratios from 0 up to (n_grid - 1) / n_grid, so it never reaches 1.0, excluding a potentially optimal value from the search space. To cover the full range, adjust the ratio calculation to generate n_grid evenly spaced points between 0.0 and 1.0, inclusive.

Suggested change
ratio = grid_idx / n_grid
ratio = grid_idx / (n_grid - 1) if n_grid > 1 else 0.0
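The difference between the two grids is easy to see in a small sketch (the function name here is illustrative, not part of the codebase):

```python
def grid_ratios(n_grid, inclusive=True):
    """Generate the alpha grid for a scale search.

    inclusive=False mirrors the original loop (grid_idx / n_grid), which
    stops at (n_grid - 1) / n_grid and never reaches 1.0; inclusive=True
    mirrors the suggested fix, spanning [0.0, 1.0] inclusive.
    """
    if inclusive:
        return [i / (n_grid - 1) if n_grid > 1 else 0.0 for i in range(n_grid)]
    return [i / n_grid for i in range(n_grid)]
```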

Comment on lines +330 to +335
raise Exception(
"No finite loss was found in best scalesgrid search. This typically "
"means NaN values are appearing in the forward pass of the parent "
"module. If you encounter this error, raise an issue at "
"https://github.com/vllm-project/llm-compressor/issues"
)
Contributor


medium

There's a typo in the exception message. "scalesgrid" should be "scales grid". Fixing this will improve clarity for users who encounter this error.

Suggested change
raise Exception(
"No finite loss was found in best scalesgrid search. This typically "
"means NaN values are appearing in the forward pass of the parent "
"module. If you encounter this error, raise an issue at "
"https://github.com/vllm-project/llm-compressor/issues"
)
raise Exception(
"No finite loss was found in best scales grid search. This typically "
"means NaN values are appearing in the forward pass of the parent "
"module. If you encounter this error, raise an issue at "
"https://github.com/vllm-project/llm-compressor/issues"
)
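For context, the kind of guard that raises this error can be sketched as follows (names and signature are illustrative, not the actual implementation):

```python
import math

def select_best_ratio(losses):
    """Pick the grid ratio with the lowest finite loss.

    losses: dict mapping ratio -> reconstruction loss. Non-finite losses
    (NaN/inf, e.g. from a bad forward pass in the parent module) are
    skipped; if none remain, raise the error described above.
    """
    finite = {r: loss for r, loss in losses.items() if math.isfinite(loss)}
    if not finite:
        raise RuntimeError(
            "No finite loss was found in best scales grid search."
        )
    return min(finite, key=finite.get)
```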

@kylesayrs kylesayrs closed this Mar 10, 2026
