[WIP] AWQ refactor#2428

Closed
kylesayrs wants to merge 2 commits intomainfrom
kylesayrs/awq-refactor-base
Conversation

@kylesayrs
Collaborator

SUMMARY:
"please provide a brief summary"

TEST PLAN:
"please outline how the changes were tested"

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
@github-actions

github-actions bot commented Mar 2, 2026

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request undertakes a significant refactoring of the AWQ (Activation-aware Weight Quantization) modifier. The core logic and various helper functions previously residing within the base.py file have been extracted and modularized into a new helpers.py module. This change aims to enhance the clarity, maintainability, and overall structure of the AWQ implementation by separating concerns and reducing the complexity of the main modifier class.

Highlights

  • Code Reorganization: Numerous helper functions and complex logic were extracted from the AWQModifier class in base.py into a new helpers.py module, significantly improving modularity.
  • Improved Readability and Maintainability: The AWQModifier class in base.py is now cleaner and easier to understand, as core methods delegate to well-named, encapsulated helper functions.
  • Modularization of AWQ Logic: Specific functionalities such as mapping resolution, activation caching, smoothing application, and best scale computation are now encapsulated in dedicated helper functions within the new helpers.py file.
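As a rough illustration of what "best scale computation" involves, the AWQ-style grid search over the smoothing exponent can be sketched as follows. All names here are hypothetical and do not reflect the actual helpers.py API; this is a minimal sketch of the technique, not the implementation in this PR.

```python
import math

def compute_best_scale_sketch(x_mean, w_mean, loss_fn, n_grid=20):
    """Grid-search the AWQ smoothing exponent alpha.

    x_mean / w_mean are per-channel magnitudes of activations and weights;
    loss_fn measures reconstruction error for a candidate scale vector.
    Illustrative only -- not the helpers.py API.
    """
    best_loss, best_ratio, best_scales = math.inf, None, None
    for grid_idx in range(n_grid):
        ratio = grid_idx / n_grid  # alpha in [0, 1)
        # s_j = x_mean_j**alpha / w_mean_j**(1 - alpha)
        scales = [
            max(x, 1e-4) ** ratio / max(w, 1e-4) ** (1 - ratio)
            for x, w in zip(x_mean, w_mean)
        ]
        # normalize so the geometric spread of the scales is centered at 1
        norm = math.sqrt(max(scales) * min(scales))
        scales = [s / norm for s in scales]
        loss = loss_fn(scales)
        if math.isfinite(loss) and loss < best_loss:
            best_loss, best_ratio, best_scales = loss, ratio, scales
    if best_scales is None:
        raise RuntimeError("no finite loss found in best-scales grid search")
    return best_ratio, best_scales
```

The real modifier additionally quantizes the balance-layer weights with memoryless observers before measuring the loss; the sketch above only shows the outer grid-search shape.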


Changelog
  • src/llmcompressor/modifiers/awq/base.py
    • Removed various utility and quantization-related imports, replacing them with imports from the new helpers.py module.
    • Refactored _set_resolved_mappings to delegate mapping resolution logic to new helper functions.
    • Refactored _setup_activation_cache_hooks to utilize new private registration methods and helper functions for activation caching.
    • Refactored _apply_smoothing to delegate the smoothing application logic to new private methods and helper functions.
    • Refactored _compute_best_scale to delegate the grid search and scale selection logic to new helper functions.
    • Introduced new private methods such as _get_targeted_names, _resolve_mapping_instances, _register_parent_cache_hook, _register_smooth_activation_hook, _cache_parent_kwargs_hook, _create_cache_smooth_activations_hook_fn, _get_mappings_with_cached_activations, _apply_smoothing_to_mapping, _apply_scales_to_mapping, _get_grid_search_means, and _get_grid_configuration to encapsulate specific functionalities.
    • Removed several standalone helper functions that were moved to helpers.py.
  • src/llmcompressor/modifiers/awq/helpers.py
    • Added a new file to house numerous helper functions extracted from base.py.
    • Included functions like validate_and_get_smooth_layer, flatten_balance_layers, get_mapping_skip_reason, resolve_activation_hook_target, extract_masked_activations, should_skip_smoothing_for_outputs, apply_scale_to_module, get_balance_layers_with_weight_quantization, create_memoryless_weight_observers, compute_scales_for_ratio, apply_quantized_balance_weights, apply_tensor_group_fusion_if_needed, compute_scale_losses, select_best_scales_from_losses, _check_layers_are_compatible, get_lowest_common_ancestor_with_avoid, and accumulate_mean.
Activity
  • The pull request is currently marked as 'Work In Progress' (WIP).
  • The summary and test plan sections in the pull request description are placeholders, indicating they are yet to be filled out by the author.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Supported commands:
  • /gemini review: Performs a code review for the current pull request in its current state.
  • /gemini summary: Provides a summary of the current pull request in its current state.
  • @gemini-code-assist: Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • /gemini help: Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@mergify
Contributor

mergify bot commented Mar 2, 2026

The quality checks have failed. Please run make style and make quality under
the root directory to address the lint failures. You will need to install the
dev optional install to get the required linting packages:
https://github.com/vllm-project/llm-compressor/blob/main/CONTRIBUTING.md

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a significant and well-executed refactoring of the AWQModifier. The logic has been broken down into smaller, more manageable methods, and many helper functions have been extracted into a new src/llmcompressor/modifiers/awq/helpers.py file. This greatly improves the readability and maintainability of the code. My review includes a couple of suggestions for improvement in the new helper functions.

leave=False,
)
for grid_idx, use_duo_scaling in pbar:
ratio = grid_idx / n_grid
Contributor


medium

The AWQ paper states that the grid search for the ratio α spans the range [0, 1]. The current implementation generates ratios from 0 up to (n_grid - 1) / n_grid, so it never reaches 1.0, excluding a potentially optimal value from the search space. To cover the full range, adjust the ratio calculation to generate n_grid evenly spaced points between 0.0 and 1.0, inclusive.

Suggested change
ratio = grid_idx / n_grid
ratio = grid_idx / (n_grid - 1) if n_grid > 1 else 0.0
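The difference between the two grids is easy to see in a small sketch (the function name here is illustrative, not part of the codebase):

```python
def grid_ratios(n_grid, inclusive=True):
    """Generate the alpha grid for a scale search.

    inclusive=False mirrors the original loop (grid_idx / n_grid), which
    stops at (n_grid - 1) / n_grid and never reaches 1.0; inclusive=True
    mirrors the suggested fix, spanning [0.0, 1.0] inclusive.
    """
    if inclusive:
        return [i / (n_grid - 1) if n_grid > 1 else 0.0 for i in range(n_grid)]
    return [i / n_grid for i in range(n_grid)]
```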

Comment on lines +330 to +335
raise Exception(
"No finite loss was found in best scalesgrid search. This typically "
"means NaN values are appearing in the forward pass of the parent "
"module. If you encounter this error, raise an issue at "
"https://github.com/vllm-project/llm-compressor/issues"
)
Contributor


medium

There's a typo in the exception message. "scalesgrid" should be "scales grid". Fixing this will improve clarity for users who encounter this error.

Suggested change
raise Exception(
"No finite loss was found in best scalesgrid search. This typically "
"means NaN values are appearing in the forward pass of the parent "
"module. If you encounter this error, raise an issue at "
"https://github.com/vllm-project/llm-compressor/issues"
)
raise Exception(
"No finite loss was found in best scales grid search. This typically "
"means NaN values are appearing in the forward pass of the parent "
"module. If you encounter this error, raise an issue at "
"https://github.com/vllm-project/llm-compressor/issues"
)
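For context, the kind of guard that raises this error can be sketched as follows (names and signature are illustrative, not the actual implementation):

```python
import math

def select_best_ratio(losses):
    """Pick the grid ratio with the lowest finite loss.

    losses: dict mapping ratio -> reconstruction loss. Non-finite losses
    (NaN/inf, e.g. from a bad forward pass in the parent module) are
    skipped; if none remain, raise the error described above.
    """
    finite = {r: loss for r, loss in losses.items() if math.isfinite(loss)}
    if not finite:
        raise RuntimeError(
            "No finite loss was found in best scales grid search."
        )
    return min(finite, key=finite.get)
```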

@kylesayrs kylesayrs closed this Mar 10, 2026
