Add Intel AutoRound algorithm support #8

Closed
yiliu30 wants to merge 33 commits into main from up-ar
Conversation

@yiliu30 yiliu30 commented Nov 3, 2025

Address [todo: issue number]

Highlights

  • Introduced AutoRoundModifier to enable AutoRound quantization for wNa16.
  • Added an end-to-end example and unit tests.
  • Verified functionality with local accuracy tests (GSM8K with a limit of 1000; results may fluctuate due to non-determinism).

AutoRound result (reference)
vllm (pretrained=/storage/yiliu7/meta-llama/Meta-Llama-3-8B-Instruct-ar/Meta-Llama-3-8B-Instruct-w4g128/,tensor_parallel_size=1,max_model_len=8192,max_num_batched_tokens=32768,max_num_seqs=128,add_bos_token=True,gpu_memory_utilization=0.8,dtype=bfloat16,max_gen_toks=2048,enable_prefix_caching=False), gen_kwargs: (None), limit: 1000.0, num_fewshot: None, batch_size: 128
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match||0.739|±  |0.0139|
|     |       |strict-match    |     5|exact_match||0.740|±  |0.0139|

LLMC-AutoRound
vllm (pretrained=/storage/yiliu7/Meta-Llama-3-8B-Instruct-W4A16-G128-disbale-shuffule,tensor_parallel_size=1,max_model_len=8192,max_num_batched_tokens=32768,max_num_seqs=128,add_bos_token=True,gpu_memory_utilization=0.8,dtype=bfloat16,max_gen_toks=2048,enable_prefix_caching=False), gen_kwargs: (None), limit: 1000.0, num_fewshot: None, batch_size: 128
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match||0.737|±  |0.0139|
|     |       |strict-match    |     5|exact_match||0.736|±  |0.0139|

LLMC-GPTQ
vllm (pretrained=Meta-Llama-3-8B-Instruct-W4A16-G128-GPTQ,tensor_parallel_size=1,max_model_len=8192,max_num_batched_tokens=32768,max_num_seqs=128,add_bos_token=True,gpu_memory_utilization=0.8,dtype=bfloat16,max_gen_toks=2048,enable_prefix_caching=False), gen_kwargs: (None), limit: 1000.0, num_fewshot: None, batch_size: 128
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match||0.723|±  |0.0142|
|     |       |strict-match    |     5|exact_match||0.723|±  |0.0142|
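As a quick sanity check on the gaps between the three tables, the reported values and standard errors can be compared with a simple two-proportion z-score (pure arithmetic for context, not part of the PR; the 0.654 figure is computed from the numbers above):

```python
import math

def z_score(p1, se1, p2, se2):
    """Approximate z-score for the difference between two reported
    accuracies, given their standard errors."""
    return (p1 - p2) / math.sqrt(se1**2 + se2**2)

# LLMC-AutoRound vs LLMC-GPTQ, gsm8k strict-match (values from the tables above)
z = z_score(0.736, 0.0139, 0.723, 0.0142)
print(round(z, 2))  # → 0.65, well under 1.96, so the gap is within noise
```

The AutoRound-vs-GPTQ differences here are well inside one combined standard error, consistent with the note that results fluctuate due to non-determinism.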

Next stage (in later PRs)

  • Extend support for additional data types.
  • Add group-wise quantization recipes mapping between LLMC and AutoRound.
  • Add end-to-end tests.

Signed-off-by: yiliu30 <yi4.liu@intel.com>
github-actions bot commented Nov 3, 2025

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite; please only add the label once the PR is code complete and local testing has been performed.

@yiliu30 yiliu30 marked this pull request as ready for review November 3, 2025 11:20

yiliu30 commented Nov 3, 2025

/gemini review

@gemini-code-assist bot left a comment

Code Review

This pull request introduces support for AutoRound quantization by adding a new AutoRoundModifier, an example script, and corresponding tests. My review focuses on improving the portability and clarity of the new example script and addressing several issues within the AutoRoundModifier implementation. Key feedback includes removing hardcoded paths and values, generalizing model-specific logic, and completing the handling of quantization parameters to ensure correctness and reusability.


yiliu30 commented Nov 3, 2025

/gemini review

@gemini-code-assist bot left a comment

Code Review

This pull request introduces support for the AutoRound quantization algorithm, including a new AutoRoundModifier, an example script, and tests. The core implementation is a great start, but there are several areas that need improvement for robustness and usability. Specifically, the example scripts contain hardcoded paths and development artifacts that should be cleaned up. The AutoRoundModifier implementation has hardcoded values for the quantization scheme and device, which limits its flexibility. There is also a critical TODO regarding weight updates that needs to be addressed to ensure correctness. My review includes suggestions to address these points.


yiliu30 commented Nov 5, 2025

/gemini review

@gemini-code-assist bot left a comment

Code Review

This pull request introduces support for AutoRound quantization by adding an AutoRoundModifier. The changes include the modifier implementation, a new dependency, an example script, and tests. Overall, the implementation is solid, but there are a few issues to address. The example scripts contain hardcoded paths and typos that should be cleaned up. More importantly, the AutoRoundModifier implementation has a potential bug in handling multiple configuration groups and appears to be missing logic for updating the weight_zero_point after quantization. I've provided specific comments and suggestions to address these points.


yiliu30 commented Nov 5, 2025

/gemini summary

@gemini-code-assist

Summary of Changes

This pull request integrates the AutoRound quantization algorithm into the llmcompressor framework, providing a new modifier for users to apply this method. The core implementation of the AutoRoundModifier is included, leveraging the auto_round library for 4-bit weight quantization. The changes also encompass an example script demonstrating its usage with a Llama 3 model, along with corresponding tests, and updates to the system's handling of quantization-aware calibration to properly support the new modifier.

Highlights

  • New AutoRound Quantization Modifier: Introduced the AutoRoundModifier to enable the AutoRound quantization algorithm within the llmcompressor framework, allowing for 4-bit weight quantization.
  • Integration of auto_round Library: The auto_round library, specifically from a GitHub branch, has been added as a new dependency to support the AutoRound functionality.
  • Example and Testing: A new example script demonstrates how to apply AutoRound quantization to a Llama 3 8B Instruct model, and a dedicated test suite verifies the modifier's functionality with a TinyLlama model.
  • Quantization-Aware Calibration Handling: The AutoRoundModifier has been added to the list of modifiers that disable quantization-aware calibration in sequential pipelines, ensuring compatibility with the AutoRound process.
Changelog
  • examples/quantization_w4a16/auto_round_llama3_example.py
    • Added a new example script showcasing AutoRound 4-bit weight quantization for the Llama 3 8B Instruct model.
  • setup.py
    • Added the auto_round library as a dependency, pointing to a specific GitHub branch.
  • src/llmcompressor/modifiers/quantization/__init__.py
    • Updated to export the newly added autoround module.
  • src/llmcompressor/modifiers/quantization/autoround/__init__.py
    • Added an __init__.py file for the new autoround package.
  • src/llmcompressor/modifiers/quantization/autoround/base.py
    • Implemented the AutoRoundModifier class, which orchestrates the layer-by-layer application of the AutoRound quantization algorithm.
  • src/llmcompressor/pipelines/layer_sequential/pipeline.py
    • Modified to include AutoRoundModifier in the list of modifiers that disable quantization-aware calibration.
  • src/llmcompressor/pipelines/sequential/pipeline.py
    • Modified to include AutoRoundModifier in the list of modifiers that disable quantization-aware calibration.
  • src/llmcompressor/utils/helpers.py
    • Defined DISABLE_QAC_MODIFIERS and added AutoRoundModifier to this list.
  • tests/llmcompressor/transformers/autoround/test_oneshot.py
    • Added new tests to validate the oneshot application of AutoRoundModifier on a TinyLlama model.
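The diff itself is not shown here; as a rough illustration only, a wNa16 recipe driving this modifier would presumably follow llm-compressor's usual recipe conventions, along these lines (every field name and value below is an assumption based on the description above, not taken from this PR):

```yaml
quant_stage:
  quant_modifiers:
    AutoRoundModifier:
      ignore: ["lm_head"]
      config_groups:
        group_0:
          targets: ["Linear"]
          weights:
            num_bits: 4
            type: "int"
            symmetric: true
            strategy: "group"
            group_size: 128
```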
Activity
  • The pull request was initiated by yiliu30.
  • A bot requested the 'ready' label for review.
  • yiliu30 requested Gemini reviews multiple times.
  • Gemini Code Assist provided several review comments, highlighting critical issues such as hardcoded model structure access, hardcoded arguments for auto_round.AutoRound, and incomplete handling of quantization parameters.
  • High-priority feedback included issues with hardcoded device settings, the need for assertions for single config groups, and clarification on weight updates.
  • Medium-priority comments addressed confusing example scripts with hardcoded paths and overwritten variables, a fragile _is_decoding_layer function, incorrect docstrings, local auto_round imports without error handling, and the use of a specific GitHub branch for a dependency.

@yiliu30 yiliu30 changed the title from "Up ar" to "Add Intel AutoRound algorithm support" Nov 5, 2025

yiliu30 commented Nov 5, 2025

/gemini review

@gemini-code-assist bot left a comment

Code Review

This pull request introduces support for the Intel AutoRound quantization algorithm, which is a valuable addition. The implementation includes the core AutoRoundModifier, an end-to-end example, and unit tests. The code is generally well-structured, but I've identified a few areas for improvement related to correctness and robustness, particularly concerning the handling of quantization parameters and module attribute modifications.

Comment on lines +134 to +135
for name, mod in state.model.named_modules():
    mod._tmp_name = name


medium

Adding a temporary attribute _tmp_name to every module in the model can be fragile and may cause attribute name collisions. A more robust approach would be to avoid modifying the modules directly. For instance, you could use a dictionary that maps id(module) to its name, or use a closure in the hook registration to capture the module name. This would make the implementation cleaner and less intrusive.
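The dictionary and closure approaches the reviewer mentions can be sketched as follows. This is a minimal self-contained illustration of the pattern, not the PR's code; `Module` is a tiny stand-in for `torch.nn.Module`, just enough to make the sketch runnable without PyTorch:

```python
class Module:
    """Stand-in for torch.nn.Module: holds named children and can
    enumerate (qualified_name, module) pairs like named_modules()."""

    def __init__(self, children=None):
        self._children = dict(children or {})

    def named_modules(self, prefix=""):
        yield prefix, self
        for name, child in self._children.items():
            yield from child.named_modules(f"{prefix}.{name}" if prefix else name)


def build_name_map(model):
    # id(module) -> fully qualified name; the modules themselves are untouched,
    # so no _tmp_name attribute (and no risk of attribute collisions)
    return {id(mod): name for name, mod in model.named_modules()}


def make_hook(name, seen):
    # Closure variant: the hook captures its module's name directly
    def hook(module, args, output):
        seen.append(name)
        return output
    return hook


model = Module({"layers": Module({"0": Module(), "1": Module()})})
names = build_name_map(model)
print(names[id(model._children["layers"]._children["0"])])  # → layers.0
```

Either variant lets the forward hooks know which module they belong to while leaving the model's modules unmodified.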


yiliu30 commented Nov 13, 2025

Formal PR #1994 was merged; closing this one.

@yiliu30 yiliu30 closed this Nov 13, 2025