Keep quantization enabled during calibration by kylesayrs · Pull Request #1299 · vllm-project/llm-compressor

kylesayrs · 2025-03-29T15:12:47Z

Purpose

Revert the behavior regression introduced as a result of SQ and QM: Remove torch.cuda.empty_cache, use calibration_forward_context #1114
When calibrating a model using the QuantizationModifier, quantization should be enabled when calibrating

Changes

Remove "disabling quantization" from the calibration forward pass
Add "disabling quantization" to the sequential pipelines in order to continue to disable quantization during calibration for GPTQ and SGPT
- When calibration pipelines become shared between modifiers, the decision of whether to disabling quantization during calibration will have to be moved to the calibration pipelines themselves. Some work needs to be done to demonstrate that GPTQ and SGPT do not suffer accuracy regression from enabling activation quantization during calibration (in theory, the change should increase accuracy)

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

github-actions · 2025-03-29T15:12:57Z

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.

dsikka

Thanks, we should get this in for the next release.

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

markurtz · 2025-03-31T17:01:16Z

If this fixes for release, that's great, but for the future, I think I'm missing some underlying assumptions here on why this is needed. I'm worried about general fragility for these and clarity of the code overall. Why do we need to disable completely and the modifiers aren't handling that logic properly?

src/llmcompressor/pipelines/layer_sequential/pipeline.py

kylesayrs · 2025-03-31T17:10:36Z

@markurtz Please see my comment in the description

the decision of whether to disabling quantization during calibration will have to be moved to the calibration pipelines themselves. Some work needs to be done to demonstrate that GPTQ and SGPT do not suffer accuracy regression from enabling activation quantization during calibration (in theory, the change should increase accuracy)

As currently implemented, activation quantization is enabled while calibrating with QuantizationModifier, but is not enabled when calibrating with GPTQModifier, and SGPT modifiers. In the future, we hope to demonstrate that enabling activation quantization while calibrating GPTQ and SGPT leads to accuracy improvements, not regressions

markurtz · 2025-03-31T17:15:00Z

As currently implemented, activation quantization is enabled while calibrating with QuantizationModifier, but is not enabled when calibrating with GPTQModifier, and SGPT modifiers. In the future, we hope to demonstrate that enabling activation quantization while calibrating GPTQ and SGPT leads to accuracy improvements, not regressions

Yep, that's all fine. My main concern is around the pipelines needing to know about disabling quantization within a context and that not being handled at the modifier level. We generally should never be setting up code where the pipeline is either specific to a compression pathway or needs to know about the contents of the recipes / modifiers. Leads to a ton of issues especially towards generality of recipes and implementations along with future support.

remove disabling quantization during calibration

8b01785

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

dsikka reviewed Mar 30, 2025

View reviewed changes

fix test, disable quantization for sequential pipelines

843b262

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

kylesayrs added the ready When a PR is ready for review label Mar 31, 2025

kylesayrs marked this pull request as ready for review March 31, 2025 14:37

kylesayrs changed the title ~~[WIP] Keep quantization enabled during calibration~~ Keep quantization enabled during calibration Mar 31, 2025

rahul-tuli approved these changes Mar 31, 2025

View reviewed changes

brian-dellabetta approved these changes Mar 31, 2025

View reviewed changes

src/llmcompressor/pipelines/layer_sequential/pipeline.py Show resolved Hide resolved

dsikka approved these changes Mar 31, 2025

View reviewed changes

Merge branch 'main' into kylesayrs/remove-quantization-disable

836c13a

dsikka enabled auto-merge (squash) April 1, 2025 15:48

dsikka merged commit 5a77b59 into main Apr 1, 2025
8 checks passed

dsikka deleted the kylesayrs/remove-quantization-disable branch April 1, 2025 16:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Keep quantization enabled during calibration#1299

Keep quantization enabled during calibration#1299
dsikka merged 3 commits intomainfrom
kylesayrs/remove-quantization-disable

kylesayrs commented Mar 29, 2025 •

edited

Loading

Uh oh!

github-actions bot commented Mar 29, 2025

Uh oh!

dsikka left a comment

Uh oh!

markurtz commented Mar 31, 2025

Uh oh!

Uh oh!

kylesayrs commented Mar 31, 2025

Uh oh!

markurtz commented Mar 31, 2025 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

kylesayrs commented Mar 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Purpose

Changes

Uh oh!

github-actions bot commented Mar 29, 2025

Uh oh!

dsikka left a comment

Choose a reason for hiding this comment

Uh oh!

markurtz commented Mar 31, 2025

Uh oh!

Uh oh!

kylesayrs commented Mar 31, 2025

Uh oh!

markurtz commented Mar 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

kylesayrs commented Mar 29, 2025 •

edited

Loading

markurtz commented Mar 31, 2025 •

edited

Loading