
Keep quantization enabled during calibration#1299

Merged
dsikka merged 3 commits into main from kylesayrs/remove-quantization-disable on Apr 1, 2025

Conversation

@kylesayrs
Collaborator

@kylesayrs kylesayrs commented Mar 29, 2025

Purpose

Changes

  • Remove "disabling quantization" from the calibration forward pass
  • Add "disabling quantization" to the sequential pipelines, so that quantization remains disabled during calibration for GPTQ and SGPT
    • When calibration pipelines become shared between modifiers, the decision of whether to disable quantization during calibration will have to be moved into the calibration pipelines themselves. Some work needs to be done to demonstrate that GPTQ and SGPT do not suffer accuracy regression from enabling activation quantization during calibration (in theory, the change should increase accuracy)
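
The split described above can be sketched as follows. All names here (`FakeQuantLinear`, `disable_quantization`, `run_sequential_pipeline`) are hypothetical stand-ins rather than the actual llm-compressor APIs; the point is only that the disable-quantization context now lives in the sequential pipeline instead of each calibration forward pass:

```python
import contextlib

class FakeQuantLinear:
    """Hypothetical stand-in for a quantized module; a flag gates fake-quant."""
    def __init__(self):
        self.quantization_enabled = True

    def forward(self, x):
        # Apply toy fake quantization only when the flag is set
        if self.quantization_enabled:
            return round(x * 4) / 4
        return x

@contextlib.contextmanager
def disable_quantization(modules):
    """Temporarily turn off quantization for the given modules, then restore."""
    previous = [m.quantization_enabled for m in modules]
    for m in modules:
        m.quantization_enabled = False
    try:
        yield
    finally:
        for m, prev in zip(modules, previous):
            m.quantization_enabled = prev

def run_sequential_pipeline(modules, batches):
    # The GPTQ/SGPT sequential pipelines wrap calibration in the disable
    # context; other pipelines calibrate with quantization left enabled.
    with disable_quantization(modules):
        for x in batches:
            for m in modules:
                x = m.forward(x)
    return x
```

With this arrangement, the calibration forward pass itself no longer needs to know anything about disabling quantization.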

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
@github-actions

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.

Collaborator

@dsikka dsikka left a comment


Thanks, we should get this in for the next release.

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
@kylesayrs kylesayrs added the ready When a PR is ready for review label Mar 31, 2025
@kylesayrs kylesayrs marked this pull request as ready for review March 31, 2025 14:37
@kylesayrs kylesayrs changed the title [WIP] Keep quantization enabled during calibration Keep quantization enabled during calibration Mar 31, 2025
@markurtz
Collaborator

If this fixes things for the release, that's great, but going forward I think I'm missing some of the underlying assumptions on why this is needed. I'm worried about general fragility and about the clarity of the code overall. Why do we need to disable quantization completely, rather than having the modifiers handle that logic themselves?

@kylesayrs
Collaborator Author

@markurtz Please see my comment in the description

the decision of whether to disable quantization during calibration will have to be moved into the calibration pipelines themselves. Some work needs to be done to demonstrate that GPTQ and SGPT do not suffer accuracy regression from enabling activation quantization during calibration (in theory, the change should increase accuracy)

As currently implemented, activation quantization is enabled while calibrating with QuantizationModifier, but not while calibrating with the GPTQ and SGPT modifiers. In the future, we hope to demonstrate that enabling activation quantization while calibrating GPTQ and SGPT leads to accuracy improvements, not regressions.

@markurtz
Collaborator

markurtz commented Mar 31, 2025

As currently implemented, activation quantization is enabled while calibrating with QuantizationModifier, but is not enabled when calibrating with GPTQModifier, and SGPT modifiers. In the future, we hope to demonstrate that enabling activation quantization while calibrating GPTQ and SGPT leads to accuracy improvements, not regressions

Yep, that's all fine. My main concern is that the pipelines need to know about disabling quantization within a context, and that this isn't handled at the modifier level. We generally should never set up code where the pipeline is either specific to a compression pathway or needs to know about the contents of the recipes / modifiers. That leads to a ton of issues, especially around the generality of recipes and implementations, as well as future support.
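
One way to address this concern (a hypothetical sketch, not the current llm-compressor design) is to keep the pipeline recipe-agnostic by having each modifier declare whether its calibration requires quantization to be disabled, so the generic pipeline only reads a declared capability flag and never inspects modifier types:

```python
import contextlib

class Modifier:
    """Hypothetical base class: modifiers declare their calibration needs."""
    calibrate_with_quantization = True

class QuantizationModifier(Modifier):
    calibrate_with_quantization = True

class GPTQModifier(Modifier):
    # Hypothetical: weight-error compensation assumed to need fp activations
    calibrate_with_quantization = False

@contextlib.contextmanager
def disable_quantization(model):
    # Toy model state: a dict with a single quantization flag
    model["quantization_enabled"] = False
    try:
        yield
    finally:
        model["quantization_enabled"] = True

def calibration_context(modifiers, model):
    # The pipeline stays recipe-agnostic: it only reads the declared flag,
    # never branches on specific modifier classes.
    if all(m.calibrate_with_quantization for m in modifiers):
        return contextlib.nullcontext()
    return disable_quantization(model)
```

Under this sketch, adding a new modifier never requires touching pipeline code; the modifier simply declares its own calibration requirement.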

@dsikka dsikka enabled auto-merge (squash) April 1, 2025 15:48
@dsikka dsikka merged commit 5a77b59 into main Apr 1, 2025
8 checks passed
@dsikka dsikka deleted the kylesayrs/remove-quantization-disable branch April 1, 2025 16:19