Keep quantization enabled during calibration #1299
Conversation
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
dsikka left a comment:
Thanks, we should get this in for the next release.
If this fixes things for the release, that's great, but for the future I think I'm missing some of the underlying assumptions about why this is needed. I'm worried about general fragility here and the overall clarity of the code. Why do we need to disable quantization completely, and why aren't the modifiers handling that logic properly?
@markurtz Please see my comment in the description
> As currently implemented, activation quantization is enabled while calibrating with
Yep, that's all fine. My main concern is that the pipelines need to know about disabling quantization within a context, and that this isn't handled at the modifier level. We generally should never set up code where the pipeline is either specific to a compression pathway or needs to know about the contents of the recipes / modifiers. That leads to a ton of issues, especially for the generality of recipes and implementations, as well as future support.
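To make the concern above concrete, here is a minimal sketch of the modifier-owned approach being argued for: layers carry their own quantization flag, and the pipeline only uses a generic context manager that toggles every flag it finds, without ever inspecting the recipe or modifier contents. All names here (`FakeQuantLayer`, `quantization_enabled`, `quantization_disabled`) are hypothetical illustrations, not llm-compressor's actual API.

```python
from contextlib import contextmanager


class FakeQuantLayer:
    """Toy layer with a flag gating (fake) activation quantization.

    Hypothetical stand-in; real libraries attach observers/fake-quant
    modules instead of a simple flag.
    """

    def __init__(self):
        self.quantization_enabled = True

    def forward(self, x):
        if self.quantization_enabled:
            # crude stand-in for activation fake-quantization
            x = round(x * 16) / 16
        return x * 2.0


class ToyModel:
    def __init__(self):
        self.layers = [FakeQuantLayer(), FakeQuantLayer()]

    def modules(self):
        return list(self.layers)

    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x


@contextmanager
def quantization_disabled(model):
    """Temporarily disable quantization on every layer that supports it.

    The calibration pipeline only needs this generic context; it never
    needs to know which modifiers set the flags, keeping the pipeline
    decoupled from recipe contents.
    """
    flagged = [m for m in model.modules() if hasattr(m, "quantization_enabled")]
    previous = [m.quantization_enabled for m in flagged]
    for m in flagged:
        m.quantization_enabled = False
    try:
        yield model
    finally:
        # restore each layer's prior state, whatever it was
        for m, prev in zip(flagged, previous):
            m.quantization_enabled = prev


model = ToyModel()
with quantization_disabled(model):
    inside = [m.quantization_enabled for m in model.modules()]
after = [m.quantization_enabled for m in model.modules()]
```

The design point is that whether quantization should be on or off during a given calibration pass becomes a property the modifier configures on the modules, while the pipeline stays pathway-agnostic.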
Purpose

- Follow-up to "Remove `torch.cuda.empty_cache`, use `calibration_forward_context`" #1114
- When calibrating with `QuantizationModifier`, quantization should be enabled

Changes