[Multi-modifier] Support scoped application of quantization config/status #1772
Conversation
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review. Note: this is required to complete the testing suite; please only add the label once the PR is code complete and local testing has been performed.
Consider adding some basic tests/ common use cases, otherwise looks good!
Nice job
Woohoo!
Looks good!
def test_serialize_actorder(has_actorder, actorder, exp_actorder):
    if has_actorder:
-        modifier = GPTQModifier(targets=["Linear"], actorder=actorder)
+        modifier = GPTQModifier(targets=["Linear"], scheme="W8A8", actorder=actorder)
How was this targeting before you added the scheme?
I think it just passed init/validation but was never used. It was never applied to a model, so it would never have worked. I just added it to make sure improper configuration wasn't the reason the test was failing (it was ultimately something else causing the failure).
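For reference, a minimal sketch of the serialization round-trip this test exercises, assuming GPTQModifier is a pydantic model exposing `model_dump()`/`model_validate()`; the fixture values here are illustrative, not the actual test from the suite.

```python
# Hypothetical sketch of the round-trip; the real test may differ in details.
from llmcompressor.modifiers.quantization import GPTQModifier

modifier = GPTQModifier(targets=["Linear"], scheme="W8A8", actorder="group")
dumped = modifier.model_dump()                  # modifiers are pydantic models
reloaded = GPTQModifier.model_validate(dumped)  # round-trip through validation
print(dumped.get("actorder"), reloaded.actorder)
```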
self._calibration_hooks = self._initialize_hooks(model)
model.apply(apply_calibration_status)
for _, module in match_named_modules(model, self.targets, self.ignore):
    self._initialize_observers(module)
Why can't we keep this in initialize_quantization?
Observers should be initialized on_start to align with their removal on_end, so this was moved into on_start instead of on_initialize. Without this change, the multi-modifier lifecycle would trigger observer hooks before the modifier starts (before it sees any data), since that can now happen during a previous modifier's lifecycle.
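To illustrate the ordering issue, here is a small self-contained toy (hypothetical class and method names, not the real modifier code) showing why observers should only exist between on_start and on_end of their own modifier:

```python
# Toy sketch of the multi-modifier lifecycle; names are illustrative only.
class ToyModifier:
    def __init__(self, name: str):
        self.name = name
        self.observers_active = False

    def on_initialize(self) -> None:
        # config resolution only -- attaching observers here would leave
        # modifier B's observers live while modifier A is still calibrating
        print(f"{self.name}: initialized (no observers yet)")

    def on_start(self) -> None:
        self.observers_active = True
        print(f"{self.name}: observers attached")

    def on_end(self) -> None:
        self.observers_active = False
        print(f"{self.name}: observers removed")


# Both modifiers are initialized up front, but each one's observers exist
# only while that modifier is between on_start and on_end.
attn_mod, mlp_mod = ToyModifier("W8A8-attn"), ToyModifier("W4A16-mlp")
for m in (attn_mod, mlp_mod):
    m.on_initialize()
for m in (attn_mod, mlp_mod):
    m.on_start()
    # ... calibration data flows here; only m's observers are active ...
    m.on_end()
```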
…les (#1869)

SUMMARY:
#1772 introduced a bug when running NVFP4 quantization schemes. The call to `update_fused_layer_weight_global_scales` needs to be run on Attention and MLP layers, which are not included in `targets` (those consist of the quantizable layers inside Attention/MLP). This PR fixes that by running `update_fused_layer_weight_global_scales` on every module instead of only the targeted ones, which is OK because the call is idempotent and only modifies modules with NVFP4 schemes. This is only a problem in `QuantizationModifier`; AWQ cannot be used with NVFP4.

TEST PLAN:
Confirmed that the working vs. broken global scales are mismatched because the update is never run:

```
model.layers.0.self_attn.k_proj.weight_global_scale -- working 9600.0, broken 12992.0
model.layers.0.self_attn.q_proj.weight_global_scale -- working 9600.0, broken 9600.0
model.layers.0.self_attn.v_proj.weight_global_scale -- working 9600.0, broken 12160.0
```

And these changes resolve the regression:

Before
```
vllm (pretrained=/home/dsikka/llm-compressor/examples/quantization_w4a4_fp4/Qwen3-30B-A3B-NVFP4,dtype=auto,max_model_len=4096,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.8135|±  |0.0107|
|     |       |strict-match    |     5|exact_match|↑  |0.8097|±  |0.0108|
```

After
```
vllm (pretrained=/home/brian-dellabetta/projects/llm-compressor/Qwen3-30B-A3B-NVFP4,dtype=auto,max_model_len=4096,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.8620|±  |0.0095|
|     |       |strict-match    |     5|exact_match|↑  |0.8575|±  |0.0096|
```
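As a hedged sketch of the shape of that fix (the helper and attribute names below are hypothetical, not the real llm-compressor API): sweeping every module is safe when the update is a guarded, idempotent no-op for modules without an NVFP4-style global scale.

```python
from torch import nn

def maybe_update_fused_global_scale(module: nn.Module) -> None:
    """Hypothetical stand-in for the real update helper."""
    scales = getattr(module, "fused_weight_global_scales", None)
    if not scales:
        return  # non-NVFP4 module: nothing to do
    # fused q/k/v projections share the minimum global scale; re-running
    # this produces the same result, i.e. the update is idempotent
    shared = min(scales.values())
    for key in scales:
        scales[key] = shared

def update_all_modules(model: nn.Module) -> None:
    # visit every module rather than only matched targets, so the Attention/MLP
    # parents that own the fused projections are no longer skipped
    for module in model.modules():
        maybe_update_fused_global_scale(module)
```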
SUMMARY:
This PR
- [x] Removes `pile-val-dataset` from e2e tests, as it is no longer used in examples and its processing logic was flawed
- [x] Fixes a model validation error introduced in #1772 that was preventing AWQModifier from running one of its validations, leaving it in an invalid state (`AWQModifier.validate_model_after` was preventing `QuantizationMixin.validate_model_after` from running)

With these changes, tests pass and the compressed model generates meaningful responses; it was previously generating all 0s.

TEST PLAN:
`CADENCE=nightly TEST_DATA_FILE=tests/e2e/vLLM/configs/w4a16_grouped_quant_sym_awq.yaml pytest -s tests/e2e/vLLM/test_vllm.py` and `CADENCE=nightly TEST_DATA_FILE=tests/e2e/vLLM/configs/w4a16_grouped_quant_asym_awq.yaml pytest -s tests/e2e/vLLM/test_vllm.py` both pass, with output like

```
PROMPT: The capital of France is
GENERATED TEXT: Paris, which is also the country's largest city.

PROMPT: The president of the US is
GENERATED TEXT: named, but the name of the Vice President is not given. In the case

PROMPT: My name is
GENERATED TEXT: Emily and I am from Canada. I have always been fascinated with
```
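The validation issue above is the usual pydantic pitfall where a subclass validator with the same method name shadows the parent's validator, so the parent's checks silently stop running. A minimal sketch of the pitfall and one way around it, using hypothetical class names and an illustrative constraint rather than the actual AWQModifier/QuantizationMixin code:

```python
from typing import Optional
from pydantic import BaseModel, model_validator

class QuantMixinSketch(BaseModel):
    scheme: Optional[str] = None

    @model_validator(mode="after")
    def validate_quant_config(self):
        # parent-level check that must keep running
        if self.scheme is None:
            raise ValueError("a quantization scheme must be resolved")
        return self

class AWQSketch(QuantMixinSketch):
    duo_scaling: bool = True

    # a validator named validate_quant_config here would shadow the parent's;
    # giving it a distinct name keeps both checks active
    @model_validator(mode="after")
    def validate_awq_specifics(self):
        if self.duo_scaling and self.scheme == "W8A8":
            raise ValueError("illustrative AWQ-specific constraint")
        return self

AWQSketch(scheme="W4A16")   # passes both validators
# AWQSketch()               # would raise: scheme must be resolved
```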
SUMMARY:
Prerequisites:
This allows for multi-modifier support by scoping the application of quantization config/status to only the modules in the model that match the given targets/ignore configuration, rather than all modules. Initialization of observers is moved to on_start (instead of on_initialize) to match their removal in on_end (rather than on_finalize). This prevents collisions during the multi-modifier lifecycle.
TEST PLAN:
Tests confirm correct behavior for both the "sequential" and "independent" pipelines. The model checkpoint for the sequential pipeline shows correct application of W8A8 to self_attn layers and W4A16 to mlp layers; config.json and the safetensors weights all look as expected.