[QuantizationModifier] NVFP4 bugfix -- fused layer update on all modules (#1869)
SUMMARY:
#1772 introduced a bug when running NVFP4 quantization schemes. The call
to `update_fused_layer_weight_global_scales` needs to run on Attention
and MLP layers, but `targets` only contains the quantizable layers
inside Attention/MLP, so the call was never reached. This PR fixes that
by running `update_fused_layer_weight_global_scales` on every module
instead of only the targeted ones, which is safe because the call is
idempotent and only modifies modules with NVFP4 schemes. This only
affects `QuantizationModifier`, since AWQ cannot be used with NVFP4.
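The change can be sketched roughly as follows. Only `update_fused_layer_weight_global_scales` is a real name from this PR; the loop, the module representation, and `mock_update` are illustrative assumptions, not the actual llm-compressor code.

```python
def run_update_on_all_modules(modules, update_fn):
    # Apply the fused-layer update to every module rather than only the
    # targeted quantizable layers; this is safe because update_fn is
    # idempotent and a no-op for modules without an NVFP4 scheme.
    for module in modules:
        update_fn(module)

def mock_update(module):
    # Illustrative stand-in for update_fused_layer_weight_global_scales:
    # it only touches modules that carry an NVFP4 scheme.
    if module.get("scheme") == "NVFP4":
        module["updated"] = True

layers = [
    {"name": "self_attn", "scheme": "NVFP4"},
    {"name": "mlp.gate", "scheme": None},
]
run_update_on_all_modules(layers, mock_update)
```

Running the loop a second time changes nothing, which is why blanket application over all modules is acceptable here.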
TEST PLAN:
Confirmed that the global scales in the broken run are mismatched
against the working run because the update is never applied:
```
model.layers.0.self_attn.k_proj.weight_global_scale -- working 9600.0, broken 12992.0
model.layers.0.self_attn.q_proj.weight_global_scale -- working 9600.0, broken 9600.0
model.layers.0.self_attn.v_proj.weight_global_scale -- working 9600.0, broken 12160.0
```
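The mismatch above can be expressed as a trivial consistency check (the helper name and list shapes are illustrative; the scale values are the ones printed above): after the update, the fused q/k/v projections should share a single `weight_global_scale`.

```python
def fused_scales_match(scales):
    # A fused q/k/v group is consistent when all of its
    # weight_global_scale values agree.
    return len(set(scales)) == 1

# q_proj, k_proj, v_proj weight_global_scale values from the runs above.
working = [9600.0, 9600.0, 9600.0]
broken = [9600.0, 12992.0, 12160.0]
```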
And these changes resolve the regression:
Before
```
vllm (pretrained=/home/dsikka/llm-compressor/examples/quantization_w4a4_fp4/Qwen3-30B-A3B-NVFP4,dtype=auto,max_model_len=4096,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto
|Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.8135|± |0.0107|
| | |strict-match | 5|exact_match|↑ |0.8097|± |0.0108|
```
After
```
vllm (pretrained=/home/brian-dellabetta/projects/llm-compressor/Qwen3-30B-A3B-NVFP4,dtype=auto,max_model_len=4096,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto
|Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.8620|± |0.0095|
| | |strict-match | 5|exact_match|↑ |0.8575|± |0.0096|
```
---------
Signed-off-by: Brian Dellabetta <[email protected]>