[Quantization][Decompression] Fix QDQ for dynamic quant; Update NVFP4 Compression Params #407

dsikka · 2025-07-28T21:55:45Z

Add compression info for NVFP4 to support decompression with CompressedLinear.
Support batch_size > 1 when running QDQ for tensor_group / group dynamic activations. The reshape is now generic: every dimension is preserved except the last one (the group dimension), which is reshaped for QDQ.
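
The generic reshape described above can be sketched as follows. This is a minimal illustration, not the code from this PR: the helper name `fake_quantize_per_group`, the symmetric absmax scale, and the int8-style `qmax` are assumptions made for the example. Only the last dimension is split into groups, so any number of leading batch dimensions passes through unchanged.

```python
import torch


def fake_quantize_per_group(
    x: torch.Tensor, group_size: int, qmax: float = 127.0
) -> torch.Tensor:
    """Symmetric QDQ per group along the last dim; leading dims are untouched.

    Hypothetical helper for illustration; not part of compressed-tensors.
    """
    if x.size(-1) % group_size != 0:
        raise ValueError("last dim must be divisible by group_size")

    # [..., hidden] -> [..., num_groups, group_size]
    grouped = x.unflatten(-1, (x.size(-1) // group_size, group_size))

    # One scale per group, derived from the group's absolute maximum.
    scales = grouped.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / qmax

    # Quantize, then immediately dequantize (QDQ).
    quantized = torch.clamp(torch.round(grouped / scales), -qmax, qmax)
    dequantized = quantized * scales

    # [..., num_groups, group_size] -> [..., hidden]
    return dequantized.flatten(-2, -1)


x = torch.randn(4, 7, 64)  # [batch, seq_len, hidden]
out = fake_quantize_per_group(x, group_size=16)
```

Because the function never names the leading dimensions, the same call works for 2D, 3D, or 4D inputs.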

kylesayrs

If this doesn't support 4D activations, you should add asserts to make sure that ndim matches expectations.

src/compressed_tensors/quantization/utils/helpers.py

brian-dellabetta

We should also get in #393, another dynamic quant fix.

rahul-tuli

LGTM!

src/compressed_tensors/quantization/lifecycle/forward.py

src/compressed_tensors/quantization/utils/helpers.py

kylesayrs

I need to understand this first, sorry

kylesayrs · 2025-07-31T05:47:40Z

In the future it might be nice to take a step back and make a decision about when tensors need to be reshaped during the qdq process.

Maybe rather than reshaping in all of compute_dynamic_scales_and_zp, _process_quantization, and dequantize, we could have this logic exist just once, in _process_quantization.

The implementation below is what it might look like for both activations and weights. Some of this is wrong (it's late for me) but this function ensures the last dim is the granularity you want to quantize by.

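import functools


def reshape_for_groups(func):
    """Reshape x so its last dim is the quantization granularity, call func, then restore."""

    @functools.wraps(func)
    def wrapper(x, args, *extra, **kwargs):
        assert x.ndim >= 2
        strategy = args.strategy

        # "token" needs no reshape: the last dim is already the granularity
        if strategy == "channel":
            x = x.unsqueeze(-1)
        elif strategy in ("group", "tensor_group"):
            num_groups = x.size(-1) // args.group_size
            x = x.unflatten(-1, (num_groups, args.group_size))
        elif strategy == "block":
            block_height, block_width = args.block_structure
            num_vert = x.size(-2) // block_height
            num_horiz = x.size(-1) // block_width
            x = x.unfold(-2, block_height, block_height)  # [..., num_vert, width, block_height]
            x = x.unfold(-2, block_width, block_width)  # [..., num_vert, num_horiz, block_height, block_width]
            x = x.flatten(-4, -3).flatten(-2, -1)  # [..., num_blocks, block_height * block_width]

        x = func(x, args, *extra, **kwargs)

        # undo the reshape so the caller gets the original layout back
        if strategy == "channel":
            x = x.squeeze(-1)
        elif strategy in ("group", "tensor_group"):
            x = x.flatten(-2, -1)
        elif strategy == "block":
            x = x.unflatten(-1, (block_height, block_width))
            x = x.unflatten(-3, (num_vert, num_horiz))  # [..., num_vert, num_horiz, block_height, block_width]
            x = x.transpose(-3, -2)  # [..., num_vert, block_height, num_horiz, block_width]
            x = x.flatten(-4, -3).flatten(-2, -1)  # [..., height, width]
        return x

    return wrapper


@reshape_for_groups
def _process_quantization(x, args, *extra, do_quantize=True, do_dequantize=True):
    # remaining parameters and both bodies are elided, as in the original sketch
    if do_quantize:
        ...
    if do_dequantize:
        ...
    return x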
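
The block case is the least obvious part of the sketch above, so here is a standalone round-trip check of one way to do it with `unfold`. The helper names `to_blocks` and `from_blocks` are hypothetical, and the example assumes the height and width are exact multiples of the block size; it is meant as an illustration of the idea, not as the library's implementation.

```python
import torch


def to_blocks(x: torch.Tensor, block_height: int, block_width: int) -> torch.Tensor:
    """[..., H, W] -> [..., num_blocks, block_height * block_width]."""
    x = x.unfold(-2, block_height, block_height)  # [..., H/bh, W, bh]
    x = x.unfold(-2, block_width, block_width)  # [..., H/bh, W/bw, bh, bw]
    return x.flatten(-4, -3).flatten(-2, -1)


def from_blocks(
    blocks: torch.Tensor, height: int, width: int, block_height: int, block_width: int
) -> torch.Tensor:
    """Inverse of to_blocks: [..., num_blocks, bh * bw] -> [..., H, W]."""
    x = blocks.unflatten(-1, (block_height, block_width))
    x = x.unflatten(-3, (height // block_height, width // block_width))
    x = x.transpose(-3, -2)  # [..., H/bh, bh, W/bw, bw]
    return x.flatten(-4, -3).flatten(-2, -1)


weight = torch.arange(24.0).reshape(4, 6)
blocks = to_blocks(weight, block_height=2, block_width=3)
restored = from_blocks(blocks, 4, 6, block_height=2, block_width=3)
```

Each row of `blocks` holds one 2x3 block in row-major order, so a per-block scale can be computed by reducing over the last dimension, in the same way as for groups.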

kylesayrs

Thank you!

… Compression Params (#407) * add compression param; update qdq for batch greater than 1 * make generic * fix tests * remove incorrect line change; make generic * update

@dbarbuzzi

* add utilities Signed-off-by: Kyle Sayers <[email protected]> * add tests Signed-off-by: Kyle Sayers <[email protected]> * add additional tests Signed-off-by: Kyle Sayers <[email protected]> * add utils and tests Signed-off-by: Kyle Sayers <[email protected]> * Implement transform factories Signed-off-by: Kyle Sayers <[email protected]> * add permutations Signed-off-by: Kyle Sayers <[email protected]> * add delete_offload_module Signed-off-by: Kyle Sayers <[email protected]> * key inverses by weight Signed-off-by: Kyle Sayers <[email protected]> * fix tests Signed-off-by: Kyle Sayers <[email protected]> * standardize random hadamard Signed-off-by: Kyle Sayers <[email protected]> * prepend input hooks Signed-off-by: Kyle Sayers <[email protected]> * apply sqrt division first Signed-off-by: Kyle Sayers <[email protected]> * use divided hadamards Signed-off-by: Kyle Sayers <[email protected]> * fix typo Signed-off-by: Kyle Sayers <[email protected]> * add random option Signed-off-by: Kyle Sayers <[email protected]> * use random seeds, rename matrix multiply Signed-off-by: Kyle Sayers <[email protected]> * add deterministic generation to random matrix Signed-off-by: Kyle Sayers <[email protected]> * fix perm math Signed-off-by: Kyle Sayers <[email protected]> * update docstrings Signed-off-by: Kyle Sayers <[email protected]> * update docstrings Signed-off-by: Kyle Sayers <[email protected]> * cleanup Signed-off-by: Kyle Sayers <[email protected]> * cleanup 2 Signed-off-by: Kyle Sayers <[email protected]> * make seed optional Signed-off-by: Kyle Sayers <[email protected]> * remove iterable check and missing return value Signed-off-by: Kyle Sayers <[email protected]> * Remove unrelated changes * simplify code Signed-off-by: Kyle Sayers <[email protected]> * implement apply, use in tests Signed-off-by: Kyle Sayers <[email protected]> * use hadamards database file Signed-off-by: Kyle Sayers <[email protected]> * try manifest Signed-off-by: Kyle Sayers <[email protected]> 
* try setup, update hadamards list Signed-off-by: Kyle Sayers <[email protected]> * fix setup Signed-off-by: Kyle Sayers <[email protected]> * add docstrings, cleanup Signed-off-by: Kyle Sayers <[email protected]> * fix setup, thank you @dbarbuzzi Signed-off-by: Kyle Sayers <[email protected]> * remove numpy, add tests Signed-off-by: Kyle Sayers <[email protected]> * solidify dtype, add gpu tests Signed-off-by: Kyle Sayers <[email protected]> * fix docstring Signed-off-by: Kyle Sayers <[email protected]> * add device option Signed-off-by: Kyle Sayers <[email protected]> * construct on execution device, cache on offload device Signed-off-by: Kyle Sayers <[email protected]> * save construction device changes for later Signed-off-by: Kyle Sayers <[email protected]> * construct on execution device, cache on offload device * cite nja sloane Signed-off-by: Kyle Sayers <[email protected]> * remove dreg Signed-off-by: Kyle Sayers <[email protected]> * put on device via safe_open Signed-off-by: Kyle Sayers <[email protected]> * nits and docstrings Signed-off-by: Kyle Sayers <[email protected]> * update docstring Signed-off-by: Kyle Sayers <[email protected]> * Merge * merge with construct: construct in float32 Signed-off-by: Kyle Sayers <[email protected]> * construct with same dtype, constructing on fp32 found no difference Signed-off-by: Kyle Sayers <[email protected]> * remove unnecessary imports Signed-off-by: Kyle Sayers <[email protected]> * bugfixes (#375) Signed-off-by: Brian Dellabetta <[email protected]> * use factory_kwargs Signed-off-by: Kyle Sayers <[email protected]> * add frozen dict to deps Signed-off-by: Kyle Sayers <[email protected]> * fix style Signed-off-by: Kyle Sayers <[email protected]> * merge Signed-off-by: Kyle Sayers <[email protected]> * use delete_offload_module Signed-off-by: Kyle Sayers <[email protected]> * add docstrign Signed-off-by: Kyle Sayers <[email protected]> * use parametrize Signed-off-by: Kyle Sayers <[email protected]> * populate 
_dynamic_tied_weights_keys Signed-off-by: Kyle Sayers <[email protected]> * ensure serializable Signed-off-by: Kyle Sayers <[email protected]> * remove extra space Signed-off-by: Kyle Sayers <[email protected]> * apply style Signed-off-by: Kyle Sayers <[email protected]> * merge dregs * skip offloading tests until transformers changes land Signed-off-by: Kyle Sayers <[email protected]> * use set Signed-off-by: Kyle Sayers <[email protected]> * [Quantization][Decompression] Fix QDQ for dynamic quant; Update NVFP4 Compression Params (#407) * add compression param; update qdq for batch greater than 1 * make generic * fix tests * remove incorrect line change; make generic * update * serialize Signed-off-by: Kyle Sayers <[email protected]> * fix typo, comment Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Signed-off-by: Brian Dellabetta <[email protected]> Co-authored-by: Brian Dellabetta <[email protected]> Co-authored-by: Dipika Sikka <[email protected]>

dsikka mentioned this pull request Jul 29, 2025

[NVFP4] Add lm-eval test case vllm-project/llm-compressor#1689

Merged

dsikka added 3 commits July 30, 2025 14:08

add compression param; update qdq for batch greater than 1

be89690

make generic

b29792f

fix tests

30ad305

dsikka force-pushed the fix_nvfp4_decomp branch from dc36cfa to 30ad305 Compare July 30, 2025 14:13

dsikka marked this pull request as ready for review July 30, 2025 15:00

kylesayrs previously approved these changes Jul 30, 2025

brian-dellabetta reviewed Jul 30, 2025

src/compressed_tensors/quantization/utils/helpers.py (outdated, resolved)

remove incorrect line change; make generic

3548dc5

dsikka dismissed kylesayrs’s stale review via 3548dc5 July 30, 2025 16:02

dsikka requested review from brian-dellabetta and kylesayrs July 30, 2025 16:03

brian-dellabetta previously approved these changes Jul 31, 2025

rahul-tuli previously approved these changes Jul 31, 2025

src/compressed_tensors/quantization/lifecycle/forward.py (outdated, resolved)

kylesayrs reviewed Jul 31, 2025

src/compressed_tensors/quantization/utils/helpers.py (outdated, resolved)

kylesayrs requested changes Jul 31, 2025

update

1cfd8bb

dsikka dismissed stale reviews from rahul-tuli and brian-dellabetta via 1cfd8bb July 31, 2025 17:01

dsikka requested review from kylesayrs, brian-dellabetta and rahul-tuli July 31, 2025 17:04

kylesayrs approved these changes Jul 31, 2025

brian-dellabetta approved these changes Jul 31, 2025

dsikka merged commit 46d84d8 into main Jul 31, 2025
1 check passed

dsikka deleted the fix_nvfp4_decomp branch July 31, 2025 19:53
