
[BUG] FP8 real_quantization doesn't work with block_sizes #193

@ishan-modi

Description


Describe the bug

The following part of the code is causing the issue here.

The issue occurs because the amax that is set during the calibration step doesn't take block_sizes into account here.

When we then try to compress, the previously calculated amax is passed as scales here.

This results in the following error:

/usr/local/lib/python3.11/dist-packages/modelopt/torch/quantization/qtensor/fp8_tensor.py in quantize(cls, input, scales, axis, block_sizes)
     99                 expanded_scales = expanded_scales.reshape(expected_shape)
    100 
--> 101             assert scales.shape == tuple(expected_shape), (
    102                 f"Mismatch in expected scale shape: {scales.shape} vs {tuple(expected_shape)}"
    103             )

AssertionError: Mismatch in expected scale shape: torch.Size([]) vs (1152, 18)
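For reference, here is a minimal PyTorch-only sketch of the mismatch. It mimics, rather than calls, the modelopt internals; the (1152, 2304) weight shape and the 128 block size are assumptions chosen to reproduce the (1152, 18) expected scale shape from the traceback:

```python
import torch

weight = torch.randn(1152, 2304)
block_sizes = {-1: 128}  # quantize in blocks of 128 along the last dim

# What calibration currently records: a single per-tensor amax (0-dim tensor).
per_tensor_amax = weight.abs().amax()  # torch.Size([])

# What blockwise FP8 quantization expects: one scale per 128-wide block.
expected_shape = (weight.shape[0], weight.shape[1] // block_sizes[-1])  # (1152, 18)

scales = per_tensor_amax / 448.0  # 448.0 = max representable value in FP8 E4M3
try:
    assert scales.shape == expected_shape, (
        f"Mismatch in expected scale shape: {scales.shape} vs {expected_shape}"
    )
except AssertionError as e:
    print(e)  # Mismatch in expected scale shape: torch.Size([]) vs (1152, 18)

# An amax computed per block would have the shape quantize() expects:
blockwise_amax = weight.reshape(1152, 18, 128).abs().amax(dim=-1)
print(blockwise_amax.shape)  # torch.Size([1152, 18])
```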

System information

nvidia_modelopt - 0.29.0
