
[BUG] FP8 real_quantization doesn't work with block_sizes #193

@ishan-modi

Description


Describe the bug

The following part of the code is causing the issue here.

The issue occurs because the amax that is set during the calibration step doesn't take block_sizes into account here.

When we then try to compress, the previously calculated amax is passed as scales here.

This results in the following error:

/usr/local/lib/python3.11/dist-packages/modelopt/torch/quantization/qtensor/fp8_tensor.py in quantize(cls, input, scales, axis, block_sizes)
     99                 expanded_scales = expanded_scales.reshape(expected_shape)
    100 
--> 101             assert scales.shape == tuple(expected_shape), (
    102                 f"Mismatch in expected scale shape: {scales.shape} vs {tuple(expected_shape)}"
    103             )

AssertionError: Mismatch in expected scale shape: torch.Size([]) vs (1152, 18)
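For reference, here is a minimal PyTorch-only sketch of the mismatch. It mimics, rather than calls, the modelopt internals; the (1152, 2304) weight shape and the 128 block size are assumptions chosen to reproduce the (1152, 18) expected scale shape from the traceback:

```python
import torch

weight = torch.randn(1152, 2304)
block_sizes = {-1: 128}  # quantize in blocks of 128 along the last dim

# What calibration currently records: a single per-tensor amax (0-dim tensor).
per_tensor_amax = weight.abs().amax()  # torch.Size([])

# What blockwise FP8 quantization expects: one scale per 128-wide block.
expected_shape = (weight.shape[0], weight.shape[1] // block_sizes[-1])  # (1152, 18)

scales = per_tensor_amax / 448.0  # 448.0 = max representable value in FP8 E4M3
try:
    assert scales.shape == expected_shape, (
        f"Mismatch in expected scale shape: {scales.shape} vs {expected_shape}"
    )
except AssertionError as e:
    print(e)  # Mismatch in expected scale shape: torch.Size([]) vs (1152, 18)

# An amax computed per block would have the shape quantize() expects:
blockwise_amax = weight.reshape(1152, 18, 128).abs().amax(dim=-1)
print(blockwise_amax.shape)  # torch.Size([1152, 18])
```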

System information

nvidia_modelopt - 0.29.0
