Support quantizing native FP8 models #536 (closed)
Description
New models are coming in native FP8 form, for example Minimax-M2.1.
However, trying to quantize them is a game of whack-a-mole with unsupported torch features in both compressed-tensors and llm-compressor.
In llm-compressor, the first failure is an unsupported dtype promotion:
File "[...]/.venv/lib/python3.12/site-packages/compressed_tensors/quantization/lifecycle/forward.py", line 471, in _quantize
scaled = x / scale
~~^~~~~~~
RuntimeError: Promotion for Float8 Types is not supported, attempted to promote Float8_e4m3fn and Float
compressed-tensors/src/compressed_tensors/quantization/lifecycle/forward.py, lines 455 to 471 at 797d301:

def _quantize(
    x: torch.Tensor,
    scale: torch.Tensor,
    zero_point: torch.Tensor,
    q_min: torch.Tensor,
    q_max: torch.Tensor,
    args: QuantizationArgs,
    dtype: Optional[torch.dtype] = None,
    global_scale: Optional[torch.Tensor] = None,
) -> torch.Tensor:
    # if a global scale is optionally provided, use it
    # to further scale the local `scale` parameter
    if global_scale is not None:
        scale = scale / global_scale

    scaled = x / scale
Unimplemented min/max/abs kernels:
File "[...]/.venv/lib/python3.12/site-packages/compressed_tensors/quantization/utils/helpers.py", line 432, in generate_gparam
min_vals = torch.min(updated_min_val, torch.zeros_like(updated_min_val))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
NotImplementedError: "min_elementwise_cuda" not implemented for 'Float8_e4m3fn'
File "[...]/.venv/lib/python3.12/site-packages/compressed_tensors/quantization/utils/helpers.py", line 95, in calculate_qparams
max_val_pos = torch.max(torch.abs(min_vals), torch.abs(max_vals))
^^^^^^^^^^^^^^^^^^^
NotImplementedError: "abs_cuda" not implemented for 'Float8_e4m3fn'
Downstream issue: vllm-project/llm-compressor#2172 (comment)