"""Determine the quantization statistics for input matrix `A` in accordance with the `LLM.int8()` algorithm.

The statistics are determined both row-wise and column-wise (transposed).

For more information, see the [LLM.int8() paper](https://arxiv.org/abs/2208.07339).

<Tip warning={true}>
This function exists for backwards compatibility only. Use [`int8_double_quant`] instead.
The difference is that this function returns a [`COOSparseTensor`] for outliers instead of a column index.
</Tip>

Args:
    A (`torch.Tensor` with dtype `torch.float16`): The input matrix.
    col_stats (`torch.Tensor`, *optional*): A pre-allocated tensor to hold the column-wise quantization scales.
    row_stats (`torch.Tensor`, *optional*): A pre-allocated tensor to hold the row-wise quantization scales.
    out_col (`torch.Tensor`, *optional*): A pre-allocated tensor to hold the column-wise quantized data.
    out_row (`torch.Tensor`, *optional*): A pre-allocated tensor to hold the row-wise quantized data.
    threshold (`float`, *optional*):
        An optional threshold for sparse decomposition of outlier features.

        No outliers are held back when 0.0. Defaults to 0.0.

Returns:
    `Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, Optional[torch.Tensor]]`: A tuple containing the quantized tensors and their statistics.
    - `torch.Tensor` with dtype `torch.int8`: The row-wise quantized data.
    - `torch.Tensor` with dtype `torch.int8`: The column-wise quantized data.
    - `torch.Tensor` with dtype `torch.float32`: The row-wise quantization scales.
    - `torch.Tensor` with dtype `torch.float32`: The column-wise quantization scales.
    - `COOSparseTensor`, *optional*: A structure representing the outlier values from the input tensor.
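
The return values described above can be illustrated with a small dependency-free sketch. It is not the library's implementation: `absmax_quantize_rows` and `sketch_double_quant` are hypothetical names, and the sketch assumes absmax scaling into the range [-127, 127], with entries whose magnitude is at least `threshold` treated as outliers, zeroed before the statistics are computed, and collected as COO-style `(row, col, value)` triples.

```python
def absmax_quantize_rows(matrix, threshold=0.0):
    """Quantize each row to the int8 range with absmax scaling (illustrative only)."""
    outliers = []  # assumed COO-style layout: (row, col, value)
    quantized, stats = [], []
    for i, row in enumerate(matrix):
        kept = []
        for j, value in enumerate(row):
            if threshold > 0.0 and abs(value) >= threshold:
                # Hold the outlier back and zero it in the dense data.
                outliers.append((i, j, value))
                kept.append(0.0)
            else:
                kept.append(value)
        absmax = max((abs(v) for v in kept), default=0.0)
        stats.append(absmax)
        scale = 127.0 / absmax if absmax > 0.0 else 0.0
        quantized.append([round(v * scale) for v in kept])
    return quantized, stats, outliers


def sketch_double_quant(matrix, threshold=0.0):
    """Return (out_row, out_col, row_stats, col_stats, outliers) for a list-of-lists matrix."""
    out_row, row_stats, outliers = absmax_quantize_rows(matrix, threshold)
    # Column-wise statistics are the row-wise statistics of the transpose.
    transposed = [list(col) for col in zip(*matrix)]
    out_col_t, col_stats, _ = absmax_quantize_rows(transposed, threshold)
    out_col = [list(r) for r in zip(*out_col_t)]
    return out_row, out_col, row_stats, col_stats, outliers


A = [[1.0, -3.0, 4.0], [0.25, 8.0, -0.25]]
out_row, out_col, row_stats, col_stats, outliers = sketch_double_quant(A, threshold=6.0)
# Collapsing the COO triples to column indices, the form the Tip above
# attributes to the newer function.
outlier_cols = sorted({j for _, j, _ in outliers})
```

With `threshold=0.0` no entries are held back and `outliers` stays empty, which matches the default described in `Args`.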