Update: our team will evaluate this more before opening the migration up to more people in the community.
Context:
Previously we used AffineQuantizedTensor for many of our use cases, including int4, float8, intx, and floatx. It introduces some complicated abstractions like Layout; people have been saying it's a bit hard to understand, and there are many indirections in the code.
In an effort to simplify the code base and make it easier to contribute to, we have been adding new features with a different structure in mind. We now want to organize Tensors by "dtype" and "packing_format": e.g. we'll have Int4PreshuffledTensor, Int8Tensor, and Float8Tensor instead of AffineQuantizedTensor with multiple layouts (a rough sketch of this structure follows the links below).
Please check out our updated docs for the new tensor subclass organization structure and design guide:
- quantization overview: https://docs-preview.pytorch.org/pytorch/ao/2723/quantization_overview.html
- contributor guide: https://docs-preview.pytorch.org/pytorch/ao/2723/contributor_guide.html
- Examples of tensor subclasses following new design: https://github.com/pytorch/ao/tree/main/torchao/quantization/quantize_/workflows
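To illustrate the idea, here is a minimal, hypothetical sketch of a tensor subclass organized by dtype and packing format. It is not the actual torchao implementation (see the workflow examples linked above for the real ones); the name `MyInt4PreshuffledTensor`-style naming is the pattern, but `MyInt8Tensor`, the "plain" packing format, and the dequantize-and-compute `F.linear` override below are all assumptions for illustration:

```python
# Hypothetical sketch, not the actual torchao code: a tensor subclass
# named by dtype ("int8") with a "plain" (unpacked) packing format.
import torch
import torch.nn.functional as F


class MyInt8Tensor(torch.Tensor):
    """int8 weight with per-output-channel scales, stored unpacked."""

    @staticmethod
    def __new__(cls, qdata, scale):
        # The wrapper reports the high-precision dtype and original shape.
        return torch.Tensor._make_wrapper_subclass(
            cls, qdata.shape, dtype=scale.dtype, device=qdata.device
        )

    def __init__(self, qdata, scale):
        self.qdata = qdata  # int8 payload; "plain" packing format here
        self.scale = scale  # per-output-channel scale

    @classmethod
    def from_hp(cls, w, eps=1e-6):
        # Symmetric per-channel quantization of a high-precision weight.
        scale = (w.abs().amax(dim=-1) / 127.0).clamp(min=eps)
        qdata = torch.round(w / scale.unsqueeze(-1)).clamp(-128, 127).to(torch.int8)
        return cls(qdata, scale)

    def dequantize(self):
        return self.qdata.to(self.scale.dtype) * self.scale.unsqueeze(-1)

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        if func is F.linear:
            x, w = args[0], args[1]
            bias = args[2] if len(args) > 2 else kwargs.get("bias")
            # Fallback path: dequantize and compute in high precision.
            # A real tensor would dispatch to a kernel matching its
            # packing format here.
            return F.linear(x, w.dequantize(), bias)
        with torch._C.DisableTorchFunctionSubclass():
            return func(*args, **kwargs)


# Usage: quantize a weight, then use it directly in F.linear.
w = MyInt8Tensor.from_hp(torch.randn(16, 32))
y = F.linear(torch.randn(2, 32), w)
```

The point of the structure is that each (dtype, packing format) pair is one self-contained class, so adding a new packed format means adding a new Tensor rather than a new Layout threaded through AffineQuantizedTensor's indirections.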
List of things to migrate:
INT8
- https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/block_sparse_layout.py
- https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/plain_layout.py
- https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/semi_sparse_layout.py
INT4 weight only
- https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/int4_cpu_layout.py @Xia-Weiwen → https://github.com/pytorch/ao/blob/main/torchao/quantization/quantize_/workflows/int4/int4_opaque_tensor.py
- https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/int4_xpu_layout.py @liangan1 Add Int4PlainInt32Tensor #2845
- https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/marlin_sparse_layout.py @liangel-02 Int4 sparse marlin tensor #2771
- https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/tensor_core_tiled_layout.py @jerryzh168 Add Int4TilePackedTo4dTensor #2791
- HQQ support for tensor core tiled layout @jerryzh168 Add hqq support for Int4TilePackedTo4dTensor #2912
INT4 weight + int8 activation
- https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/cutlass_int4_packed_layout.py
- https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/dyn_int8_act_int4_wei_cpu_layout.py
- https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/marlin_qqq_tensor.py
INTx Weight Only
- https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/gemlite_layout.py
- https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/uintx_layout.py
Int8DynamicActivationIntxWeightConfig
- https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/packed_linear_int8_dynamic_activation_intx_weight_layout.py @metascroy Introduce IntxOpaqueTensor to replace PackedInt8DynamicActivationIntxWeightLayout in AQT #2742
- https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/q_dq_layout.py @metascroy Add IntxUnpackedTensor #2732
FP8