Update: our team will evaluate this more before opening the migration up to more people in the community.
Context:
Previously we used AffineQuantizedTensor for many of our use cases, including int4, float8, intx, and floatx. It introduces some complicated abstractions like Layout; people have been saying it's a bit hard to understand, and there are many indirections in the code.
In an effort to simplify the code base and make it easier to contribute to, we have been adding new features with a different structure in mind. We now want to organize Tensors by "dtype" and "packing_format": e.g. we'll have Int4PreshuffledTensor, Int8Tensor, and Float8Tensor instead of AffineQuantizedTensor with multiple layouts (a rough sketch of this structure follows the links below).
Please check out our updated docs for the new tensor subclass organization structure and design guide:
- quantization overview: https://docs-preview.pytorch.org/pytorch/ao/2723/quantization_overview.html
- contributor guide: https://docs-preview.pytorch.org/pytorch/ao/2723/contributor_guide.html
- Examples of tensor subclasses following new design: https://github.com/pytorch/ao/tree/main/torchao/quantization/quantize_/workflows
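To illustrate the idea, here is a minimal, hypothetical sketch of a tensor subclass organized by dtype and packing format. It is not the actual torchao implementation (see the workflow examples linked above for the real ones); the name `MyInt4PreshuffledTensor`-style naming is the pattern, but `MyInt8Tensor`, the "plain" packing format, and the dequantize-and-compute `F.linear` override below are all assumptions for illustration:

```python
# Hypothetical sketch, not the actual torchao code: a tensor subclass
# named by dtype ("int8") with a "plain" (unpacked) packing format.
import torch
import torch.nn.functional as F


class MyInt8Tensor(torch.Tensor):
    """int8 weight with per-output-channel scales, stored unpacked."""

    @staticmethod
    def __new__(cls, qdata, scale):
        # The wrapper reports the high-precision dtype and original shape.
        return torch.Tensor._make_wrapper_subclass(
            cls, qdata.shape, dtype=scale.dtype, device=qdata.device
        )

    def __init__(self, qdata, scale):
        self.qdata = qdata  # int8 payload; "plain" packing format here
        self.scale = scale  # per-output-channel scale

    @classmethod
    def from_hp(cls, w, eps=1e-6):
        # Symmetric per-channel quantization of a high-precision weight.
        scale = (w.abs().amax(dim=-1) / 127.0).clamp(min=eps)
        qdata = torch.round(w / scale.unsqueeze(-1)).clamp(-128, 127).to(torch.int8)
        return cls(qdata, scale)

    def dequantize(self):
        return self.qdata.to(self.scale.dtype) * self.scale.unsqueeze(-1)

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        if func is F.linear:
            x, w = args[0], args[1]
            bias = args[2] if len(args) > 2 else kwargs.get("bias")
            # Fallback path: dequantize and compute in high precision.
            # A real tensor would dispatch to a kernel matching its
            # packing format here.
            return F.linear(x, w.dequantize(), bias)
        with torch._C.DisableTorchFunctionSubclass():
            return func(*args, **kwargs)


# Usage: quantize a weight, then use it directly in F.linear.
w = MyInt8Tensor.from_hp(torch.randn(16, 32))
y = F.linear(torch.randn(2, 32), w)
```

The point of the structure is that each (dtype, packing format) pair is one self-contained class, so adding a new packed format means adding a new Tensor rather than a new Layout threaded through AffineQuantizedTensor's indirections.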
List of things to migrate:
INT8
- https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/block_sparse_layout.py
- https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/plain_layout.py
- https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/semi_sparse_layout.py
INT4 weight only
- https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/int4_cpu_layout.py @Xia-Weiwen → https://github.com/pytorch/ao/blob/main/torchao/quantization/quantize_/workflows/int4/int4_opaque_tensor.py
- https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/int4_xpu_layout.py @liangan1 Add Int4PlainInt32Tensor #2845
- https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/marlin_sparse_layout.py @liangel-02 Int4 sparse marlin tensor #2771
- https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/tensor_core_tiled_layout.py @jerryzh168 Add Int4TilePackedTo4dTensor #2791
- HQQ support for tensor core tiled layout @jerryzh168 Add hqq support for Int4TilePackedTo4dTensor #2912
INT4 weight + int8 activation
- https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/cutlass_int4_packed_layout.py
- https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/dyn_int8_act_int4_wei_cpu_layout.py
- https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/marlin_qqq_tensor.py
INTx Weight Only
- https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/gemlite_layout.py
- https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/uintx_layout.py
Int8DynamicActivationIntxWeightConfig
- https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/packed_linear_int8_dynamic_activation_intx_weight_layout.py @metascroy Introduce IntxOpaqueTensor to replace PackedInt8DynamicActivationIntxWeightLayout in AQT #2742
- https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/q_dq_layout.py @metascroy Add IntxUnpackedTensor #2732
FP8