
Conversation

@metascroy (Contributor) commented on Aug 11, 2025:

This adds IntxUnpackedTensor, where subbyte quantized data is represented as int8. The range of the quantized values is restricted to the quant_min and quant_max of the target_dtype; for example, if target_dtype=torch.int4, qdata will be an int8 tensor with values in [-8, 7]. Quantization is represented in a decomposed way (qdata, scale, and zero_point are stored separately).

This tensor is intended for export use cases that currently use AQT with QDQLayout.

The test plan is the new unit tests.
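(For illustration, a minimal sketch — not the PR's actual code — of the unpacked representation for target_dtype=torch.int4 with symmetric, groupwise quantization; the helper name and grouping scheme here are assumptions:)

import torch

def fake_int4_groupwise_quantize(hp_tensor: torch.Tensor, group_size: int = 32):
    # assumes hp_tensor.numel() is divisible by group_size
    grouped = hp_tensor.reshape(-1, group_size)
    # symmetric scheme: zero_point is 0, scale maps each group's abs max to the int4 max (7)
    scale = grouped.abs().amax(dim=-1, keepdim=True).clamp(min=1e-6) / 7
    zero_point = torch.zeros_like(scale, dtype=torch.int8)
    # qdata is stored as plain int8, clamped to the int4 range [-8, 7]
    qdata = torch.clamp(torch.round(grouped / scale), -8, 7).to(torch.int8)
    # dequantization stays decomposed: (qdata - zero_point) * scale
    dequant = (qdata.to(hp_tensor.dtype) - zero_point) * scale
    return qdata.reshape(hp_tensor.shape), scale, zero_point, dequant.reshape(hp_tensor.shape)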

pytorch-bot (bot) commented on Aug 11, 2025:

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2732

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit f6c9d09 with merge base e6b38bb:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@metascroy metascroy requested a review from jerryzh168 August 11, 2025 16:57
@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 11, 2025
block_size: the block size for quantization, representing the granularity, for example groupwise quantization will have block_size (1, group_size)
"""

tensor_data_attrs = ["int_data", "scale", "zero_point"]
@jerryzh168 (Contributor) commented on Aug 11, 2025:

btw if you update these to tensor_data_names and tensor_attribute_names you'll be able to remove some of the implementations, see docs in https://github.com/pytorch/ao/pull/2710/files#diff-d2a11602a79e83305208472f1abe6a4106f02ce62a7f9524007181813863fcf6R687, example: #2738
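(For illustration, a minimal sketch of that declarative style; the attribute names come from the linked docs, while the field names follow the renames suggested later in this review and are otherwise assumed:)

from torchao.utils import TorchAOBaseTensor

class IntxUnpackedTensor(TorchAOBaseTensor):
    # declaring these is assumed to let TorchAOBaseTensor auto-generate boilerplate
    # such as __tensor_flatten__/__tensor_unflatten__, __repr__, and a default
    # device-only _to_copy implementation
    tensor_data_names = ["qdata", "scale", "zero_point"]
    tensor_attribute_names = ["bit_width", "block_size"]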

@metascroy (Author) replied:

I can still override the behavior in TorchAOBaseTensor, right?

For example, it looks like aten._to_copy.default gets auto-populated, but I want to define its dtype variant in addition to device variant.
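(For context, a hypothetical shape of such an override; the implements registration and handler signature follow TorchAOBaseTensor's usual dispatch pattern, but the constructor arguments here are assumed:)

implements = IntxUnpackedTensor.implements

@implements(aten._to_copy.default)
def _(func, types, args, kwargs):
    self = args[0]
    kwargs = kwargs or {}
    device = kwargs.pop("device", self.device)
    dtype = kwargs.pop("dtype", self.dtype)
    # the dtype variant only affects the high-precision scale; qdata stays int8
    return IntxUnpackedTensor(
        self.qdata.to(device),
        self.scale.to(device=device, dtype=dtype),
        self.zero_point.to(device),
        bit_width=self.bit_width,
        block_size=self.block_size,
    )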

@jerryzh168 replied:

this should be working, I haven't actively tested this behavior though, I'll try to add a test for this

@metascroy (Author) replied:

Changed

)

@classmethod
def from_float(
@jerryzh168 commented:

nit: we are standardizing on from_hp now

@metascroy (Author) replied:

What does hp stand for?

@jerryzh168 replied:

high precision

@@ -2060,6 +2061,8 @@ class IntxWeightOnlyConfig(AOBaseConfig):
mapping_type: MappingType = MappingType.SYMMETRIC
scale_dtype: Optional[torch.dtype] = None
layout: Layout = QDQLayout()
packing_format: PackingFormat = PackingFormat.UNPACKED
VERSION: int = 1
@jerryzh168 commented:

nit: we updated the name to version
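(For reference, a hedged example of how the config shown in the diff above might be used to select the new unpacked tensor; aside from packing_format, the field names and import paths are assumptions, not verified against the PR:)

import torch
from torchao.quantization import quantize_, IntxWeightOnlyConfig
from torchao.quantization import PackingFormat  # import path assumed
from torchao.quantization.granularity import PerGroup

model = torch.nn.Sequential(torch.nn.Linear(128, 256))
quantize_(
    model,
    IntxWeightOnlyConfig(
        weight_dtype=torch.int4,                # assumed field name
        granularity=PerGroup(32),               # assumed field name
        packing_format=PackingFormat.UNPACKED,  # routes to IntxUnpackedTensor
    ),
)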

@metascroy metascroy added the topic: not user facing Use this tag if you don't want this PR to show up in release notes label Aug 12, 2025
@metascroy (Author) commented:

Any more concerns here @jerryzh168?

This format is intended for torch.export use cases.

Tensor Attributes:
int_data: int data for quantization.
@jerryzh168 commented:

nit: use qdata to align with other tensors

block_size=block_size,
)

def get_plain(self):
@jerryzh168 commented:

nit: no longer need this I think

@classmethod
def from_hp(
cls,
float_tensor: torch.Tensor,
@jerryzh168 commented:

nit: use hp_tensor to align with the method name

cls,
float_tensor: torch.Tensor,
block_size: Tuple[int],
dtype: torch.dtype,
@jerryzh168 commented:

nit: rename to target_dtype for more clarity


class IntxUnpackedTensor(TorchAOBaseTensor):
"""
intx quantization with unpacked format. Subbyte quantized data is represented as int8.
@jerryzh168 commented on Aug 15, 2025:

nit: to make it clearer, I think we can add a bit more description here about the subbyte quantized data; we should mention that the range of the quantized values is restricted to the quant_min and quant_max of the target bit width, e.g. for uint4 the values fall into the range 0 to 15.
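(A tiny illustration of the ranges described above; the helper is hypothetical:)

def intx_range(bit_width: int, signed: bool = True):
    # signed:   int4  -> (-8, 7)
    # unsigned: uint4 -> (0, 15)
    if signed:
        return -(1 << (bit_width - 1)), (1 << (bit_width - 1)) - 1
    return 0, (1 << bit_width) - 1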

@metascroy (Author) replied:

Done

block_size: Optional[Tuple[int]] = None,
):
# Check plain data and infer block_size from shapes
if block_size is None:
@jerryzh168 commented on Aug 15, 2025:

would it be easier just to make block_size required? when is block_size None?

@metascroy (Author) replied:

Removed. I did use it in the slice implementation, but I just added logic inside slice to recompute the block size.
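(A hypothetical helper matching the "recompute the block size from shapes" idea; the assumption that each scale entry covers a contiguous block along every dimension is mine:)

import torch

def infer_block_size(qdata: torch.Tensor, scale: torch.Tensor):
    # e.g. qdata of shape (256, 128) with scale of shape (256, 4) gives (1, 32)
    return tuple(qdata.shape[i] // scale.shape[i] for i in range(qdata.dim()))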

self.bit_width = bit_width
self.block_size = block_size

def __repr__(self):
@jerryzh168 commented:

repr is also implemented by default in TorchAOBaseTensor when you define tensor_data_names and tensor_attribute_names btw

@metascroy (Author) replied:

Removed

device = kwargs.pop("device")
dtype = kwargs.pop("dtype")
assert dtype in _FLOAT_TYPES
return self.__class__(
@jerryzh168 commented:

nit: self.__class__ --> IntxUnpackedTensor to reduce runtime check and align with other code

scale = aten.slice.Tensor(self.scale, dim, start_scale, end_scale, step)
zero_point = aten.slice.Tensor(self.zero_point, dim, start_scale, end_scale, step)

new = self.__class__(
@jerryzh168 commented:

same here

@jerryzh168 left a review comment:

LG, please add a bit more detail to the PR summary to explain the context for the change, and a Test Plan as well.

@metascroy metascroy merged commit 72b35bf into main Aug 19, 2025
18 checks passed
liangel-02 pushed a commit that referenced this pull request Aug 25, 2025
* add intx unpacked tensor

* up

* up

* up

* up

* up