Add IntxUnpackedTensor #2732
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2732
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit f6c9d09 with merge base e6b38bb.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
        block_size: the block size for quantization, representing the granularity, for example groupwise quantization will have block_size (1, group_size)
    """

    tensor_data_attrs = ["int_data", "scale", "zero_point"]
btw if you update these to tensor_data_names and tensor_attribute_names, you'll be able to remove some of the implementations; see docs in https://github.com/pytorch/ao/pull/2710/files#diff-d2a11602a79e83305208472f1abe6a4106f02ce62a7f9524007181813863fcf6R687 and the example in #2738
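For reference, a minimal sketch of the pattern being suggested (attribute names mirror this PR; which methods get auto-generated is per the linked docs, so treat the comment as an assumption):

    from torchao.utils import TorchAOBaseTensor

    class IntxUnpackedTensor(TorchAOBaseTensor):
        # With these two class attributes defined, TorchAOBaseTensor can
        # derive __tensor_flatten__/__tensor_unflatten__, __repr__, and
        # default device-movement ops, so those hand-written
        # implementations can be deleted here.
        tensor_data_names = ["int_data", "scale", "zero_point"]
        tensor_attribute_names = ["bit_width", "block_size"]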
I can still override the behavior in TorchAOBaseTensor, right?
For example, it looks like aten._to_copy.default gets auto-populated, but I want to define its dtype variant in addition to the device variant.
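Something like the following is what's being asked about — a sketch, not the PR's actual code. The handler body, constructor argument order, and import path are assumptions; the `implements` decorator and `return_and_correct_aliasing` follow the usual TorchAOBaseTensor handler pattern:

    import torch
    from torch.utils._python_dispatch import return_and_correct_aliasing
    from torchao.quantization import IntxUnpackedTensor  # import path is an assumption

    aten = torch.ops.aten

    # Re-registering _to_copy overrides the auto-populated device-only
    # variant so a dtype change (e.g. float32 -> bfloat16) is handled too.
    @IntxUnpackedTensor.implements(aten._to_copy.default)
    def _(func, types, args, kwargs):
        self = args[0]
        device = kwargs.pop("device", self.device)
        dtype = kwargs.pop("dtype", self.dtype)
        new = IntxUnpackedTensor(
            self.int_data.to(device),                   # quantized values stay integer
            self.scale.to(device=device, dtype=dtype),  # only high-precision attrs change dtype
            self.zero_point.to(device),
            bit_width=self.bit_width,
            block_size=self.block_size,
        )
        return return_and_correct_aliasing(func, args, kwargs, new)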
this should work; I haven't actively tested this behavior though. I'll try to add a test for it
Changed
        )

    @classmethod
    def from_float(
nit: we are standardizing on from_hp now
What does hp stand for?
high precision
torchao/quantization/quant_api.py (Outdated)
@@ -2060,6 +2061,8 @@ class IntxWeightOnlyConfig(AOBaseConfig):
     mapping_type: MappingType = MappingType.SYMMETRIC
     scale_dtype: Optional[torch.dtype] = None
     layout: Layout = QDQLayout()
+    packing_format: PackingFormat = PackingFormat.UNPACKED
+    VERSION: int = 1
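For context, a hedged usage sketch of the updated config: packing_format and VERSION come from the diff above, while weight_dtype, granularity, the PackingFormat import path, and VERSION=2 selecting the new tensor are all assumptions:

    import torch
    from torchao.quantization import quantize_, IntxWeightOnlyConfig
    from torchao.quantization import PackingFormat  # import path is an assumption
    from torchao.quantization.granularity import PerGroup

    model = torch.nn.Sequential(torch.nn.Linear(128, 256))
    quantize_(
        model,
        IntxWeightOnlyConfig(
            weight_dtype=torch.int4,                # assumed existing field
            granularity=PerGroup(32),               # assumed existing field
            packing_format=PackingFormat.UNPACKED,  # new field from this diff
            VERSION=2,                              # assumed: non-default version opts into the new tensor
        ),
    )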
nit: we updated the name to version
Any more concerns here @jerryzh168?
    This format is intended for torch.export use cases.

    Tensor Attributes:
        int_data: int data for quantization.
nit: use qdata to align with other tensors
            block_size=block_size,
        )

    def get_plain(self):
nit: no longer need this I think
    @classmethod
    def from_hp(
        cls,
        float_tensor: torch.Tensor,
nit: use hp_tensor to align with the method name
        cls,
        float_tensor: torch.Tensor,
        block_size: Tuple[int],
        dtype: torch.dtype,
nit: rename to target_dtype for more clarity
class IntxUnpackedTensor(TorchAOBaseTensor):
    """
    intx quantization with unpacked format. Subbyte quantized data is represented as int8.
nit: to make it clearer, I think we can add a bit more description here about subbyte quantized data. We should mention that the range of the quantized values is restricted to the quant_min and quant_max of the target bit width, e.g. for uint4 the values fall into the range 0 to 15.
Done
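Concretely, the restriction now described in the docstring (a minimal illustration using the standard two's-complement range for a signed 4-bit target):

    import torch

    bit_width = 4
    # Signed int4 range: quant_min = -8, quant_max = 7 (uint4 would be 0..15).
    quant_min = -(2 ** (bit_width - 1))
    quant_max = 2 ** (bit_width - 1) - 1

    # Subbyte values are stored widened to int8, but stay inside the int4 range.
    qdata = torch.randint(quant_min, quant_max + 1, (4, 8), dtype=torch.int8)
    assert qdata.min() >= quant_min and qdata.max() <= quant_max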
        block_size: Optional[Tuple[int]] = None,
    ):
        # Check plain data and infer block_size from shapes
        if block_size is None:
would it be easier to just make block_size required? When is block_size None?
Removed. I did use it in the slice implementation, but I just added logic inside slice to recompute the block size.
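The removed inference presumably looked something like this (a hypothetical reconstruction: per-dimension block sizes are recovered by dividing the data shape by the scale shape):

    import torch

    def infer_block_size(int_data: torch.Tensor, scale: torch.Tensor) -> tuple:
        # Hypothetical helper: each dim's block size is the ratio of data
        # extent to scale extent, e.g. (8, 64) data with (8, 4) scales
        # gives block_size (1, 16), i.e. group size 16.
        assert int_data.dim() == scale.dim()
        block_size = []
        for d, s in zip(int_data.shape, scale.shape):
            assert d % s == 0
            block_size.append(d // s)
        return tuple(block_size)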
        self.bit_width = bit_width
        self.block_size = block_size

    def __repr__(self):
btw, repr is also implemented by default in TorchAOBaseTensor when you define tensor_data_names and tensor_attribute_names
Removed
        device = kwargs.pop("device")
        dtype = kwargs.pop("dtype")
        assert dtype in _FLOAT_TYPES
        return self.__class__(
nit: self.__class__ --> IntxUnpackedTensor, to reduce runtime checks and align with other code
        scale = aten.slice.Tensor(self.scale, dim, start_scale, end_scale, step)
        zero_point = aten.slice.Tensor(self.zero_point, dim, start_scale, end_scale, step)

        new = self.__class__(
same here
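The index mapping this slice code relies on can be sketched like this (a hypothetical helper; the PR's actual slice handling may differ): indices on the quantized data are translated to indices on scale/zero_point by dividing by the per-dimension block size.

    def scale_slice_bounds(start: int, end: int, block_size_dim: int):
        # Divide qdata slice indices by the block size for this dimension,
        # rounding the end up so partially covered blocks keep their scale.
        start_scale = start // block_size_dim
        end_scale = -(-end // block_size_dim)  # ceil division
        return start_scale, end_scale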
Force-pushed from 93948a4 to 143fe91
LG, please add a bit more detail in the PR summary to explain the context for the change, and a Test Plan as well.
* add intx unpacked tensor
* up
* up
* up
* up
* up
This adds IntxUnpackedTensor, where subbyte quantized data is represented as int8. The range of the quantized values is restricted to the quant_min and quant_max of the target_dtype; e.g., if target_dtype=torch.int4, qdata will be an int8 tensor with values in [-8, 7]. Quantization is represented in a decomposed way.
This tensor is intended for export use cases that currently use AQT with QDQLayout.
The test plan is the new unit tests.
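A small illustration of the decomposed representation described above (shapes and group size are arbitrary):

    import torch

    n, k, group_size = 4, 32, 16
    # int4 stored as int8: qdata values restricted to [-8, 7], with one
    # (scale, zero_point) pair per group of 16 elements along the last dim.
    qdata = torch.randint(-8, 8, (n, k), dtype=torch.int8)
    scale = torch.rand(n, k // group_size)
    zero_point = torch.zeros(n, k // group_size, dtype=torch.int8)

    # Decomposed dequantize: hp = (q - zero_point) * scale, applied per group.
    q = qdata.reshape(n, k // group_size, group_size).to(torch.float32)
    hp = (q - zero_point.unsqueeze(-1)) * scale.unsqueeze(-1)
    hp = hp.reshape(n, k)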